curo1305 6834a6797f docs(05-03): complete GoogleDriveBackend + OneDriveBackend plan
- SUMMARY.md created for Plan 05-03
- STATE.md updated: completed_plans 26→27, progress 81→84%
- Session continuity updated with pytest results (262 passed / 43 xfailed / 1 pre-existing)
- Key decisions added: shared CloudConnectionError, cache_discovery=False, createUploadSession
2026-05-28 21:13:53 +02:00


---
phase: 05-cloud-storage-backends
plan: 03
subsystem: api
tags: [google-drive, onedrive, microsoft-graph, msal, google-api-python-client, oauth2, asyncio, cloud-storage]
requires:
  - phase: 05-cloud-storage-backends
    plan: 02
    provides: "CloudConnectionError (shared exception), StorageBackend ABC, asyncio.to_thread pattern reference (MinIOBackend)"
provides:
  - "backend/storage/google_drive_backend.py: GoogleDriveBackend + CloudConnectionError exception class"
  - "backend/storage/onedrive_backend.py: OneDriveBackend with resumable upload and MSAL token refresh"
  - "backend/tests/test_cloud_backends.py: 32 green TDD tests for both backends"
affects: [05-05, 05-06, 05-07, 05-08]
tech-stack:
  added:
    - "google-api-python-client 2.196.0 (Google Drive v3 API — files.create, get_media, delete, list)"
    - "google-auth-oauthlib 1.3.1 (google.oauth2.credentials.Credentials)"
    - "msal 1.36.0 (ConfidentialClientApplication.acquire_token_by_refresh_token)"
  patterns:
    - "Shared exception class: CloudConnectionError(reason=) defined once in google_drive_backend.py, imported by onedrive_backend.py"
    - "All sync SDK calls wrapped in asyncio.to_thread() — identical pattern to MinIOBackend"
    - "cache_discovery=False on googleapiclient.discovery.build() — prevents /tmp discovery doc writes"
    - "B2 design: backends are stateless signal-raisers — raise CloudConnectionError, never update DB"
    - "OneDrive resumable upload: createUploadSession for ALL files (no 4 MB size gate)"
    - "CHUNK_SIZE = 10 MB — above Graph's 4 MB simple upload limit (Pitfall 6 prevention)"
key-files:
  created:
    - backend/storage/google_drive_backend.py
    - backend/storage/onedrive_backend.py
    - backend/tests/test_cloud_backends.py
key-decisions:
  - "CloudConnectionError defined in google_drive_backend.py and imported by onedrive_backend.py — single shared exception type keeps error handling uniform in the API layer (cloud.py, Plan 05-05)"
  - "cache_discovery=False on Drive build() — prevents googleapiclient from writing /tmp discovery cache, avoiding /tmp traversal vector (T-05-03-05)"
  - "Resumable upload sessions used for ALL OneDrive uploads regardless of file size — simpler than a size gate and eliminates the 4 MB limit (Pitfall 6, RESEARCH.md Open Question 3)"
  - "MSAL invalid_grant detection via result.get('error') == 'invalid_grant' — confirmed as the correct Assumption A3 from RESEARCH.md"
  - "_ensure_valid_token() uses 60-second buffer before expiry — reduces race conditions between expiry check and actual API call"
patterns-established:
  - "Backend statelessness: cloud backends raise CloudConnectionError(reason=) and never call session.commit()"
  - "Google Drive 401 → token_expired; 400 + invalid_grant body → invalid_grant"
  - "OneDrive: _ensure_valid_token() + _refresh_token() called before every operation"
requirements-completed: [CLOUD-01, CLOUD-05, CLOUD-07]
duration: 6min
completed: 2026-05-28
---

Phase 5 Plan 03: Google Drive and OneDrive StorageBackend Implementations Summary

Stateless GoogleDriveBackend (Drive v3 with asyncio.to_thread, cache_discovery=False) and OneDriveBackend (MSAL token refresh, 10 MB resumable upload sessions via createUploadSession) implementing all 7 StorageBackend methods
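Here is a minimal sketch of the asyncio.to_thread() wrapping used by both backends (and by MinIOBackend). It assumes a hypothetical blocking client. FakeDriveClient, list_files() and list_objects() are illustrative names only, not the project's code or a real SDK API. The blocking call runs in a worker thread, so the event loop can keep serving other requests while it waits.

```python
import asyncio
import time
from typing import List


class FakeDriveClient:
    """Hypothetical blocking client standing in for a sync SDK object."""

    def list_files(self, folder: str) -> List[str]:
        time.sleep(0.05)  # simulate blocking network I/O
        return [f"{folder}/a.pdf", f"{folder}/b.pdf"]


class SketchBackend:
    """Stateless async facade: every sync SDK call goes through to_thread."""

    def __init__(self, client: FakeDriveClient) -> None:
        self._client = client

    async def list_objects(self, prefix: str) -> List[str]:
        # asyncio.to_thread (Python 3.9+) runs the blocking call in the default
        # thread-pool executor and awaits the result without blocking the loop.
        return await asyncio.to_thread(self._client.list_files, prefix)


async def main() -> List[List[str]]:
    backend = SketchBackend(FakeDriveClient())
    # Both listings are scheduled together, so their blocking calls overlap in worker threads.
    return await asyncio.gather(
        backend.list_objects("scans"), backend.list_objects("archive")
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.gather schedules both coroutines at once. As a result, the two simulated 50 ms calls overlap in the default thread pool instead of running back to back.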

Performance

  • Duration: 6 min
  • Started: 2026-05-28T19:05:18Z
  • Completed: 2026-05-28T19:11:00Z
  • Tasks: 2
  • Files modified: 3

Accomplishments

  • Created google_drive_backend.py with CloudConnectionError(reason=) exception class and GoogleDriveBackend implementing all 7 StorageBackend methods. Every sync googleapiclient call is wrapped in asyncio.to_thread(). cache_discovery=False prevents /tmp traversal (T-05-03-05). HttpError 401 raises CloudConnectionError(reason="token_expired"); HttpError 400 with "invalid_grant" body raises CloudConnectionError(reason="invalid_grant"). presigned_get_url and generate_presigned_put_url raise NotImplementedError (D-14).
  • Created onedrive_backend.py with OneDriveBackend importing the shared CloudConnectionError from google_drive_backend. CHUNK_SIZE = 10 * 1024 * 1024 (10 MB). Uses Microsoft Graph createUploadSession for all uploads (no 4 MB size gate). _ensure_valid_token() checks expiry with 60s buffer; _refresh_token() wraps MSAL in asyncio.to_thread() and returns None on invalid_grant to trigger CloudConnectionError(reason="invalid_grant"). Both presigned_* methods raise NotImplementedError.
  • Created tests/test_cloud_backends.py with 32 TDD tests (RED → GREEN). They cover imports, all 7 methods being async, CHUNK_SIZE, the shared CloudConnectionError, presigned_* raising NotImplementedError, __init__ correctness, and _ensure_valid_token behavior for expired and non-expired tokens.
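The Drive error mapping described above reduces to a small pure function. This is a sketch, not the backend's actual code. classify_drive_http_error() and raise_for_drive_error() are hypothetical helpers. In practice the status and body would come from googleapiclient's HttpError, through its resp.status and content attributes.

```python
from typing import Optional


class CloudConnectionError(Exception):
    """Single exception type raised by every cloud backend (sketch)."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def classify_drive_http_error(status: int, body: bytes) -> Optional[str]:
    """Map a Drive HTTP error to a CloudConnectionError reason, or None.

    Mirrors the rules in this summary: 401 -> "token_expired";
    400 whose body mentions invalid_grant -> "invalid_grant".
    """
    if status == 401:
        return "token_expired"
    if status == 400 and b"invalid_grant" in body:
        return "invalid_grant"
    return None  # not an auth problem; the caller decides what to do


def raise_for_drive_error(status: int, body: bytes) -> None:
    """Raise CloudConnectionError only for the two auth failures above."""
    reason = classify_drive_http_error(status, body)
    if reason is not None:
        raise CloudConnectionError(reason=reason)
```

In this sketch, every other status maps to None, so a non-auth failure such as a 404 is not reported as a connection problem. The summary does not say how the real backend treats those cases.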

Task Commits

Each task was committed atomically following the TDD RED → GREEN cycle:

  1. RED phase tests — both backends - 4efe7c1 (test)
  2. Task 1: GoogleDriveBackend - 337ee8e (feat)
  3. Task 2: OneDriveBackend - bcb887e (feat)

Files Created/Modified

  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/google_drive_backend.py — GoogleDriveBackend (all 7 methods) + CloudConnectionError exception class
  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/onedrive_backend.py — OneDriveBackend (all 7 methods), CHUNK_SIZE, MSAL token refresh, resumable upload
  • /Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_backends.py — 32 green TDD tests for both backends

Decisions Made

  • CloudConnectionError is defined once in google_drive_backend.py and imported by onedrive_backend.py. This keeps the exception type unified — the API layer in cloud.py (Plan 05-05) will catch one exception type regardless of which backend raised it.
  • cache_discovery=False is explicitly set on googleapiclient.discovery.build(). Without this flag, the client writes a JSON discovery document to /tmp on first call — this was identified as Threat T-05-03-05 in the plan's threat model.
  • createUploadSession is used for ALL OneDrive uploads (not only files > 4 MB). This matches RESEARCH.md's resolution of Open Question 3: simpler code (no size branch), avoids the 4 MB limit entirely, and handles both small and large files through the same path.
  • MSAL's invalid_grant is detected via result.get("error") == "invalid_grant", consistent with Assumption A3 in RESEARCH.md. MSAL's acquire_token_* methods report token-endpoint failures as an "error" field in the returned dict instead of raising, so field-level checking is the correct approach.
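Using one upload-session path for every file size comes down to splitting the file into Content-Range fragments. The following is a minimal sketch built around a hypothetical iter_upload_ranges() helper. Only CHUNK_SIZE comes from this summary. Microsoft Graph requires every fragment except the last to be a multiple of 320 KiB, and 10 MiB is exactly 32 × 320 KiB.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB, the value stated in this summary
GRAPH_FRAGMENT_MULTIPLE = 320 * 1024  # Graph: non-final fragments must be multiples of 320 KiB


def iter_upload_ranges(
    total_size: int, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end_inclusive, Content-Range header) for each fragment."""
    if total_size <= 0:
        raise ValueError("this sketch rejects empty files")
    if chunk_size % GRAPH_FRAGMENT_MULTIPLE != 0:
        raise ValueError("chunk size must be a multiple of 320 KiB")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive last byte
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1
```

Each tuple would become one PUT to the session's uploadUrl, carrying that Content-Range header and a body of end - start + 1 bytes. The HTTP client and retry handling are omitted. A 1,000-byte file yields a single fragment through the same path, which is why no 4 MB size gate is needed.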

Deviations from Plan

None — plan executed exactly as written. Both backends implemented per the action specifications, all acceptance criteria met.

Issues Encountered

google-api-python-client, google-auth-oauthlib, and msal were not installed in the local Python 3.9.6 environment (they were added to requirements.txt in Plan 05-01 but not installed locally). Installed all three via pip3 install to enable local test execution. This is consistent with the Plan 05-02 SUMMARY's note about running tests locally vs. Docker.

FutureWarnings from google.auth about Python 3.9 end-of-life appeared in pytest output but do not affect test results — they are informational warnings from the library, not from our code.

Known Stubs

None. Both backends are fully implemented with real method bodies. No placeholder returns or TODO comments in production code paths.

Threat Surface Scan

No new network endpoints introduced. Both backends are pure library classes:

  • GoogleDriveBackend makes outbound calls to googleapis.com using OAuth tokens from the decrypted credentials dict. Credentials are not logged.
  • OneDriveBackend makes outbound calls to graph.microsoft.com and login.microsoftonline.com (via MSAL). Credentials are not logged.

No trust boundaries are introduced beyond those already documented in the plan's <threat_model>. All STRIDE mitigations listed there are implemented:

  • T-05-03-01: Credentials dict never logged; only in memory during request lifecycle
  • T-05-03-02: invalid_grant detection implemented; CloudConnectionError(reason="invalid_grant") propagated to API layer
  • T-05-03-05: cache_discovery=False implemented on Drive build() call

No threat flags raised.
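On the OneDrive side, the T-05-03-02 mitigation can be sketched as follows: a 60-second expiry buffer, an MSAL refresh in a worker thread, and None on invalid_grant. Several pieces are assumptions for illustration: the credential keys (access_token, refresh_token, expires_at), the scope list and FakeMsalApp. A real run would pass an msal.ConfidentialClientApplication instead, whose acquire_token_by_refresh_token(refresh_token, scopes) returns a dict.

```python
import asyncio
import time
from typing import Any, Dict, List, Optional

EXPIRY_BUFFER_SECONDS = 60  # refresh this long before the access token expires
SCOPES = ["Files.ReadWrite"]  # hypothetical placeholder scope list


class CloudConnectionError(Exception):
    """Same shape as the shared exception: carries a machine-readable reason."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class FakeMsalApp:
    """Stub exposing the same method name as msal.ConfidentialClientApplication."""

    def __init__(self, result: Dict[str, Any]) -> None:
        self._result = result

    def acquire_token_by_refresh_token(
        self, refresh_token: str, scopes: List[str]
    ) -> Dict[str, Any]:
        return self._result  # MSAL reports failures as {"error": ..., ...}


async def refresh_token(
    app: Any, creds: Dict[str, Any], now: float
) -> Optional[Dict[str, Any]]:
    """Return a new credentials dict, or None when the grant is revoked."""
    # MSAL is synchronous, so run it in a worker thread like every other SDK call.
    result = await asyncio.to_thread(
        app.acquire_token_by_refresh_token, creds["refresh_token"], SCOPES
    )
    if result.get("error") == "invalid_grant":
        return None
    if "access_token" not in result:
        # Handling of other MSAL errors is not stated; this sketch just fails loudly.
        raise RuntimeError(result.get("error", "unknown MSAL error"))
    # Build a new dict instead of mutating the input or touching the DB.
    return {
        **creds,
        "access_token": result["access_token"],
        "refresh_token": result.get("refresh_token", creds["refresh_token"]),
        "expires_at": now + result["expires_in"],
    }


async def ensure_valid_token(
    app: Any, creds: Dict[str, Any], now: Optional[float] = None
) -> Dict[str, Any]:
    """Refresh only when the token expires within EXPIRY_BUFFER_SECONDS."""
    now = time.time() if now is None else now
    if creds["expires_at"] - now > EXPIRY_BUFFER_SECONDS:
        return creds  # still comfortably valid
    refreshed = await refresh_token(app, creds, now)
    if refreshed is None:
        raise CloudConnectionError(reason="invalid_grant")
    return refreshed
```

The sketch returns a new dict instead of mutating its input, in the spirit of the B2 stateless design. Persisting any refreshed tokens is left to the caller.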

Next Phase Readiness

  • Both OAuth cloud backends are complete and importable. Plan 05-05 (cloud.py API layer) can import GoogleDriveBackend, OneDriveBackend, and CloudConnectionError directly.
  • The get_storage_backend_for_document() factory in storage/__init__.py (Plan 05-02) already has lazy imports for both backends; the # type: ignore[import] comments can be resolved once Plan 05-05 adds the actual cloud router.
  • 32 new tests in test_cloud_backends.py are all green.
  • Full suite: 262 passed / 43 xfailed / 1 pre-existing failure (test_extract_docx — python-docx not installed locally).

Self-Check: PASSED

Files verified present:

  • backend/storage/google_drive_backend.py: FOUND
  • backend/storage/onedrive_backend.py: FOUND
  • backend/tests/test_cloud_backends.py: FOUND

Commits verified:

  • 4efe7c1: test(05-03): add RED phase tests — FOUND
  • 337ee8e: feat(05-03): implement GoogleDriveBackend — FOUND
  • bcb887e: feat(05-03): implement OneDriveBackend — FOUND

Phase: 05-cloud-storage-backends Completed: 2026-05-28