---
phase: 05-cloud-storage-backends
plan: 03
subsystem: api
tags: [google-drive, onedrive, microsoft-graph, msal, google-api-python-client, oauth2, asyncio, cloud-storage]

# Dependency graph
requires:
  - phase: 05-cloud-storage-backends
    plan: 02
    provides: "CloudConnectionError (shared exception), StorageBackend ABC, asyncio.to_thread pattern reference (MinIOBackend)"
provides:
  - "backend/storage/google_drive_backend.py: GoogleDriveBackend + CloudConnectionError exception class"
  - "backend/storage/onedrive_backend.py: OneDriveBackend with resumable upload and MSAL token refresh"
  - "backend/tests/test_cloud_backends.py: 32 green TDD tests for both backends"
affects: [05-05, 05-06, 05-07, 05-08]

# Tech tracking
tech-stack:
  added:
    - google-api-python-client 2.196.0 (Google Drive v3 API; files.create, get_media, delete, list)
    - google-auth-oauthlib 1.3.1 (google.oauth2.credentials.Credentials)
    - msal 1.36.0 (ConfidentialClientApplication.acquire_token_by_refresh_token)
  patterns:
    - "Shared exception class: CloudConnectionError(reason=) defined once in google_drive_backend.py, imported by onedrive_backend.py"
    - "All sync SDK calls wrapped in asyncio.to_thread(); identical pattern to MinIOBackend"
    - "cache_discovery=False on googleapiclient.discovery.build(); prevents /tmp discovery doc writes"
    - "B2 design: backends are stateless signal-raisers; raise CloudConnectionError, never update DB"
    - "OneDrive resumable upload: createUploadSession for ALL files (no 4 MB size gate)"
    - "CHUNK_SIZE = 10 MB, above Graph's 4 MB simple upload limit (Pitfall 6 prevention)"

key-files:
  created:
    - backend/storage/google_drive_backend.py
    - backend/storage/onedrive_backend.py
    - backend/tests/test_cloud_backends.py
  modified: []

key-decisions:
  - "CloudConnectionError defined in google_drive_backend.py and imported by onedrive_backend.py; single shared exception type keeps error handling uniform in the API layer (cloud.py, Plan 05-05)"
  - "cache_discovery=False on Drive build(); prevents googleapiclient from writing /tmp discovery cache, avoiding /tmp traversal vector (T-05-03-05)"
  - "Resumable upload sessions used for ALL OneDrive uploads regardless of file size; simpler than a size gate and eliminates the 4 MB limit (Pitfall 6, RESEARCH.md Open Question 3)"
  - "MSAL invalid_grant detection via result.get('error') == 'invalid_grant'; confirmed as the correct Assumption A3 from RESEARCH.md"
  - "_ensure_valid_token() uses 60-second buffer before expiry; reduces race conditions between expiry check and actual API call"

patterns-established:
  - "Backend statelessness: cloud backends raise CloudConnectionError(reason=) and never call session.commit()"
  - "Google Drive 401 → token_expired; 400 + invalid_grant body → invalid_grant"
  - "OneDrive: _ensure_valid_token() + _refresh_token() called before every operation"

requirements-completed:
  - CLOUD-01
  - CLOUD-05
  - CLOUD-07

# Metrics
duration: 6min
completed: 2026-05-28
---

# Phase 5 Plan 03: Google Drive and OneDrive StorageBackend Implementations Summary

**Stateless GoogleDriveBackend (Drive v3 with asyncio.to_thread, cache_discovery=False) and OneDriveBackend (MSAL token refresh, 10 MB resumable upload sessions via createUploadSession) implementing all 7 StorageBackend methods**

## Performance

- **Duration:** 6 min
- **Started:** 2026-05-28T19:05:18Z
- **Completed:** 2026-05-28T19:11:00Z
- **Tasks:** 2
- **Files modified:** 3

## Accomplishments

- Created `google_drive_backend.py` with `CloudConnectionError(reason=)` exception class and `GoogleDriveBackend` implementing all 7 StorageBackend methods. Every sync `googleapiclient` call is wrapped in `asyncio.to_thread()`. `cache_discovery=False` prevents /tmp traversal (T-05-03-05). HttpError 401 raises `CloudConnectionError(reason="token_expired")`; HttpError 400 with "invalid_grant" body raises `CloudConnectionError(reason="invalid_grant")`. `presigned_get_url` and `generate_presigned_put_url` raise `NotImplementedError` (D-14).
- Created `onedrive_backend.py` with `OneDriveBackend` importing the shared `CloudConnectionError` from `google_drive_backend`. `CHUNK_SIZE = 10 * 1024 * 1024` (10 MB). Uses Microsoft Graph `createUploadSession` for all uploads (no 4 MB size gate). `_ensure_valid_token()` checks expiry with a 60s buffer; `_refresh_token()` wraps MSAL in `asyncio.to_thread()` and returns `None` on `invalid_grant` to trigger `CloudConnectionError(reason="invalid_grant")`. Both `presigned_*` methods raise `NotImplementedError`.
- Created `tests/test_cloud_backends.py` with 32 TDD tests (RED → GREEN) covering imports, all 7 methods being async, `CHUNK_SIZE`, shared `CloudConnectionError`, `presigned_*` raising `NotImplementedError`, `__init__` correctness, and `_ensure_valid_token` behavior for expired/non-expired tokens.

## Task Commits

Each task was committed atomically following the TDD RED → GREEN cycle:

1. **RED phase tests (both backends)** - `4efe7c1` (test)
2. **Task 1: GoogleDriveBackend** - `337ee8e` (feat)
3. **Task 2: OneDriveBackend** - `bcb887e` (feat)

## Files Created/Modified

- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/google_drive_backend.py`: GoogleDriveBackend (all 7 methods) + CloudConnectionError exception class
- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/onedrive_backend.py`: OneDriveBackend (all 7 methods), CHUNK_SIZE, MSAL token refresh, resumable upload
- `/Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_backends.py`: 32 green TDD tests for both backends

## Decisions Made

- `CloudConnectionError` is defined once in `google_drive_backend.py` and imported by `onedrive_backend.py`. This keeps the exception type unified: the API layer in `cloud.py` (Plan 05-05) will catch one exception type regardless of which backend raised it.
- `cache_discovery=False` is explicitly set on `googleapiclient.discovery.build()`.
Without this flag, the client writes a JSON discovery document to `/tmp` on first call; this was identified as Threat T-05-03-05 in the plan's threat model.
- `createUploadSession` is used for ALL OneDrive uploads (not only files > 4 MB). This matches RESEARCH.md's resolution of Open Question 3: simpler code (no size branch), avoids the 4 MB limit entirely, and handles both small and large files through the same path.
- MSAL's `invalid_grant` is detected via `result.get("error") == "invalid_grant"`, consistent with Assumption A3 in RESEARCH.md. MSAL reports OAuth errors as fields of the returned dict rather than raising, so field-level checking is the correct approach.

## Deviations from Plan

None. The plan was executed exactly as written: both backends were implemented per the action specifications, and all acceptance criteria were met.

## Issues Encountered

`google-api-python-client`, `google-auth-oauthlib`, and `msal` were not installed in the local Python 3.9.6 environment (they were added to `requirements.txt` in Plan 05-01 but not installed locally). All three were installed via `pip3 install` to enable local test execution. This is consistent with the Plan 05-02 SUMMARY's note about running tests locally vs. Docker.

FutureWarnings from `google.auth` about Python 3.9 end-of-life appeared in pytest output but do not affect test results; they are informational warnings from the library, not from our code.

## Known Stubs

None. Both backends are fully implemented with real method bodies. No placeholder returns or TODO comments in production code paths.

## Threat Surface Scan

No new network endpoints introduced. Both backends are pure library classes:

- `GoogleDriveBackend` makes outbound calls to `googleapis.com` using OAuth tokens from the decrypted credentials dict. Credentials are not logged.
- `OneDriveBackend` makes outbound calls to `graph.microsoft.com` and `login.microsoftonline.com` (via MSAL). Credentials are not logged.

No trust boundaries were introduced beyond those already documented in the plan's threat model.
All STRIDE mitigations listed are implemented:

- T-05-03-01: Credentials dict never logged; only in memory during request lifecycle
- T-05-03-02: `invalid_grant` detection implemented; `CloudConnectionError(reason="invalid_grant")` propagated to API layer
- T-05-03-05: `cache_discovery=False` implemented on Drive `build()` call

No threat flags raised.

## Next Phase Readiness

- Both OAuth cloud backends are complete and importable. Plan 05-05 (`cloud.py` API layer) can import `GoogleDriveBackend`, `OneDriveBackend`, and `CloudConnectionError` directly.
- The `get_storage_backend_for_document()` factory in `storage/__init__.py` (Plan 05-02) already has lazy imports for both backends; the `# type: ignore[import]` comments can be resolved once Plan 05-05 adds the actual cloud router.
- 32 new tests in `test_cloud_backends.py` are all green.
- Full suite: 262 passed / 43 xfailed / 1 pre-existing failure (`test_extract_docx`; python-docx not installed locally).

## Self-Check: PASSED

Files verified present:

- `backend/storage/google_drive_backend.py`: FOUND
- `backend/storage/onedrive_backend.py`: FOUND
- `backend/tests/test_cloud_backends.py`: FOUND

Commits verified:

- 4efe7c1: test(05-03): add RED phase tests (FOUND)
- 337ee8e: feat(05-03): implement GoogleDriveBackend (FOUND)
- bcb887e: feat(05-03): implement OneDriveBackend (FOUND)

---

*Phase: 05-cloud-storage-backends*
*Completed: 2026-05-28*
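
The token-handling pattern recorded in this summary (60-second expiry buffer, sync refresh call moved off the event loop with `asyncio.to_thread()`, `invalid_grant` detected on the result dict, stateless `CloudConnectionError(reason=)` signal) can be sketched roughly as below. This is an illustrative sketch, not the code in `onedrive_backend.py`: the `TokenGuard` class, the credential keys (`access_token`, `refresh_token`, `expires_at`), and the injected `refresh` callable standing in for the MSAL call are all assumptions made for the example.

```python
import asyncio
import time
from typing import Callable, Dict, Optional

TOKEN_EXPIRY_BUFFER_SECONDS = 60  # buffer size taken from the summary


class CloudConnectionError(Exception):
    """Signal raised by a backend; the caller decides what to persist."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class TokenGuard:
    """Hypothetical helper illustrating the pattern, not the real backend."""

    def __init__(self, credentials: Dict, refresh: Callable[[str], Dict]) -> None:
        # Assumed credential keys: access_token, refresh_token, expires_at.
        self._credentials = dict(credentials)
        # `refresh` is a sync callable standing in for the MSAL refresh call.
        self._refresh = refresh

    def _is_expiring(self) -> bool:
        # Treat the token as expired once it is within the buffer window.
        deadline = self._credentials["expires_at"] - TOKEN_EXPIRY_BUFFER_SECONDS
        return time.time() >= deadline

    async def _refresh_token(self) -> Optional[str]:
        # Run the blocking refresh in a worker thread, off the event loop.
        result = await asyncio.to_thread(
            self._refresh, self._credentials["refresh_token"]
        )
        if result.get("error") == "invalid_grant":
            return None  # caller turns this into CloudConnectionError
        if "access_token" not in result:
            # Assumption: any other failure is surfaced under its error code.
            raise CloudConnectionError(reason=str(result.get("error", "unknown")))
        self._credentials["access_token"] = result["access_token"]
        self._credentials["expires_at"] = time.time() + result["expires_in"]
        return result["access_token"]

    async def ensure_valid_token(self) -> str:
        if not self._is_expiring():
            return self._credentials["access_token"]
        token = await self._refresh_token()
        if token is None:
            # Stateless signal only: no database update happens here.
            raise CloudConnectionError(reason="invalid_grant")
        return token


if __name__ == "__main__":
    # Token with 30 s left falls inside the 60 s buffer, so it is refreshed.
    guard = TokenGuard(
        {"access_token": "old", "refresh_token": "r", "expires_at": time.time() + 30},
        lambda refresh_token: {"access_token": "new", "expires_in": 3600},
    )
    print(asyncio.run(guard.ensure_valid_token()))
```

Injecting the refresh callable keeps the sketch runnable without `msal` installed; in a real backend that callable would presumably wrap `ConfidentialClientApplication.acquire_token_by_refresh_token`.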