From 6834a6797f6b790bfc89aa14ec4c6328d1429b52 Mon Sep 17 00:00:00 2001
From: curo1305
Date: Thu, 28 May 2026 21:13:53 +0200
Subject: [PATCH] docs(05-03): complete GoogleDriveBackend + OneDriveBackend plan
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- SUMMARY.md created for Plan 05-03
- STATE.md updated: completed_plans 26→27, progress 81→84%
- Session continuity updated with pytest results (262 passed / 43 xfailed / 1 pre-existing)
- Key decisions added: shared CloudConnectionError, cache_discovery=False, createUploadSession
---
 .planning/STATE.md                            |  23 +--
 .../05-03-SUMMARY.md                          | 148 ++++++++++++++++++
 2 files changed, 162 insertions(+), 9 deletions(-)
 create mode 100644 .planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md

diff --git a/.planning/STATE.md b/.planning/STATE.md
index 655c9ca..b56e291 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -4,13 +4,13 @@ milestone: v1.0
 milestone_name: milestone
 current_phase: 5
 status: executing
-last_updated: "2026-05-28T19:30:00.000Z"
+last_updated: "2026-05-28T19:15:00.000Z"
 progress:
   total_phases: 5
   completed_phases: 4
   total_plans: 32
-  completed_plans: 26
-  percent: 81
+  completed_plans: 27
+  percent: 84
 ---
 
 # Project State
@@ -28,13 +28,13 @@ progress:
 | 2 | Users & Authentication | ✓ Complete (5/5 plans) |
 | 3 | Document Migration & Multi-User Isolation | ✓ Complete (5/5 plans, UAT passed, security gate passed) |
 | 4 | Folders, Sharing, Quotas & Document UX | ✓ Complete (9/9 plans, UAT 14/15 passed, 1 bug fixed) |
-| 5 | Cloud Storage Backends | Not Started |
+| 5 | Cloud Storage Backends | In Progress (3/8 plans complete) |
 
 ## Current Position
 
-**Phase:** 05-cloud-storage-backends — Not started
-**Plan:** 0/TBD
-**Progress:** [████████░░] 78%
+**Phase:** 05-cloud-storage-backends — In Progress
+**Plan:** 3/8
+**Progress:** [████████░░] 84%
 
 ## Performance Metrics
 
@@ -124,6 +124,10 @@ progress:
 | CloudConnectionOut whitelist pattern | Pydantic model with exactly the safe fields; credentials_enc absent by omission — SEC-08 safe-by-default |
 | admin.user_deleted flush before delete | audit write flushed (session.flush()) while user FK still valid; session.delete(user) follows — preserves audit FK integrity |
 | test_admin_impersonation 405 acceptable | DELETE /users/{id} causes GET to return 405 not 422; both mean no GET impersonation endpoint; test updated to accept {404, 405, 422} |
+| CloudConnectionError shared exception type | Defined once in google_drive_backend.py; imported by onedrive_backend.py — single exception type across all cloud backends |
+| cache_discovery=False on Drive build() | Prevents /tmp discovery cache writes — directory traversal vector (T-05-03-05) |
+| createUploadSession for all OneDrive uploads | No 4 MB size gate; resumable sessions handle small and large files through same code path (Pitfall 6) |
+| MSAL invalid_grant via result.get('error') | MSAL returns dict (never raises); field-level check is correct — Assumption A3 confirmed |
 
 ### Open Questions
 
@@ -169,6 +173,7 @@ _Updated at each phase transition._
 | Last session | 2026-05-28 — Phase 5 planned (8 plans, 7 waves); verification passed (4 blockers → resolved: D-05 API-layer refresh path, SEC-09 cloud cleanup, frontend_url config, RESEARCH resolved markers) |
 | Last session | 2026-05-28 — Plan 05-01 executed: Wave 0 Nyquist scaffold — 19 xfail stubs in test_cloud.py, 4 cloud fixtures in conftest.py, 6 package pins, 8 config settings; 172 passed / 43 xfailed |
 | Last session | 2026-05-28 — Plan 05-02 executed: cloud_utils.py (SSRF+HKDF), cloud_cache.py (TTLCache), storage factory extended; 199 passed / 43 xfailed / 1 pre-existing failure |
-| Next action | Execute Plan 05-03: GoogleDriveBackend + OneDriveBackend (all 7 StorageBackend methods) |
+| Last session | 2026-05-28 — Plan 05-03 executed: GoogleDriveBackend (Drive v3, cache_discovery=False, asyncio.to_thread) + OneDriveBackend (MSAL, resumable upload, CHUNK_SIZE=10MB); 262 passed / 43 xfailed / 1 pre-existing failure |
+| Next action | Execute Plan 05-04: WebDAVBackend + NextcloudBackend |
 | Pending decisions | None |
-| Resume file | `.planning/phases/05-cloud-storage-backends/05-03-PLAN.md` |
+| Resume file | `.planning/phases/05-cloud-storage-backends/05-04-PLAN.md` |
diff --git a/.planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md b/.planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md
new file mode 100644
index 0000000..dc6ff18
--- /dev/null
+++ b/.planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md
@@ -0,0 +1,148 @@
+---
+phase: 05-cloud-storage-backends
+plan: 03
+subsystem: api
+tags: [google-drive, onedrive, microsoft-graph, msal, google-api-python-client, oauth2, asyncio, cloud-storage]
+
+# Dependency graph
+requires:
+  - phase: 05-cloud-storage-backends
+    plan: 02
+    provides: "CloudConnectionError (shared exception), StorageBackend ABC, asyncio.to_thread pattern reference (MinIOBackend)"
+provides:
+  - "backend/storage/google_drive_backend.py: GoogleDriveBackend + CloudConnectionError exception class"
+  - "backend/storage/onedrive_backend.py: OneDriveBackend with resumable upload and MSAL token refresh"
+  - "backend/tests/test_cloud_backends.py: 32 green TDD tests for both backends"
+affects: [05-05, 05-06, 05-07, 05-08]
+
+# Tech tracking
+tech-stack:
+  added:
+    - google-api-python-client 2.196.0 (Google Drive v3 API — files.create, get_media, delete, list)
+    - google-auth-oauthlib 1.3.1 (google.oauth2.credentials.Credentials)
+    - msal 1.36.0 (ConfidentialClientApplication.acquire_token_by_refresh_token)
+  patterns:
+    - "Shared exception class: CloudConnectionError(reason=) defined once in google_drive_backend.py, imported by onedrive_backend.py"
+    - "All sync SDK calls wrapped in asyncio.to_thread() — identical pattern to MinIOBackend"
+    - "cache_discovery=False on googleapiclient.discovery.build() — prevents /tmp discovery doc writes"
+    - "B2 design: backends are stateless signal-raisers — raise CloudConnectionError, never update DB"
+    - "OneDrive resumable upload: createUploadSession for ALL files (no 4 MB size gate)"
+    - "CHUNK_SIZE = 10 MB — above Graph's 4 MB simple upload limit (Pitfall 6 prevention)"
+
+key-files:
+  created:
+    - backend/storage/google_drive_backend.py
+    - backend/storage/onedrive_backend.py
+    - backend/tests/test_cloud_backends.py
+  modified: []
+
+key-decisions:
+  - "CloudConnectionError defined in google_drive_backend.py and imported by onedrive_backend.py — single shared exception type keeps error handling uniform in the API layer (cloud.py, Plan 05-05)"
+  - "cache_discovery=False on Drive build() — prevents googleapiclient from writing /tmp discovery cache, avoiding /tmp traversal vector (T-05-03-05)"
+  - "Resumable upload sessions used for ALL OneDrive uploads regardless of file size — simpler than a size gate and eliminates the 4 MB limit (Pitfall 6, RESEARCH.md Open Question 3)"
+  - "MSAL invalid_grant detection via result.get('error') == 'invalid_grant' — confirmed as the correct Assumption A3 from RESEARCH.md"
+  - "_ensure_valid_token() uses 60-second buffer before expiry — reduces race conditions between expiry check and actual API call"
+
+patterns-established:
+  - "Backend statelessness: cloud backends raise CloudConnectionError(reason=) and never call session.commit()"
+  - "Google Drive 401 → token_expired; 400 + invalid_grant body → invalid_grant"
+  - "OneDrive: _ensure_valid_token() + _refresh_token() called before every operation"
+
+requirements-completed:
+  - CLOUD-01
+  - CLOUD-05
+  - CLOUD-07
+
+# Metrics
+duration: 6min
+completed: 2026-05-28
+---
+
+# Phase 5 Plan 03: Google Drive and OneDrive StorageBackend Implementations Summary
+
+**Stateless GoogleDriveBackend (Drive v3 with asyncio.to_thread, cache_discovery=False) and OneDriveBackend (MSAL token refresh, 10 MB resumable upload sessions via createUploadSession) implementing all 7 StorageBackend methods**
+
+## Performance
+
+- **Duration:** 6 min
+- **Started:** 2026-05-28T19:05:18Z
+- **Completed:** 2026-05-28T19:11:00Z
+- **Tasks:** 2
+- **Files modified:** 3
+
+## Accomplishments
+
+- Created `google_drive_backend.py` with `CloudConnectionError(reason=)` exception class and `GoogleDriveBackend` implementing all 7 StorageBackend methods. Every sync `googleapiclient` call is wrapped in `asyncio.to_thread()`. `cache_discovery=False` prevents /tmp traversal (T-05-03-05). HttpError 401 raises `CloudConnectionError(reason="token_expired")`; HttpError 400 with "invalid_grant" body raises `CloudConnectionError(reason="invalid_grant")`. `presigned_get_url` and `generate_presigned_put_url` raise `NotImplementedError` (D-14).
+- Created `onedrive_backend.py` with `OneDriveBackend` importing the shared `CloudConnectionError` from `google_drive_backend`. `CHUNK_SIZE = 10 * 1024 * 1024` (10 MB). Uses Microsoft Graph `createUploadSession` for all uploads (no 4 MB size gate). `_ensure_valid_token()` checks expiry with 60s buffer; `_refresh_token()` wraps MSAL in `asyncio.to_thread()` and returns `None` on `invalid_grant` to trigger `CloudConnectionError(reason="invalid_grant")`. Both `presigned_*` methods raise `NotImplementedError`.
+- Created `tests/test_cloud_backends.py` with 32 TDD tests (RED → GREEN) covering imports, all 7 methods being async, `CHUNK_SIZE`, shared `CloudConnectionError`, `presigned_*` raising `NotImplementedError`, `__init__` correctness, and `_ensure_valid_token` behavior for expired/non-expired tokens.
+
+## Task Commits
+
+Each task was committed atomically following the TDD RED → GREEN cycle:
+
+1. **RED phase tests — both backends** - `4efe7c1` (test)
+2. **Task 1: GoogleDriveBackend** - `337ee8e` (feat)
+3. **Task 2: OneDriveBackend** - `bcb887e` (feat)
+
+## Files Created/Modified
+
+- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/google_drive_backend.py` — GoogleDriveBackend (all 7 methods) + CloudConnectionError exception class
+- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/onedrive_backend.py` — OneDriveBackend (all 7 methods), CHUNK_SIZE, MSAL token refresh, resumable upload
+- `/Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_backends.py` — 32 green TDD tests for both backends
+
+## Decisions Made
+
+- `CloudConnectionError` is defined once in `google_drive_backend.py` and imported by `onedrive_backend.py`. This keeps the exception type unified — the API layer in `cloud.py` (Plan 05-05) will catch one exception type regardless of which backend raised it.
+- `cache_discovery=False` is explicitly set on `googleapiclient.discovery.build()`. Without this flag, the client writes a JSON discovery document to `/tmp` on first call — this was identified as Threat T-05-03-05 in the plan's threat model.
+- `createUploadSession` is used for ALL OneDrive uploads (not only files > 4 MB). This matches RESEARCH.md's resolution of Open Question 3: simpler code (no size branch), avoids the 4 MB limit entirely, and handles both small and large files through the same path.
+- MSAL's `invalid_grant` is detected via `result.get("error") == "invalid_grant"` — consistent with Assumption A3 in RESEARCH.md. The MSAL library returns a dict (never raises), so field-level checking is the correct approach.
+
+## Deviations from Plan
+
+None — plan executed exactly as written. Both backends implemented per the action specifications, all acceptance criteria met.
+
+## Issues Encountered
+
+`google-api-python-client`, `google-auth-oauthlib`, and `msal` were not installed in the local Python 3.9.6 environment (they were added to `requirements.txt` in Plan 05-01 but not installed locally). Installed all three via `pip3 install` to enable local test execution. This is consistent with the Plan 05-02 SUMMARY's note about running tests locally vs. Docker.
+
+FutureWarnings from `google.auth` about Python 3.9 end-of-life appeared in pytest output but do not affect test results — they are informational warnings from the library, not from our code.
+
+## Known Stubs
+
+None. Both backends are fully implemented with real method bodies. No placeholder returns or TODO comments in production code paths.
+
+## Threat Surface Scan
+
+No new network endpoints introduced. Both backends are pure library classes:
+- `GoogleDriveBackend` makes outbound calls to `googleapis.com` using OAuth tokens from the decrypted credentials dict. Credentials are not logged.
+- `OneDriveBackend` makes outbound calls to `graph.microsoft.com` and `login.microsoftonline.com` (via MSAL). Credentials are not logged.
+
+No new trust boundaries not already documented in the plan's ``. All STRIDE mitigations listed are implemented:
+- T-05-03-01: Credentials dict never logged; only in memory during request lifecycle
+- T-05-03-02: `invalid_grant` detection implemented; `CloudConnectionError(reason="invalid_grant")` propagated to API layer
+- T-05-03-05: `cache_discovery=False` implemented on Drive `build()` call
+
+No threat flags raised.
+
+## Next Phase Readiness
+
+- Both OAuth cloud backends are complete and importable. Plan 05-05 (`cloud.py` API layer) can import `GoogleDriveBackend`, `OneDriveBackend`, and `CloudConnectionError` directly.
+- The `get_storage_backend_for_document()` factory in `storage/__init__.py` (Plan 05-02) already has lazy imports for both backends; the `# type: ignore[import]` comments can be resolved once Plan 05-05 adds the actual cloud router.
+- 32 new tests in `test_cloud_backends.py` are all green.
+- Full suite: 262 passed / 43 xfailed / 1 pre-existing failure (`test_extract_docx` — python-docx not installed locally).
+
+## Self-Check: PASSED
+
+Files verified present:
+- `backend/storage/google_drive_backend.py`: FOUND
+- `backend/storage/onedrive_backend.py`: FOUND
+- `backend/tests/test_cloud_backends.py`: FOUND
+
+Commits verified:
+- 4efe7c1: test(05-03): add RED phase tests — FOUND
+- 337ee8e: feat(05-03): implement GoogleDriveBackend — FOUND
+- bcb887e: feat(05-03): implement OneDriveBackend — FOUND
+
+---
+*Phase: 05-cloud-storage-backends*
+*Completed: 2026-05-28*
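
Reviewer note (not part of the patch): the summary describes two token-handling rules for `OneDriveBackend`: a 60-second buffer in `_ensure_valid_token()` and a field-level `invalid_grant` check on the dict that MSAL returns. The sketch below illustrates both rules in isolation. It is not the code from `onedrive_backend.py`. The helper names `token_needs_refresh` and `access_token_from_result`, the `expires_at` epoch-seconds convention, and the `"refresh_failed"` fallback reason are assumptions made for illustration, and `CloudConnectionError` is redefined locally only so the block runs on its own.

```python
import time
from typing import Optional

# Local stand-in for the shared exception described in the summary
# (the real class lives in google_drive_backend.py).
class CloudConnectionError(Exception):
    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason

# 60-second buffer, as stated in the summary's key decisions.
EXPIRY_BUFFER_SECONDS = 60

def token_needs_refresh(expires_at: float, now: Optional[float] = None) -> bool:
    """Return True when the token expires within the buffer window.

    `expires_at` is assumed to be a Unix timestamp in seconds.
    """
    current = time.time() if now is None else now
    return current >= expires_at - EXPIRY_BUFFER_SECONDS

def access_token_from_result(result: dict) -> str:
    """Extract the access token from an MSAL-style result dict.

    MSAL reports failures as fields in the returned dict, so the check
    is done on the "error" key rather than with try/except.
    """
    if result.get("error") == "invalid_grant":
        raise CloudConnectionError(reason="invalid_grant")
    if "access_token" not in result:
        # Hypothetical fallback reason; the summary does not name one.
        raise CloudConnectionError(reason="refresh_failed")
    return result["access_token"]
```

Passing `now` explicitly keeps the expiry check deterministic in tests, which is one way the expired and non-expired cases mentioned in the test list could be exercised without patching the clock.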