fix(05): revise Phase 5 plans based on checker feedback — B1-B4, W1-W4

B1: Mark RESEARCH.md Open Questions as (RESOLVED) with decision text for all 3
B2: Backends now stateless — raise CloudConnectionError(reason=) only; API layer
    in cloud.py owns token refresh + DB update via _call_cloud_op helper
B3: Add Task 3 to Plan 05 — cloud connection + object cleanup on account deletion (SEC-09)
B4: Add frontend_url setting to Plan 01 Task 1; Plan 05 uses settings.frontend_url
    for OAuth callback redirects
W1: ROADMAP.md Phase 5 now correctly labels Plans 03+04 as Wave 3 (not Wave 2)
W2: Plan 06 invalid_grant test now asserts both 503 HTTP response AND DB REQUIRES_REAUTH
W3: Plan 06 Task 2 split into unit tests (4, cloud_utils.py) and integration tests (11, HTTP)
W4: Plan 07 adds Vitest tests for cloudConnections store (4 tests) and SettingsCloudTab
    mount test (2 tests) per CLAUDE.md testing protocol

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curo1305
2026-05-28 19:55:28 +02:00
parent baa5bed7e2
commit d13801538d
7 changed files with 328 additions and 59 deletions
@@ -20,7 +20,7 @@ must_haves:
- "OneDriveBackend implements all 7 StorageBackend abstract methods"
- "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both cloud backends (D-14)"
- "All sync SDK calls wrapped in asyncio.to_thread() — event loop never blocked"
- "On-demand token refresh: 401/token-expiry error triggers transparent refresh; invalid_grant sets REQUIRES_REAUTH"
- "Backends are stateless: raise CloudConnectionError(reason="token_expired") on expiry or CloudConnectionError(reason="invalid_grant") on revocation — DB update belongs to API layer (D-05/D-06, B2 design)"
- "Google OAuth Flow uses access_type='offline', prompt='consent' (Pitfall 1 prevention)"
- "OneDrive uses resumable upload sessions (createUploadSession) for all files (Pitfall 6 prevention)"
artifacts:
@@ -44,7 +44,7 @@ must_haves:
<objective>
Implement GoogleDriveBackend and OneDriveBackend — the two OAuth-based cloud StorageBackend concrete classes.
Purpose: These backends handle Google Drive v3 and Microsoft Graph file operations. Both use async-wrapped sync SDKs, on-demand token refresh, and handle the invalid_grant → REQUIRES_REAUTH transition per D-05/D-06.
Purpose: These backends handle Google Drive v3 and Microsoft Graph file operations. Both use async-wrapped sync SDKs and raise CloudConnectionError(reason) for token expiry/revocation. The DB transition (REQUIRES_REAUTH) is handled by the API layer per B2 design — backends are stateless.
Output: google_drive_backend.py and onedrive_backend.py, each implementing all 7 StorageBackend methods.
</objective>
@@ -91,11 +91,13 @@ Microsoft Graph: GET /me/drive/items/{item_id}/content — streams bytes
Microsoft Graph: DELETE /me/drive/items/{item_id}
OneDrive object_key = item_id from upload response
<!-- From RESEARCH.md Pattern 10 — On-demand token refresh -->
Custom exception: CloudConnectionError (raised when invalid_grant detected)
On 401 / token-expiry: refresh token, update credentials_enc in conn, retry once
On invalid_grant: set conn.status = "REQUIRES_REAUTH", raise CloudConnectionError
Both backends need session + conn parameters for the refresh/update path (passed by the API layer caller)
<!-- From RESEARCH.md Pattern 10 — On-demand token refresh (B2 design: API layer owns DB updates) -->
Custom exception: CloudConnectionError raised with reason attribute:
- reason="token_expired": API layer will refresh the token, update DB, and retry
- reason="invalid_grant": API layer will set conn.status="REQUIRES_REAUTH" in DB and raise HTTPException(503)
Backends are STATELESS — they raise CloudConnectionError but do NOT update DB or conn directly.
DB updates happen in the _call_cloud_op() helper in cloud.py (Plan 05), which has the session.
This keeps backends testable without DB fixtures.
</interfaces>
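The split described above (backends only raise, the API layer reacts) could be sketched roughly as follows. This is a minimal illustration, not the actual `_call_cloud_op` from cloud.py: the `call_cloud_op` signature, the `refresh` and `mark_requires_reauth` callbacks, and the `ReauthRequired` exception (a stand-in for `HTTPException(503)`) are all assumptions made for the example.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class CloudConnectionError(Exception):
    def __init__(self, msg: str = "", *, reason: str = ""):
        super().__init__(msg)
        self.reason = reason  # "token_expired" | "invalid_grant"


class ReauthRequired(Exception):
    """Hypothetical stand-in for HTTPException(503); keeps the sketch framework-free."""


async def call_cloud_op(
    op: Callable[[], Awaitable[T]],
    refresh: Callable[[], Awaitable[None]],
    mark_requires_reauth: Callable[[], Awaitable[None]],
) -> T:
    """Run op; on token_expired refresh and retry once; on invalid_grant flag the connection."""
    for attempt in range(2):
        try:
            return await op()
        except CloudConnectionError as exc:
            if exc.reason == "invalid_grant":
                await mark_requires_reauth()  # assumed to persist REQUIRES_REAUTH
                raise ReauthRequired("cloud connection requires re-authentication") from exc
            if exc.reason != "token_expired" or attempt == 1:
                raise  # unknown reason, or still expired after one refresh
            await refresh()  # assumed to refresh the token and persist it
    raise AssertionError("unreachable")


async def demo() -> str:
    """Usage: the first call reports an expired token, the retry succeeds."""
    state = {"fresh": False}

    async def op() -> str:
        if not state["fresh"]:
            raise CloudConnectionError("expired", reason="token_expired")
        return "uploaded"

    async def refresh() -> None:
        state["fresh"] = True

    async def mark_requires_reauth() -> None:
        pass

    return await call_cloud_op(op, refresh, mark_requires_reauth)
```

Because the backend only signals through `reason`, the retry policy can be exercised with plain coroutines and no DB fixtures, which is the testability argument made above.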
<tasks>
@@ -119,8 +121,10 @@ Both backends need session + conn parameters for the refresh/update path (passed
- stat_object: calls service.files().get(fileId=key, fields="size") wrapped in asyncio.to_thread(); returns int(metadata.get("size", 0))
- health_check: tries files().list(pageSize=1) wrapped in asyncio.to_thread(); returns True/False
- All sync googleapiclient calls wrapped in asyncio.to_thread() (Pitfall 7)
- On-demand token refresh: _is_token_expired(e) detects googleapiclient.errors.HttpError status 401; _refresh_google_creds(credentials) calls google.auth.transport.requests.Request() to refresh; returns updated credentials dict or None on invalid_grant
- CloudConnectionError exception class defined in this module for invalid_grant signaling
- CloudConnectionError exception class defined in this module; always raised with a reason attribute ("token_expired" or "invalid_grant") by the backend itself, never as a side effect of DB operations
- On HttpError 401 (token expired): raise CloudConnectionError(reason="token_expired") — the API layer in cloud.py handles the actual refresh and DB update per D-05 (B2 design)
- On invalid_grant detection (googleapiclient.errors.HttpError with specific message or custom check): raise CloudConnectionError(reason="invalid_grant") — the API layer sets REQUIRES_REAUTH per D-06 (B2 design)
- Backends have NO session parameter and perform NO DB writes — they are stateless signal-raisers only
</behavior>
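The behavior listed above combines two ideas: blocking SDK calls run through `asyncio.to_thread()`, and a 401 is translated into `CloudConnectionError(reason="token_expired")` without any DB access. A minimal sketch of that pattern follows. `FakeHttpError` and `FakeDriveClient` are hypothetical stand-ins so the example runs without the Google SDK; the real backend would catch `googleapiclient.errors.HttpError` and call the Drive service instead.

```python
import asyncio


class CloudConnectionError(Exception):
    def __init__(self, msg: str = "", *, reason: str = ""):
        super().__init__(msg)
        self.reason = reason  # "token_expired" | "invalid_grant"


class FakeHttpError(Exception):
    """Hypothetical stand-in for an SDK HTTP error that carries a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class FakeDriveClient:
    """Hypothetical blocking client; a real one would call the Drive API."""

    def __init__(self, sizes: dict, expired: bool = False):
        self._sizes = sizes
        self._expired = expired

    def get_metadata(self, key: str) -> dict:
        if self._expired:
            raise FakeHttpError(401)
        return {"size": str(self._sizes[key])}


async def stat_object(client: FakeDriveClient, key: str) -> int:
    """Return the object size; signal token expiry instead of handling it."""
    try:
        # The blocking call runs in a worker thread so the event loop stays free.
        metadata = await asyncio.to_thread(client.get_metadata, key)
    except FakeHttpError as exc:
        if exc.status_code == 401:
            raise CloudConnectionError(
                "access token rejected", reason="token_expired"
            ) from exc
        raise
    return int(metadata.get("size", 0))
```

Note that the function takes no session argument: the only output on failure is the exception and its `reason`.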
<action>
Create backend/storage/google_drive_backend.py with:
@@ -136,7 +140,10 @@ Both backends need session + conn parameters for the refresh/update path (passed
from google.auth.transport.requests import Request
from storage.base import StorageBackend
class CloudConnectionError(Exception): pass
class CloudConnectionError(Exception):
    def __init__(self, msg: str = "", *, reason: str = ""):
        super().__init__(msg)
        self.reason = reason  # "token_expired" | "invalid_grant"
class GoogleDriveBackend(StorageBackend):
    SCOPES = ["https://www.googleapis.com/auth/drive.file"]
@@ -243,8 +250,10 @@ print('All 7 methods are coroutines: OK')
- generate_presigned_put_url: raises NotImplementedError
- stat_object: GET /me/drive/items/{item_id}?$select=size; return int(response["size"])
- health_check: GET /me/drive?$select=id; return True/False
- _refresh_token(credentials: dict) -> dict | None: calls msal.ConfidentialClientApplication.acquire_token_by_refresh_token(); returns new credentials dict or None if result.get("error") == "invalid_grant"
- _refresh_token() -> dict | None: calls msal.ConfidentialClientApplication.acquire_token_by_refresh_token(); returns new credentials dict or None if result.get("error") == "invalid_grant"
- _ensure_valid_token(): on expired token calls _refresh_token(); if None raises CloudConnectionError(reason="invalid_grant"); if success updates self._credentials
- All sync msal calls wrapped in asyncio.to_thread(); httpx calls are already async (use await httpx.AsyncClient)
- Backend is stateless: raises CloudConnectionError(reason="token_expired") or CloudConnectionError(reason="invalid_grant") — no DB writes (B2 design; DB updates handled by API layer _call_cloud_op helper in cloud.py)
- CHUNK_SIZE = 10 * 1024 * 1024 (10 MB, above Graph's 4 MB limit)
</behavior>
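To make the CHUNK_SIZE choice concrete, the sketch below computes the `Content-Range` header values an upload session would send for a file of a given size. The helper name `content_ranges` and the constant `GRAPH_FRAGMENT_MULTIPLE` are assumptions for illustration. The 320 KiB check reflects the fragment-size rule in Microsoft's upload session documentation, which a 10 MiB chunk satisfies (10 MiB = 32 × 320 KiB).

```python
CHUNK_SIZE = 10 * 1024 * 1024  # 10 MiB, as in the plan
GRAPH_FRAGMENT_MULTIPLE = 320 * 1024  # fragment sizes must be a multiple of 320 KiB


def content_ranges(total_size: int, chunk_size: int = CHUNK_SIZE) -> list:
    """Return the Content-Range header value for each fragment of an upload."""
    if chunk_size <= 0 or chunk_size % GRAPH_FRAGMENT_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 320 KiB")
    ranges = []
    for start in range(0, total_size, chunk_size):
        # The end offset is inclusive; the final fragment may be shorter.
        end = min(start + chunk_size, total_size) - 1
        ranges.append(f"bytes {start}-{end}/{total_size}")
    return ranges
```

For a 25 MiB file this yields three fragments, the last one 5 MiB long; each string would go into the `Content-Range` header of one PUT to the session's upload URL.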
<action>
@@ -390,8 +399,8 @@ cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest
</verification>
<success_criteria>
- GoogleDriveBackend: all 7 methods async; presigned methods raise NotImplementedError; CloudConnectionError defined
- OneDriveBackend: all 7 methods async; CHUNK_SIZE=10MB; presigned methods raise NotImplementedError; CloudConnectionError imported
- GoogleDriveBackend: all 7 methods async; presigned methods raise NotImplementedError; CloudConnectionError(reason=) defined; backend raises errors, does NO DB writes
- OneDriveBackend: all 7 methods async; CHUNK_SIZE=10MB; presigned methods raise NotImplementedError; CloudConnectionError imported; backend raises errors, does NO DB writes
- pytest -v exits 0, 0 failures; test_cloud.py still all xfailed
</success_criteria>