curo1305 d13801538d fix(05): revise Phase 5 plans based on checker feedback — B1-B4, W1-W4
B1: Mark RESEARCH.md Open Questions as (RESOLVED) with decision text for all 3
B2: Backends now stateless — raise CloudConnectionError(reason=) only; API layer
    in cloud.py owns token refresh + DB update via _call_cloud_op helper
B3: Add Task 3 to Plan 05 — cloud connection + object cleanup on account deletion (SEC-09)
B4: Add frontend_url setting to Plan 01 Task 1; Plan 05 uses settings.frontend_url
    for OAuth callback redirects
W1: ROADMAP.md Phase 5 now correctly labels Plans 03+04 as Wave 3 (not Wave 2)
W2: Plan 06 invalid_grant test now asserts both 503 HTTP response AND DB REQUIRES_REAUTH
W3: Plan 06 Task 2 split into unit tests (4, cloud_utils.py) and integration tests (11, HTTP)
W4: Plan 07 adds Vitest tests for cloudConnections store (4 tests) and SettingsCloudTab
    mount test (2 tests) per CLAUDE.md testing protocol

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 19:55:28 +02:00


---
phase: 05-cloud-storage-backends
plan: 05
type: execute
wave: 4
depends_on:
  - 05-03
  - 05-04
files_modified:
  - backend/api/cloud.py
  - backend/main.py
  - backend/api/auth.py
autonomous: true
requirements:
  - CLOUD-01
  - CLOUD-02
  - CLOUD-03
  - CLOUD-04
  - CLOUD-05
  - CLOUD-06
  - SEC-09
must_haves:
  truths:
    - "GET /api/cloud/oauth/initiate/{provider} redirects to provider OAuth URL; state token in Redis with 30-min TTL"
    - "GET /api/cloud/oauth/callback/{provider} validates state, exchanges code, encrypts credentials, saves CloudConnection, redirects to {settings.frontend_url}/settings?cloud_connected={provider}"
    - "POST /api/cloud/connections/webdav validates URL (SSRF), tests connection (PROPFIND), encrypts + saves credentials"
    - "GET /api/cloud/connections returns CloudConnectionOut list (no credentials_enc)"
    - "DELETE /api/cloud/connections/{id} deletes credentials_enc row; subsequent use returns 503"
    - "GET /api/cloud/folders/{provider}/{folder_id} returns lazy-loaded folder listing (TTL-cached)"
    - "PATCH /api/users/me/default-storage updates users.default_storage_backend"
    - "All endpoints use get_regular_user dep; admin blocked (403)"
    - "OAuth callback invalid state returns 400; invalid provider returns 400"
    - "write_audit_log called on connect, disconnect, and REQUIRES_REAUTH transitions"
    - "_call_cloud_op(conn, user, session, op_fn) helper in cloud.py wraps all cloud ops: retries once on token_expired (refresh+DB update), sets REQUIRES_REAUTH+HTTPException(503) on invalid_grant"
    - "Account deletion purges all CloudConnection rows and calls delete_object on cloud-stored documents (SEC-09)"
  artifacts:
    - path: backend/api/cloud.py
      provides: "All /api/cloud/* endpoints + /api/users/me/default-storage"
      contains: "router = APIRouter"
    - path: backend/main.py
      provides: "cloud router registered"
      contains: "cloud_router"
  key_links:
    - from: backend/api/cloud.py
      to: backend/storage/cloud_utils.py
      via: "encrypt_credentials / decrypt_credentials"
      pattern: "encrypt_credentials"
    - from: backend/api/cloud.py
      to: backend/api/admin.py
      via: "CloudConnectionOut Pydantic model import"
      pattern: "CloudConnectionOut"
    - from: backend/api/cloud.py
      to: backend/services/audit.py
      via: "write_audit_log on connect/disconnect"
      pattern: "write_audit_log"
---
Create backend/api/cloud.py with all cloud connection management endpoints, register it in main.py, and wire cloud cleanup into account deletion (SEC-09).

Purpose: This plan implements the complete cloud backend API surface: OAuth initiation, OAuth callback, WebDAV connect, list connections, disconnect, folder listing, and default-storage selection, plus cloud cleanup on account deletion. Output: backend/api/cloud.py with 6 cloud endpoints + 1 PATCH /api/users/me/default-storage endpoint (7 in total); main.py updated to register both routers; the account deletion handler extended with cloud cleanup.

<execution_context> @/Users/nik/.claude/get-shit-done/workflows/execute-plan.md @/Users/nik/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md
@.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md

From backend/api/admin.py:
  class CloudConnectionOut(BaseModel):
      id: str
      provider: str
      display_name: str
      status: str
      connected_at: datetime
      model_config = {"from_attributes": True}
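The whitelist-by-omission idea behind CloudConnectionOut can be illustrated without Pydantic. This is a minimal stdlib sketch, assuming a hypothetical `CloudConnectionRow` stand-in for the ORM row and a hypothetical `to_out` serializer: only the fields declared on the output type are copied, so `credentials_enc` cannot leak even though the row carries it.

```python
from dataclasses import dataclass, fields
from datetime import datetime


@dataclass
class CloudConnectionRow:
    """Hypothetical stand-in for the CloudConnection ORM row (includes the secret column)."""
    id: str
    user_id: str
    provider: str
    display_name: str
    credentials_enc: str
    status: str
    connected_at: datetime


@dataclass
class CloudConnectionOutSketch:
    """Mirrors CloudConnectionOut: only the whitelisted fields are declared."""
    id: str
    provider: str
    display_name: str
    status: str
    connected_at: datetime


def to_out(row: CloudConnectionRow) -> dict:
    # Copy only attributes the output type declares; everything else on the row is dropped.
    allowed = [f.name for f in fields(CloudConnectionOutSketch)]
    return {name: getattr(row, name) for name in allowed}
```

Pydantic's `from_attributes=True` behaves the same way in this respect: attributes that are not declared on the model are never read into it.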

From backend/deps/auth.py:
  async def get_regular_user(credentials, session) -> User  # raises 403 for admin, 401 for invalid token

From backend/services/audit.py:
  async def write_audit_log(session, event_type, user_id, actor_id, resource_id, ip_address, metadata_=None) -> None

From backend/db/models.py:
  CloudConnection: id (UUID), user_id (UUID), provider (String), display_name (Text), credentials_enc (Text), status (String, default="ACTIVE"), connected_at (TIMESTAMP)
  User: id (UUID), default_storage_backend (String, default="minio")

From backend/config.py (after Plan 01):
  settings.cloud_creds_key: str
  settings.google_client_id, settings.google_client_secret: str
  settings.onedrive_client_id, settings.onedrive_client_secret, settings.onedrive_tenant_id: str
  settings.backend_url: str (used in the OAuth callback redirect_uri)
  settings.frontend_url: str (used for the OAuth callback success/error redirect to the Vue app, per the B4 fix)

From backend/storage/cloud_utils.py:
  def encrypt_credentials(master_key: bytes, user_id: str, credentials: dict) -> str
  def decrypt_credentials(master_key: bytes, user_id: str, credentials_enc: str) -> dict
  def validate_cloud_url(url: str) -> None
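validate_cloud_url is only given by signature here. As a rough illustration of the kind of check T-05-05-05 relies on, here is a minimal stdlib sketch under assumed rules (http/https only, a host is required, and every address the host resolves to must be public). The real rules live in the earlier plan and may differ; `validate_cloud_url_sketch` is a hypothetical name.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_cloud_url_sketch(url: str) -> None:
    """Raise ValueError unless url is http(s) and its host resolves only to public IPs."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host {host!r}") from exc
    for info in infos:
        # info[4][0] is the address string; drop an IPv6 zone suffix such as "%eth0"
        addr = ipaddress.ip_address(info[4][0].split("%", 1)[0])
        if (addr.is_private or addr.is_loopback or addr.is_link_local
                or addr.is_reserved or addr.is_multicast or addr.is_unspecified):
            raise ValueError(f"{host!r} resolves to non-public address {addr}")
```

Resolving at check time does not stop DNS rebinding between the check and the actual request, which is presumably why D-17 repeats the check before each request.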

From RESEARCH.md:
  Pattern 3: Google Drive OAuth (Flow.from_client_config, access_type="offline", prompt="consent")
  Pattern 4: OneDrive OAuth (msal.ConfidentialClientApplication, acquire_token_by_authorization_code)
  Pattern 7: OAuth state in Redis (key "oauth_state:{state_token}", TTL 1800, single-use delete)
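Pattern 7's single-use state can be sketched against an in-memory stand-in for Redis. This minimal stdlib sketch assumes a hypothetical `StateStore` class; in Redis the equivalent would be SET with a 1800-second expiry at initiate and an atomic GETDEL (Redis 6.2+) at callback, so two concurrent callbacks cannot both consume the same token.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class StateStore:
    """Hypothetical in-memory stand-in for Redis keys "oauth_state:{state_token}"."""

    def __init__(self, ttl_seconds: int = 1800, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def issue(self, user_id: str) -> str:
        """Initiate step: a fresh random token mapped to the user id, expiring after the TTL."""
        token = secrets.token_urlsafe(32)
        self._data[f"oauth_state:{token}"] = (user_id, self._clock() + self._ttl)
        return token

    def consume(self, token: str) -> Optional[str]:
        """Callback step: read and delete in one operation, so a token works only once."""
        entry = self._data.pop(f"oauth_state:{token}", None)
        if entry is None:
            return None  # unknown or already used: the callback answers 400
        user_id, expires_at = entry
        return user_id if self._clock() < expires_at else None  # expired: also 400
```

The injectable clock only exists so expiry can be exercised without sleeping; Redis enforces the TTL itself.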

From backend/storage/nextcloud_backend.py:
  NextcloudBackend.list_folder() -> list[dict]
From backend/services/cloud_cache.py:
  get_cloud_folders_cached(user_id, provider, folder_id, fetch_fn)
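get_cloud_folders_cached is only given by signature. Assuming it is a read-through cache keyed on (user_id, provider, folder_id) that awaits fetch_fn() on a miss, the idea looks roughly like this stdlib sketch. `FolderCache` and the 60-second TTL are hypothetical, and the real module may keep entries in Redis rather than process memory.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, Tuple


class FolderCache:
    """Hypothetical read-through TTL cache keyed on (user_id, provider, folder_id)."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, str, str], Tuple[float, Any]] = {}

    async def get(self, user_id: str, provider: str, folder_id: str,
                  fetch_fn: Callable[[], Awaitable[Any]]) -> Any:
        key = (user_id, provider, folder_id)  # user_id in the key keeps users' listings apart
        hit = self._entries.get(key)
        if hit is not None and self._clock() < hit[0]:
            return hit[1]  # fresh entry: no provider call
        value = await fetch_fn()  # miss or expired: ask the provider
        self._entries[key] = (self._clock() + self._ttl, value)
        return value


if __name__ == "__main__":
    async def fake_fetch():
        return [{"id": "/Scans", "name": "Scans", "is_dir": True, "size": 0}]

    print(asyncio.run(FolderCache().get("u1", "nextcloud", "/", fake_fetch)))
```

A failing fetch_fn is not cached in this sketch, so a provider error (such as the 503 for REQUIRES_REAUTH) reaches the caller on every request instead of being served from cache.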

Task 1: Create cloud.py with OAuth + WebDAV connect + connection management endpoints

Files: backend/api/cloud.py

Read first:
- backend/api/admin.py: CloudConnectionOut pattern, _user_to_dict whitelist style, write_audit_log usage
- backend/api/auth.py: Redis state pattern (oauth_state-like keys), rate limiting pattern
- backend/deps/auth.py: get_regular_user signature
- backend/db/models.py: CloudConnection, User model fields
- backend/config.py: new Phase 5 settings fields
- .planning/phases/05-cloud-storage-backends/05-CONTEXT.md: decisions D-03, D-04, D-06, D-17, D-18, D-19
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Pattern 3 (Google OAuth), Pattern 4 (MSAL), Pattern 7 (Redis state)

Behavior:
- GET /api/cloud/oauth/initiate/{provider}: accepts provider in {"google_drive", "onedrive"} (400 otherwise); generates state_token = secrets.token_urlsafe(32); stores "oauth_state:{state_token}" in Redis with value str(current_user.id), TTL 1800; builds authorization_url with state=state_token; returns an HTTP 302 redirect to authorization_url
- GET /api/cloud/oauth/callback/{provider}: reads the state and code query params; looks up Redis key "oauth_state:{state}" and returns 400 if it is missing; deletes the Redis key (single-use); exchanges code for tokens; encrypts credentials; upserts CloudConnection (match on user_id + provider) with status=ACTIVE; calls write_audit_log(event_type="cloud.connected"); returns a 302 redirect to {settings.frontend_url}/settings?cloud_connected={provider}
- On any exception in the callback: returns a 302 redirect to {settings.frontend_url}/settings?cloud_error={url-encoded error message}, using a user-safe message rather than raw exception text (T-05-05-07)
- POST /api/cloud/connections/webdav: Pydantic body with server_url (HttpUrl), username (str), password (str), provider (Literal["nextcloud", "webdav"]); calls validate_cloud_url(str(server_url)) and returns 422 on ValueError; instantiates WebDAVBackend/NextcloudBackend; calls backend.health_check() inside try/except and returns 422 if it returns False or raises; encrypts credentials; upserts CloudConnection; calls write_audit_log(event_type="cloud.connected"); returns CloudConnectionOut
- GET /api/cloud/connections: selects all CloudConnection rows where user_id == current_user.id; returns {"items": [CloudConnectionOut, ...]}; credentials_enc never appears in the response
- DELETE /api/cloud/connections/{id}: loads the CloudConnection; returns 404 if it does not exist or connection.user_id != current_user.id (prevents ID enumeration, per D-19); deletes the row; calls write_audit_log(event_type="cloud.disconnected"); returns 204
- PATCH /api/users/me/default-storage: body {"backend": str}; updates User.default_storage_backend; returns {"default_storage_backend": new_value}
- ALL endpoints: Depends(get_regular_user), so admins are blocked (D-18, D-19)
- ALL endpoints: cross-user access returns 404, not 403 (prevents ID enumeration)

Create backend/api/cloud.py with a module docstring listing all endpoints and security invariants.
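Imports:
  import asyncio  # asyncio.to_thread around blocking MSAL / WebDAV / Google client calls
  import secrets
  import urllib.parse
  import uuid
  from typing import Literal  # Literal comes from typing; pydantic does not export it
  from fastapi import APIRouter, Depends, HTTPException, Request, status
  from fastapi.responses import RedirectResponse
  from pydantic import BaseModel, HttpUrl
  from sqlalchemy import select
  from sqlalchemy.ext.asyncio import AsyncSession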

From project modules:
  from api.admin import CloudConnectionOut
  from config import settings
  from db.models import CloudConnection, User
  from deps.auth import get_regular_user
  from deps.db import get_db
  from services.audit import write_audit_log
  from storage.cloud_utils import encrypt_credentials, decrypt_credentials, validate_cloud_url

VALID_OAUTH_PROVIDERS = {"google_drive", "onedrive"}
VALID_WEBDAV_PROVIDERS = {"nextcloud", "webdav"}

router = APIRouter(prefix="/api/cloud", tags=["cloud"])
users_router = APIRouter(prefix="/api/users", tags=["users"])

_call_cloud_op helper (add as a module-level async function in cloud.py, per B2 design):
async def _call_cloud_op(conn: CloudConnection, user: User, session: AsyncSession, op_fn):
    """Wraps a cloud operation with transparent token refresh (D-05) and invalid_grant handling (D-06).

    1. Decrypts conn.credentials_enc, builds the provider backend, and awaits op_fn(backend).
    2. On CloudConnectionError(reason="token_expired"): refresh via the provider, encrypt the new creds,
       update conn.credentials_enc in the DB, rebuild the backend from the new credentials dict,
       and retry op_fn(new_backend) once.
    3. On CloudConnectionError(reason="invalid_grant"), raised by the operation or by the refresh itself:
       set conn.status="REQUIRES_REAUTH", call write_audit_log(event_type="cloud.requires_reauth"),
       await session.commit(), then
       raise HTTPException(503, "Cloud connection requires re-authentication. Please reconnect in Settings.").
    4. Propagates all other exceptions unchanged.
    """
All upload/download/list calls in cloud.py MUST go through _call_cloud_op.
op_fn takes the backend instance as its only argument and returns an awaitable, e.g. lambda backend: backend.list_folder(folder_id).
Do not capture a backend in op_fn's closure: a captured instance still holds the expired token, so the retry after refresh would fail the same way.
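The refresh-once control flow can be sketched without FastAPI or SQLAlchemy. In this minimal sketch the names are hypothetical stand-ins: `CloudConnectionError` with a `reason` attribute, `ReauthRequired` in place of HTTPException(503), and injected `build_backend` / `refresh` / `mark_requires_reauth` callables in place of provider code and DB writes. Here op_fn receives the backend instance, so the single retry runs against the rebuilt backend rather than one created before the refresh.

```python
import asyncio
from typing import Any, Awaitable, Callable, NoReturn


class CloudConnectionError(Exception):
    """Stand-in for the backends' error; reason is e.g. "token_expired" or "invalid_grant"."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class ReauthRequired(Exception):
    """Stand-in for HTTPException(503, "Cloud connection requires re-authentication...")."""


async def call_cloud_op_sketch(
    creds: dict,
    build_backend: Callable[[dict], Any],                 # provider backend from a creds dict
    refresh: Callable[[dict], Awaitable[dict]],           # refresh + persist new credentials_enc
    mark_requires_reauth: Callable[[], Awaitable[None]],  # status, audit log, commit
    op_fn: Callable[[Any], Awaitable[Any]],               # receives the backend to use
) -> Any:
    async def fail_reauth(exc: CloudConnectionError) -> NoReturn:
        await mark_requires_reauth()
        raise ReauthRequired("Cloud connection requires re-authentication.") from exc

    try:
        return await op_fn(build_backend(creds))
    except CloudConnectionError as exc:
        if exc.reason == "invalid_grant":
            await fail_reauth(exc)
        if exc.reason != "token_expired":
            raise
    # token_expired: refresh once, rebuild the backend, retry once
    try:
        new_creds = await refresh(creds)  # a revoked refresh token surfaces here as invalid_grant
        return await op_fn(build_backend(new_creds))
    except CloudConnectionError as exc:
        if exc.reason == "invalid_grant":
            await fail_reauth(exc)
        raise  # a second token_expired or any other reason propagates unchanged


if __name__ == "__main__":
    class Backend:
        def __init__(self, token: str):
            self.token = token

    async def list_root(backend: Backend) -> str:
        if backend.token == "expired":
            raise CloudConnectionError("token_expired")
        return f"listing with {backend.token}"

    async def demo_refresh(creds: dict) -> dict:
        return {"token": "fresh"}

    async def demo_mark() -> None:
        pass

    print(asyncio.run(call_cloud_op_sketch(
        {"token": "expired"}, lambda c: Backend(c["token"]), demo_refresh, demo_mark, list_root)))
```

In cloud.py the injected callables would correspond to the provider refresh plus the credentials_enc update, and to the REQUIRES_REAUTH status change with write_audit_log and session.commit().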

Pydantic request models:
  class WebDAVConnectRequest(BaseModel): server_url: HttpUrl; username: str; password: str; provider: Literal["nextcloud", "webdav"]
  class DefaultStorageRequest(BaseModel): backend: str

Implement all 6 cloud endpoints + 1 users/me/default-storage endpoint per the behavior spec above.

For Google Drive OAuth initiate/callback:
  from google_auth_oauthlib.flow import Flow (lazy import inside handler)
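  Flow.from_client_config(client_config, scopes=<scopes below>, redirect_uri=redirect_uri), where client_config = {"web": {"client_id": settings.google_client_id, "client_secret": settings.google_client_secret, "auth_uri": "https://accounts.google.com/o/oauth2/auth", "token_uri": "https://oauth2.googleapis.com/token"}} (the config must be nested under "web")
  Scopes: ["https://www.googleapis.com/auth/drive.file"]
  redirect_uri = f"{settings.backend_url}/api/cloud/oauth/callback/google_drive"
  flow.authorization_url(access_type="offline", prompt="consent", state=state_token) returns (authorization_url, state); pass state=state_token so Google echoes back the token stored in Redis instead of one the Flow generates itself
  At callback: rebuild the Flow the same way, run flow.fetch_token(code=code) via asyncio.to_thread (it is a blocking HTTP call), then store access_token (flow.credentials.token), refresh_token, expiry, token_uri, client_id, client_secret
  If the installed google-auth-oauthlib version auto-generates a PKCE code_verifier, the fresh Flow at callback will not have it: store flow.code_verifier with the state in Redis at initiate and set it on the callback Flow before fetch_token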

For OneDrive OAuth initiate/callback:
  import msal (lazy import inside handler)
  msal.ConfidentialClientApplication(settings.onedrive_client_id, client_credential=settings.onedrive_client_secret, authority=f"https://login.microsoftonline.com/{settings.onedrive_tenant_id}")
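  app.get_authorization_request_url(scopes=["Files.ReadWrite"], redirect_uri=..., state=state_token)
  Do not list "offline_access": MSAL adds it automatically (so a refresh token is still issued) and raises ValueError when a reserved scope (openid, profile, offline_access) is passed explicitly
  At callback: app.acquire_token_by_authorization_code(code, scopes=["Files.ReadWrite"], redirect_uri=...); MSAL reports failures as a result dict with an "error" key rather than raising, so check for it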
  Wrap msal calls in asyncio.to_thread()

For WebDAV/Nextcloud connect:
  from storage.webdav_backend import WebDAVBackend
  from storage.nextcloud_backend import NextcloudBackend
  Instantiate with try/except ValueError → HTTPException(422)
  health_check() in asyncio.to_thread context; on False → HTTPException(422, "Connection test failed — check server URL and credentials")

Upsert logic for CloudConnection:
  SELECT where user_id=current_user.id AND provider=provider
  If exists: update credentials_enc + status=ACTIVE; if not exists: INSERT
  display_name = human-readable from provider: {"google_drive": "Google Drive", "onedrive": "OneDrive", "nextcloud": "Nextcloud", "webdav": "WebDAV server"}
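The SELECT-then-INSERT-or-UPDATE logic keyed on (user_id, provider) behaves like this in-memory sketch; `upsert_connection` and the dict store are hypothetical stand-ins for the ORM session. Two concurrent connects could both take the INSERT branch in the SELECT-first approach, so a database unique constraint on (user_id, provider) would be what makes the one-row-per-provider rule hold under concurrency.

```python
import uuid
from datetime import datetime, timezone

DISPLAY_NAMES = {
    "google_drive": "Google Drive",
    "onedrive": "OneDrive",
    "nextcloud": "Nextcloud",
    "webdav": "WebDAV server",
}


def upsert_connection(store: dict, user_id: str, provider: str, credentials_enc: str) -> dict:
    """Insert or update the single connection row for (user_id, provider)."""
    key = (user_id, provider)
    row = store.get(key)
    if row is None:  # no existing row: INSERT
        row = {
            "id": str(uuid.uuid4()),
            "user_id": user_id,
            "provider": provider,
            "display_name": DISPLAY_NAMES[provider],
            "connected_at": datetime.now(timezone.utc),
        }
        store[key] = row
    # Both branches: store the new credentials and (re)activate the connection
    row["credentials_enc"] = credentials_enc
    row["status"] = "ACTIVE"
    return row
```

Reconnecting after a REQUIRES_REAUTH transition therefore reuses the same row id and flips the status back to ACTIVE.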

write_audit_log calls:
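  cloud.connected: user_id=current_user.id, actor_id=current_user.id, resource_id=conn.id, ip_address=request.client.host, metadata_={"provider": provider}
  cloud.disconnected: same pattern (ip_address is a required write_audit_log parameter, so each handler takes request: Request)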
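The OAuth callback ends in one of two redirects to the Vue settings page, each carrying a single query parameter. A minimal stdlib sketch of building those URLs, with `settings_redirect_url` as a hypothetical helper name: urlencode percent-encodes every value, so an error message containing `&` or `=` cannot inject extra query parameters.

```python
from urllib.parse import urlencode


def settings_redirect_url(frontend_url: str, **params: str) -> str:
    """Return {frontend_url}/settings?<query>, with every value percent-encoded."""
    base = frontend_url.rstrip("/")  # tolerate a trailing slash in settings.frontend_url
    return f"{base}/settings?{urlencode(params)}"
```

For example, `settings_redirect_url(settings.frontend_url, cloud_connected=provider)` on success and `settings_redirect_url(settings.frontend_url, cloud_error="Token exchange failed")` on error. Starlette's RedirectResponse defaults to status 307, so `status_code=302` has to be passed explicitly to match the behavior spec.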
Verify:
  cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
  from api.cloud import router, users_router
  print('cloud router imports OK')
  print('Routes:')
  for route in router.routes:
      print(f'  {route.methods} {route.path}')
  for route in users_router.routes:
      print(f'  {route.methods} {route.path}')
  "

Acceptance criteria:
- backend/api/cloud.py exists and imports without error
- router has routes (the printed paths include the /api/cloud prefix): GET /oauth/initiate/{provider}, GET /oauth/callback/{provider}, POST /connections/webdav, GET /connections, DELETE /connections/{id}, GET /folders/{provider}/{folder_id}
- users_router has route: PATCH /me/default-storage
- All handlers have `Depends(get_regular_user)` in their signature
- CloudConnectionOut imported from api.admin, not redefined
- credentials_enc column never referenced in any response serialization (only in the CloudConnection ORM SELECT for encrypt/decrypt ops)
- `pytest -v --tb=short` exits 0

Done: cloud.py created with all 7 endpoints; all use the get_regular_user dep; CloudConnectionOut comes from the admin module; pytest passes

Task 2: Register cloud router in main.py + add folder listing endpoint

Files: backend/main.py, backend/api/cloud.py

Read first:
- backend/main.py: existing router registration pattern (app.include_router)
- backend/api/cloud.py: router and users_router objects created in Task 1
- backend/services/cloud_cache.py: get_cloud_folders_cached signature
- backend/storage/nextcloud_backend.py: NextcloudBackend.list_folder signature
- backend/storage/webdav_backend.py: WebDAVBackend.list_folder (may not exist; use a generic approach)

Behavior:
- GET /api/cloud/folders/{provider}/{folder_id} endpoint added to the cloud.py router: loads the CloudConnection, decrypts credentials, instantiates the backend, calls the backend-specific list method via get_cloud_folders_cached; returns {"items": [...]} where each item has id, name, is_dir, size
- main.py includes both the cloud router and users_router from api.cloud
- Router registrations appended after the existing app.include_router calls (as in the main.py steps below)
- Existing test suite passes after router registration

In backend/api/cloud.py, add the folder listing endpoint to the router (if not already added in Task 1):
GET /api/cloud/folders/{provider}/{folder_id} implementation:
- Load CloudConnection for current_user.id + provider; 404 if not found or status != ACTIVE
- Decrypt credentials
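- Build fetch_fn as a zero-argument callable that returns an awaitable (Python has no async lambda: use a nested async def, or a lambda that returns a coroutine) and that runs the listing through _call_cloud_op, so token refresh and REQUIRES_REAUTH handling apply
- For provider "google_drive": use Drive service.files().list(q=f"'{folder_id}' in parents", fields="files(id,name,mimeType,size)").execute() inside asyncio.to_thread (the Drive client is blocking); mimeType "application/vnd.google-apps.folder" marks a folder; convert to standard format
- For provider "onedrive": GET /me/drive/items/{folder_id}/children; items with a "folder" facet are directories; convert to standard format
- For provider in ("nextcloud", "webdav"): instantiate NextcloudBackend; call list_folder(folder_id). If folder_id is a slash-separated WebDAV path, the route needs Starlette's path converter (/folders/{provider}/{folder_id:path}) to match it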
- Wrap in get_cloud_folders_cached(str(current_user.id), provider, folder_id, fetch_fn)
- Return {"items": [{"id":..., "name":..., "is_dir":bool, "size":int}, ...]}
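Converting provider listings to the standard {id, name, is_dir, size} shape could look like the following sketch. It assumes Drive v3 `files.list` items (folders have mimeType `application/vnd.google-apps.folder`; `size` arrives as a string and is absent for folders and Google-native docs) and Microsoft Graph driveItem objects (folders carry a `folder` facet; `size` is a number). Both function names are hypothetical.

```python
GOOGLE_FOLDER_MIME = "application/vnd.google-apps.folder"


def from_drive_file(item: dict) -> dict:
    """Drive v3 file resource -> standard listing item."""
    return {
        "id": item["id"],
        "name": item["name"],
        "is_dir": item.get("mimeType") == GOOGLE_FOLDER_MIME,
        "size": int(item.get("size", 0)),  # Drive serializes size as a string, omits it for folders
    }


def from_graph_item(item: dict) -> dict:
    """Microsoft Graph driveItem -> standard listing item."""
    return {
        "id": item["id"],
        "name": item["name"],
        "is_dir": "folder" in item,  # folders carry a "folder" facet
        "size": int(item.get("size", 0)),
    }
```

The endpoint would then return `{"items": [from_drive_file(f) for f in response["files"]]}` for Drive and `{"items": [from_graph_item(i) for i in response["value"]]}` for Graph.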

In backend/main.py:
- Add imports: from api.cloud import router as cloud_router, users_router as cloud_users_router
- Add app.include_router(cloud_router) and app.include_router(cloud_users_router) after the existing router includes
- The existing routers (documents, topics, auth, admin, folders, audit, shares) must remain unchanged
Verify:
  cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
  from main import app
  cloud_routes = [r.path for r in app.routes if hasattr(r, 'path') and '/api/cloud' in r.path]
  default_storage = [r.path for r in app.routes if hasattr(r, 'path') and 'default-storage' in r.path]
  print('Cloud routes registered:', cloud_routes)
  print('Default storage route:', default_storage)
  assert len(cloud_routes) >= 6, f'Expected 6+ cloud routes, got {len(cloud_routes)}'
  " && python -m pytest -v --tb=short 2>&1 | tail -5

Acceptance criteria:
- main.py imports and includes cloud_router and cloud_users_router
- `from main import app; [r.path for r in app.routes]` includes paths matching /api/cloud/ and /api/users/me/default-storage
- At least 6 cloud routes registered (initiate, callback, webdav, connections GET, connections DELETE, folders)
- `pytest -v --tb=short` exits 0 with 0 failures
- Existing routes (documents, auth, admin, folders, shares) still reachable

Done: Both cloud routers registered in main.py; all cloud routes visible in app.routes; full pytest suite passes

Task 3: Cloud connection cleanup on account deletion (SEC-09)

Files: backend/api/auth.py

Read first:
- backend/api/auth.py: find the DELETE /api/users/me endpoint (account self-deletion) and verify it exists from Phase 2; if it does not, check backend/api/admin.py for DELETE /api/admin/users/{id}
- backend/db/models.py: CloudConnection (user_id, provider, status), Document (user_id, storage_backend, object_key)
- backend/storage/__init__.py: get_storage_backend_for_document signature

Behavior:
- When a user account is deleted (DELETE /api/users/me or admin DELETE /api/admin/users/{id}):
  1. Query all CloudConnection rows for the user
  2. For each connection, query all Document rows for that user where storage_backend == connection.provider
  3. For each such document, call get_storage_backend_for_document(doc, user, session) and await backend.delete_object(doc.object_key); catch and log exceptions but do NOT abort the deletion
  4. Delete all CloudConnection rows for the user (credentials_enc purged)
- This runs BEFORE the user row is deleted. The FK cascade would remove the connections anyway, but the cloud-stored objects must be deleted from the provider while the credentials needed to reach it still exist
- Runs in the same DB transaction as the user deletion, so the CloudConnection deletes commit or roll back together with the user row (the remote delete_object calls themselves cannot be rolled back)
- No orphaned credentials_enc rows after account deletion (SEC-09)

Read backend/api/auth.py to locate the account deletion endpoint. Also check backend/api/admin.py for admin-initiated user deletion.
In the account deletion handler (DELETE /api/users/me), add a cloud cleanup block BEFORE the user row deletion:

1. Import at top of file (if not already present):
   import logging
   from sqlalchemy import select
   from db.models import CloudConnection, Document
   from storage import get_storage_backend_for_document

   logger = logging.getLogger(__name__)  # only if the module has no logger yet

2. Cloud cleanup block (insert before the DELETE user statement):
   cloud_conns_result = await session.execute(
       select(CloudConnection).where(CloudConnection.user_id == current_user.id)
   )
   cloud_conns = cloud_conns_result.scalars().all()
   for conn in cloud_conns:
       # Delete this provider's cloud objects while its credentials still exist
       docs_result = await session.execute(
           select(Document).where(
               Document.user_id == current_user.id,
               Document.storage_backend == conn.provider,
           )
       )
       for doc in docs_result.scalars().all():
           try:
               backend = await get_storage_backend_for_document(doc, current_user, session)
               await backend.delete_object(doc.object_key)
           except Exception:
               # Log and continue: a cloud error must not abort the user deletion
               logger.exception("Cloud cleanup failed for %s (provider=%s)", doc.object_key, conn.provider)
       await session.delete(conn)  # AsyncSession.delete is a coroutine and must be awaited
   await session.flush()  # Flush connection deletes before the user delete

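If DELETE /api/users/me does not exist in auth.py, check admin.py for the admin-delete endpoint and add the same cleanup block there, substituting the user being deleted for current_user (in the admin handler, current_user is the admin). Document which file was modified in the summary.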

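write_audit_log call: add event_type="cloud.credentials_purged" after the cleanup loop,
with metadata_={"providers": [c.provider for c in cloud_conns]} (provider names only, no credentials) and actor_id set to whoever triggered the deletion (the user, or the admin for admin-initiated deletion).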
Verify:
  cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
  import os
  for fname in ['api/auth.py', 'api/admin.py']:
      if os.path.exists(fname):
          with open(fname) as f:
              src = f.read()
          # 'CloudConnection' alone would also match admin.py's CloudConnectionOut import
          if 'cloud_conns' in src:
              print(f'OK: cloud cleanup found in {fname}')
  " && python -m pytest -v --tb=short 2>&1 | tail -5

Acceptance criteria:
- Either backend/api/auth.py or backend/api/admin.py contains cloud connection cleanup logic before user deletion
- CloudConnection rows are deleted for the user as part of account deletion
- delete_object called for each cloud-stored document before credentials are purged
- write_audit_log called with event_type="cloud.credentials_purged"
- pytest -v exits 0 with 0 failures
- No orphaned credentials_enc rows after account deletion (SEC-09)

Done: Cloud connection cleanup wired into account deletion; credentials_enc purged; SEC-09 satisfied

<threat_model>

Trust Boundaries

| Boundary | Description |
|----------|-------------|
| OAuth callback → user session | state parameter validates callback belongs to the initiating user |
| API request → CloudConnection row | connection.user_id == current_user.id assertion prevents IDOR |
| WebDAV credentials → validation | credentials only stored after successful health_check() |
| API response → CloudConnectionOut | credentials_enc excluded by CloudConnectionOut whitelist |

STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-05-01 | Tampering | OAuth callback CSRF | mitigate | secrets.token_urlsafe(32) state token stored in Redis; validated at callback; single-use deletion after validation (D-04) |
| T-05-05-02 | Elevation of Privilege | OAuth callback state token leak | mitigate | Redis TTL 1800s (30 min); key deleted after single use; the token passes through the browser only as the OAuth state parameter and is not echoed in the post-callback redirect to the frontend |
| T-05-05-03 | Information Disclosure | CloudConnectionOut in API responses | mitigate | CloudConnectionOut imported from admin.py (exact same whitelist); credentials_enc absent by omission (SEC-08) |
| T-05-05-04 | Information Disclosure | Cloud connection ID enumeration | mitigate | DELETE /connections/{id} returns 404 for wrong-owner connections, same pattern as documents and shares (T-04-04-02) |
| T-05-05-05 | Tampering | WebDAV server_url SSRF | mitigate | validate_cloud_url called before WebDAVBackend/NextcloudBackend instantiation; also called in the backend's `__init__` and before each request (D-17 defense-in-depth) |
| T-05-05-06 | Spoofing | Admin access to cloud endpoints | mitigate | get_regular_user raises 403 for admin role on all cloud endpoints (D-18) |
| T-05-05-07 | Information Disclosure | OAuth error message in redirect URL | accept | Error message in ?cloud_error= is URL-encoded and displayed to the authenticated user only; no PII or secret value included |
| T-05-05-08 | Information Disclosure | write_audit_log metadata for cloud.connected | mitigate | Audit metadata_ = {"provider": provider} only: no credentials, no tokens, no plaintext password (aligns with document audit whitelist pattern) |
</threat_model>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10

<success_criteria>

  • cloud.py: all 7 endpoints implemented; all use get_regular_user dep; cross-user returns 404; write_audit_log on connect/disconnect
  • main.py: both routers registered; all routes visible in app.routes
  • pytest -v exits 0, 0 failures
  • test_cloud.py stubs transition from xfail to green for test_credentials_enc_not_exposed, test_connection_status_display, test_disconnect_deletes_credentials, test_ssrf_validation, test_cross_user_idor, test_admin_cannot_see_credentials
  • SEC-09: account deletion endpoint purges CloudConnection rows and cloud-stored document objects before deleting user row </success_criteria>
Create `.planning/phases/05-cloud-storage-backends/05-05-SUMMARY.md` when done