kite/.planning/phases/05-cloud-storage-backends/05-06-PLAN.md
curo1305 d13801538d fix(05): revise Phase 5 plans based on checker feedback — B1-B4, W1-W4
B1: Mark RESEARCH.md Open Questions as (RESOLVED) with decision text for all 3
B2: Backends now stateless — raise CloudConnectionError(reason=) only; API layer
    in cloud.py owns token refresh + DB update via _call_cloud_op helper
B3: Add Task 3 to Plan 05 — cloud connection + object cleanup on account deletion (SEC-09)
B4: Add frontend_url setting to Plan 01 Task 1; Plan 05 uses settings.frontend_url
    for OAuth callback redirects
W1: ROADMAP.md Phase 5 now correctly labels Plans 03+04 as Wave 3 (not Wave 2)
W2: Plan 06 invalid_grant test now asserts both 503 HTTP response AND DB REQUIRES_REAUTH
W3: Plan 06 Task 2 split into unit tests (4, cloud_utils.py) and integration tests (11, HTTP)
W4: Plan 07 adds Vitest tests for cloudConnections store (4 tests) and SettingsCloudTab
    mount test (2 tests) per CLAUDE.md testing protocol

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 19:55:28 +02:00


---
phase: 05-cloud-storage-backends
plan: 06
type: execute
wave: 5
depends_on:
  - 05-05
files_modified:
  - backend/api/documents.py
  - backend/tests/test_cloud.py
autonomous: true
requirements:
  - CLOUD-03
  - CLOUD-05
  - CLOUD-07
must_haves:
  truths:
    - "POST /api/documents/upload detects active folder's backend and routes to cloud backend instead of presigned MinIO URL"
    - "GET /api/documents/{id}/content resolves the correct StorageBackend from document.storage_backend and streams bytes"
    - "invalid_grant during cloud upload/download transitions connection to REQUIRES_REAUTH without 500 error"
    - "All 15 test stubs in test_cloud.py have real assertions replacing pytest.xfail() calls"
    - "pytest tests/test_cloud.py passes with all 15 tests green (no xfailed, no failed)"
  artifacts:
    - path: backend/api/documents.py
      provides: Extended upload + content endpoints supporting cloud backends
      contains: get_storage_backend_for_document
    - path: backend/tests/test_cloud.py
      provides: Full test suite for all Phase 5 requirements
      contains: test_credential_round_trip
  key_links:
    - from: backend/api/documents.py
      to: backend/storage/__init__.py
      via: get_storage_backend_for_document
      pattern: get_storage_backend_for_document
    - from: backend/tests/test_cloud.py
      to: backend/api/cloud.py
      via: async_client HTTP calls to /api/cloud/* endpoints
      pattern: async_client
---
Wire cloud backends into the document upload and content proxy endpoints, and promote all 15 test stubs to real passing tests.

Purpose: Complete the storage backend integration: uploads are routed to the cloud when the active folder is a cloud provider, and downloads are routed through the correct backend per document.storage_backend. Then close the Nyquist loop by making all 15 xfail stubs pass.

Output: Extended documents.py upload + content endpoints; fully passing test_cloud.py.

<execution_context> @/Users/nik/.claude/get-shit-done/workflows/execute-plan.md @/Users/nik/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-05-SUMMARY.md

From backend/storage/__init__.py:
- async def get_storage_backend_for_document(document, user, session: AsyncSession) -> StorageBackend
- def get_storage_backend() -> StorageBackend (existing MinIO factory)

- POST /api/documents/upload: currently uses get_storage_backend() to generate presigned PUT URL
- GET /api/documents/{id}/content: currently calls backend.get_object(doc.object_key)

- Document: storage_backend (String, "minio" for existing), object_key (Text), folder_id (UUID nullable)
- CloudConnection: user_id (UUID), provider (String), status (String)

CloudConnectionOut from api.admin

CloudConnectionError: raised when invalid_grant is detected during a cloud operation

- cloud_connection_factory: async factory for creating CloudConnection rows
- mock_google_drive_creds: dict fixture
- mock_onedrive_creds: dict fixture
- mock_webdav_client: MagicMock fixture
- async_client: AsyncClient with db override
- db_session: SQLite in-memory session

All 15 test names and their requirement mappings

Task 1: Extend upload and content-proxy endpoints for cloud backends

Files: backend/api/documents.py

Read first:
- backend/api/documents.py: current POST /upload and GET /{id}/content implementations
- backend/storage/__init__.py: get_storage_backend_for_document signature
- backend/storage/google_drive_backend.py: CloudConnectionError exception class
- backend/db/models.py: Document.storage_backend, Document.folder_id, CloudConnection
- .planning/phases/05-cloud-storage-backends/05-CONTEXT.md: D-10 (cloud upload via FastAPI), D-14 (no presigned URL for cloud), D-15 (same content endpoint for all backends)

Behavior:
- POST /api/documents/upload: detect target backend from request body field `target_backend` (str, default "minio"); if target_backend != "minio", read file bytes directly in the request handler (UploadFile.read()), call cloud_backend.put_object(), save Document with storage_backend=target_backend; if target_backend == "minio" keep existing presigned URL flow unmodified
- GET /api/documents/{id}/content: replace direct get_storage_backend() call with get_storage_backend_for_document(document, current_user, session); handles all backends transparently
- On CloudConnectionError from any cloud operation: return HTTP 503 with detail "Cloud connection requires re-authentication. Please reconnect in Settings."
- Existing MinIO upload flow (presigned URL) is NOT modified. D-14 specifies generate_presigned_put_url raises NotImplementedError on cloud backends; upload endpoint detects cloud and uses direct path
- document.storage_backend stored as: "minio", "google_drive", "onedrive", "nextcloud", or "webdav"
- Quota: cloud uploads do NOT use the atomic quota UPDATE; cloud files are not counted against MinIO quota (D-11: they are separate backends)

Read backend/api/documents.py fully before editing to understand current upload + content flow.
Modification 1 — POST /api/documents/upload:
Add optional `target_backend: str = Form("minio")` parameter to the upload endpoint.
If target_backend == "minio": existing presigned URL flow runs unchanged (return {"upload_url": presigned_url, "document_id": str(doc.id)}).
If target_backend in ("google_drive", "onedrive", "nextcloud", "webdav"):
  1. Read request body file bytes (file: UploadFile)
  2. Load CloudConnection for current_user.id + target_backend; 404 if not found or not ACTIVE
  3. Decrypt credentials via decrypt_credentials(settings.cloud_creds_key.encode(), str(current_user.id), conn.credentials_enc)
  4. Instantiate the correct backend from target_backend
  5. Call object_key = await cloud_backend.put_object(str(current_user.id), str(doc.id), file_bytes, extension, content_type)
  6. Create Document with storage_backend=target_backend, object_key=object_key, size_bytes=len(file_bytes)
  7. Return {"document_id": str(doc.id), "storage_backend": target_backend} — no upload_url (cloud upload is synchronous)
Catch CloudConnectionError from put_object → raise HTTPException(503)
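
The routing in Modification 1 can be sketched roughly as follows. This is a minimal, framework-free illustration, not the endpoint itself: `route_upload`, `FakeCloudBackend`, `ServiceUnavailable` (standing in for HTTPException(503)), the local `CloudConnectionError` class, and the placeholder presigned URL are all assumptions made for the example. Connection lookup, credential decryption, and the Document row are omitted.

```python
import asyncio
import uuid

# All names below are illustrative stand-ins, not the project's real classes.
CLOUD_BACKENDS = {"google_drive", "onedrive", "nextcloud", "webdav"}
REAUTH_DETAIL = "Cloud connection requires re-authentication. Please reconnect in Settings."


class CloudConnectionError(Exception):
    """Stand-in for the exception raised by cloud backends."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class ServiceUnavailable(Exception):
    """Stand-in for HTTPException(503, ...) so the sketch needs no web framework."""


class FakeCloudBackend:
    """Hypothetical backend whose put_object mirrors the call shape in step 5."""

    def __init__(self, fail: bool = False):
        self.fail = fail

    async def put_object(self, user_id, document_id, data, extension, content_type):
        if self.fail:
            raise CloudConnectionError(reason="invalid_grant")
        return f"{user_id}/{document_id}{extension}"


async def route_upload(target_backend, user_id, file_bytes, extension, content_type, cloud_backend):
    document_id = str(uuid.uuid4())
    if target_backend == "minio":
        # Existing flow: hand back a presigned URL (placeholder value here).
        return {"upload_url": f"https://minio.invalid/{document_id}", "document_id": document_id}
    if target_backend not in CLOUD_BACKENDS:
        raise ValueError(f"unknown target_backend: {target_backend!r}")
    try:
        await cloud_backend.put_object(user_id, document_id, file_bytes, extension, content_type)
    except CloudConnectionError as exc:
        # Provider rejection surfaces as a 503-style error with a generic message.
        raise ServiceUnavailable(REAUTH_DETAIL) from exc
    # Cloud upload is synchronous, so the response carries no upload_url.
    return {"document_id": document_id, "storage_backend": target_backend}
```

Generating the document ID before the upload, as the sketch does, is one way to have an ID available for put_object before the Document row exists.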

Modification 2 — GET /api/documents/{id}/content:
Replace: `storage = get_storage_backend()`
With: `storage = await get_storage_backend_for_document(document, current_user, session)`
Import get_storage_backend_for_document from storage module.
Wrap with try/except CloudConnectionError → HTTPException(503, "Cloud connection requires re-authentication. Please reconnect in Settings.")
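
The per-document dispatch behind Modification 2 might look like the sketch below. It is an assumption-laden simplification: `get_backend_for_document_sketch` is a hypothetical stand-in that drops the `user` and `session` arguments of the real factory, and `InMemoryBackend` replaces the actual MinIO and cloud backends. Only the `get_object(object_key)` call shape is taken from the plan.

```python
import asyncio
from types import SimpleNamespace


class InMemoryBackend:
    """Hypothetical backend that serves bytes from a dict."""

    def __init__(self, objects):
        self._objects = objects

    async def get_object(self, object_key):
        return self._objects[object_key]


# Registry keyed by the value stored in document.storage_backend.
BACKENDS = {
    "minio": InMemoryBackend({"k1": b"minio-bytes"}),
    "webdav": InMemoryBackend({"k2": b"webdav-bytes"}),
}


async def get_backend_for_document_sketch(document):
    # The backend is chosen from the DB field, never from request input.
    try:
        return BACKENDS[document.storage_backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {document.storage_backend!r}") from None


async def read_content(document):
    backend = await get_backend_for_document_sketch(document)
    return await backend.get_object(document.object_key)


minio_doc = SimpleNamespace(storage_backend="minio", object_key="k1")
webdav_doc = SimpleNamespace(storage_backend="webdav", object_key="k2")
```

The content endpoint stays identical for every backend; only the factory knows which one is in play.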

Add imports at top of documents.py (only if not already present):
from storage import get_storage_backend_for_document
from storage.google_drive_backend import CloudConnectionError
from storage.cloud_utils import decrypt_credentials
from config import settings
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
import ast, sys
with open('api/documents.py') as f:
    tree = ast.parse(f.read())
names = [n.id if isinstance(n, ast.Name) else getattr(n, 'attr', '') for n in ast.walk(tree) if isinstance(n, (ast.Name, ast.Attribute))]
assert 'get_storage_backend_for_document' in names or True  # import check
print('documents.py parses without error: OK')
" && python -m pytest -v --tb=short 2>&1 | tail -5

Acceptance criteria:
- backend/api/documents.py imports get_storage_backend_for_document from storage module
- GET /api/documents/{id}/content uses get_storage_backend_for_document (not bare get_storage_backend() for all docs)
- POST /api/documents/upload has target_backend parameter and cloud direct-upload path
- CloudConnectionError caught and re-raised as HTTPException(503)
- Existing MinIO upload flow (presigned URL) unchanged for target_backend="minio"
- `pytest -v --tb=short` exits 0, 0 failures

Done: documents.py extended. Upload detects cloud backend; content proxy uses get_storage_backend_for_document; CloudConnectionError → 503; existing MinIO flow unchanged.

Task 2: Promote unit test stubs to real tests (cloud_utils.py coverage)

Files: backend/tests/test_cloud.py

Read first:
- backend/tests/test_cloud.py: current xfail stubs
- backend/storage/cloud_utils.py: validate_cloud_url, encrypt_credentials, decrypt_credentials
- backend/storage/__init__.py: get_storage_backend_for_document
- backend/storage/minio_backend.py: MinIOBackend class

Behavior:
- 4 unit tests promoted; they test cloud_utils.py and the factory, with no DB, no HTTP client, and no network (W3 split: unit tests only)
- test_credential_round_trip: pure unit test; calls encrypt_credentials + decrypt_credentials; asserts round-trip equals original; asserts ciphertext != plaintext string
- test_ssrf_validation: @pytest.mark.parametrize over [("http://localhost/dav",True),("http://127.0.0.1/dav",True),("http://169.254.169.254/dav",True),("http://10.0.0.1/dav",True),("http://192.168.1.1/dav",True),("https://nextcloud.example.com/dav",False)]; asserts ValueError raised for private IPs; no exception for valid public URL
- test_ssrf_link_local: calls validate_cloud_url("http://169.254.169.254/metadata"); asserts ValueError
- test_factory_returns_correct_backend: constructs a mock Document(storage_backend="minio") and mock User; patches get_storage_backend() to return a MagicMock of MinIOBackend; calls get_storage_backend_for_document with a mock AsyncSession; asserts result is the expected backend type

Promote the 4 unit-test stubs in test_cloud.py. These tests have no DB/HTTP dependencies:
1. test_credential_round_trip — no fixtures needed:
   from storage.cloud_utils import encrypt_credentials, decrypt_credentials
   master_key = b"test-master-key-32bytes-padded!!"
   user_id = "550e8400-e29b-41d4-a716-446655440000"
   creds = {"access_token": "ya29.xxx", "refresh_token": "1//xxx"}
   enc = encrypt_credentials(master_key, user_id, creds)
   assert isinstance(enc, str) and "access_token" not in enc
   dec = decrypt_credentials(master_key, user_id, enc)
   assert dec == creds

2. test_ssrf_validation — @pytest.mark.parametrize:
   All private/loopback/link-local URLs raise ValueError; valid public URL passes.
   Remove the xfail decorator; add parametrize decorator from behavior spec.

3. test_ssrf_link_local — simple unit test:
   from storage.cloud_utils import validate_cloud_url
   with pytest.raises(ValueError): validate_cloud_url("http://169.254.169.254/metadata")

4. test_factory_returns_correct_backend — mock-based unit test:
   from unittest.mock import MagicMock, AsyncMock, patch
   from storage import get_storage_backend_for_document
   Mock a Document with storage_backend="minio", a User, and an AsyncSession.
   Patch get_storage_backend() to return a MinIOBackend mock.
   Run asyncio.run(get_storage_backend_for_document(mock_doc, mock_user, mock_session)).
   Assert result is the patched MinIOBackend.

Remove @pytest.mark.xfail(strict=False) from all 4 stubs once implemented.
Leave the other 11 stubs with xfail decorators (they are promoted in Task 3).
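
To make the SSRF cases from test_ssrf_validation concrete, here is a rough standard-library sketch of the kind of check the test exercises. `validate_cloud_url_sketch` is a hypothetical stand-in, not the project's `validate_cloud_url`; in particular it only inspects IP literals and the name "localhost", whereas a real validator may also resolve DNS names before deciding.

```python
import ipaddress
from urllib.parse import urlparse


def validate_cloud_url_sketch(url: str) -> None:
    """Raise ValueError for loopback, private, or link-local targets (sketch only)."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError("URL has no host")
    if host.lower() == "localhost":
        raise ValueError("loopback host is not allowed")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; this sketch treats it as a public hostname.
        return
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        raise ValueError(f"address {addr} is not allowed")


# (url, should_be_blocked) pairs copied from the behavior spec.
CASES = [
    ("http://localhost/dav", True),
    ("http://127.0.0.1/dav", True),
    ("http://169.254.169.254/dav", True),
    ("http://10.0.0.1/dav", True),
    ("http://192.168.1.1/dav", True),
    ("https://nextcloud.example.com/dav", False),
]


def is_blocked(url: str) -> bool:
    try:
        validate_cloud_url_sketch(url)
    except ValueError:
        return True
    return False
```

In the real test, each pair would become one parametrized case, so a regression on a single address shows up as its own failing test ID.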
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py::test_credential_round_trip tests/test_cloud.py::test_ssrf_validation tests/test_cloud.py::test_ssrf_link_local tests/test_cloud.py::test_factory_returns_correct_backend -v 2>&1 | tail -10

Acceptance criteria:
- test_credential_round_trip, test_ssrf_validation, test_ssrf_link_local, test_factory_returns_correct_backend all PASSED
- test_ssrf_validation is parametrized (multiple params visible in output)
- No xfail decorators on these 4 tests
- Other 11 tests still xfail (not broken by this task)
- `pytest tests/test_cloud.py -v` exits 0

Done: 4 unit tests promoted to PASSED; cloud_utils.py coverage established; 11 integration stubs still xfailed.

Task 3: Promote integration test stubs to real passing tests (HTTP endpoint coverage)

Files: backend/tests/test_cloud.py

Read first:
- backend/tests/test_cloud.py: current xfail stubs (11 remaining after Task 2)
- backend/tests/conftest.py: all fixtures including cloud_connection_factory, mock_google_drive_creds, async_client, db_session
- backend/api/cloud.py: endpoint paths and request/response shapes
- backend/api/admin.py: CloudConnectionOut fields
- backend/db/models.py: CloudConnection, User, Document fields
- .planning/phases/05-cloud-storage-backends/05-VALIDATION.md: test map with requirement → test correspondence

Behavior:
- 11 integration tests promoted; all use async_client, db_session, and/or monkeypatch (W3 split: integration tests only)
- test_credentials_enc_not_exposed: creates CloudConnection via cloud_connection_factory; calls GET /api/cloud/connections with valid auth; asserts "credentials_enc" not in response JSON at any level
- test_cloud_upload_no_presigned: creates CloudConnection; mocks cloud backend put_object; calls POST /api/documents/upload with target_backend="google_drive"; asserts no "upload_url" in response
- test_connection_status_display: creates ACTIVE CloudConnection; calls GET /api/cloud/connections; asserts response item has status == "ACTIVE"
- test_invalid_grant_sets_requires_reauth: creates CloudConnection with status="ACTIVE"; monkey-patches the cloud backend operation to raise CloudConnectionError(reason="invalid_grant"); calls GET /api/documents/{id}/content; asserts 503 response; then re-queries the CloudConnection from DB and asserts connection.status == "REQUIRES_REAUTH"; both HTTP response AND DB state verified (W2 fix)
- test_disconnect_deletes_credentials: creates CloudConnection; calls DELETE /api/cloud/connections/{id}; asserts 204; queries DB to confirm row deleted
- test_admin_cannot_see_credentials: creates admin user + CloudConnection; calls GET /api/cloud/connections with admin auth; asserts 403 response
- test_cross_user_idor: creates two users + CloudConnections; calls DELETE /api/cloud/connections/{user2_connection_id} with user1 auth; asserts 404
- test_connect_google_drive: calls GET /api/cloud/oauth/initiate/google_drive with valid auth; asserts 302 redirect containing "accounts.google.com" in location header
- test_oauth_callback_valid_state: pre-seeds Redis with oauth_state key; mocks google_auth_oauthlib.flow.Flow.fetch_token; calls GET /api/cloud/oauth/callback/google_drive?code=test&state={seed_state}; asserts 302 redirect to /settings?cloud_connected=google_drive
- test_oauth_callback_invalid_state: calls GET /api/cloud/oauth/callback/google_drive?code=x&state=invalid; asserts 400
- test_webdav_connect_validates: calls POST /api/cloud/connections/webdav with localhost URL; asserts 422 (SSRF blocked: validate_cloud_url raises ValueError before health check)

Promote the 11 remaining xfail stubs in test_cloud.py to real integration tests.
Keep: all 11 test function names, all @pytest.mark.asyncio decorators.
Remove: @pytest.mark.xfail(strict=False) from all 11 stubs.
Add: proper fixture parameters (async_client, db_session, monkeypatch).

Auth helper (add as module-level async def or import from conftest):
  async def _create_user_and_token(session, role="user") — creates User row, generates JWT access token.
  Mirror pattern from existing test_auth_api.py or test_documents.py.

For Redis tests (test_connect_google_drive, test_oauth_callback_valid_state, test_oauth_callback_invalid_state):
  monkeypatch app.state.redis.setex, app.state.redis.get, app.state.redis.delete.
  test_oauth_callback_valid_state: pre-seed via monkeypatch return values; mock Flow.fetch_token.

For test_invalid_grant_sets_requires_reauth (W2 requirement):
  Create CloudConnection; monkeypatch get_storage_backend_for_document to raise
  CloudConnectionError(reason="invalid_grant"); call GET /api/documents/{id}/content;
  assert 503; then await session.refresh(connection) (the call is a coroutine on an async session); assert connection.status == "REQUIRES_REAUTH".
  Note: the DB write of REQUIRES_REAUTH must actually be committed by _call_cloud_op — 
  test verifies the real DB state, not just the HTTP response.
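
The behavior this test pins down can be pictured with a small sketch of a `_call_cloud_op`-style wrapper. Everything here is hypothetical: the real helper's signature is not shown in this plan, `FakeSession` only counts commits, and token refresh is left out. The point is the order of events: mark the connection, commit, then re-raise so the endpoint can answer 503.

```python
import asyncio
from dataclasses import dataclass


class CloudConnectionError(Exception):
    """Stand-in for the exception raised by stateless cloud backends."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


@dataclass
class Connection:
    """Hypothetical stand-in for a CloudConnection row."""
    status: str = "ACTIVE"


class FakeSession:
    """Hypothetical session that only records how often commit() ran."""

    def __init__(self):
        self.commits = 0

    async def commit(self):
        self.commits += 1


async def call_cloud_op_sketch(connection, session, op):
    try:
        return await op()
    except CloudConnectionError as exc:
        if exc.reason == "invalid_grant":
            # Persist the state change before the error propagates.
            connection.status = "REQUIRES_REAUTH"
            await session.commit()
        raise


async def failing_op():
    raise CloudConnectionError(reason="invalid_grant")


async def demo():
    connection, session = Connection(), FakeSession()
    try:
        await call_cloud_op_sketch(connection, session, failing_op)
    except CloudConnectionError:
        pass
    return connection.status, session.commits
```

Because the commit happens inside the wrapper, a test that reloads the row afterwards sees REQUIRES_REAUTH even though the request itself failed.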

For SDK mocking: monkeypatch or patch the SDK calls at the module import level.
All tests must pass under SQLite in-memory (non-INTEGRATION mode).
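
For test_credentials_enc_not_exposed, the "not in response JSON at any level" assertion needs a recursive search rather than a top-level `in` check. A minimal sketch, with `contains_key` as a hypothetical helper name:

```python
def contains_key(payload, key):
    """Return True if `key` appears as a dict key anywhere in a decoded JSON value."""
    if isinstance(payload, dict):
        return key in payload or any(contains_key(v, key) for v in payload.values())
    if isinstance(payload, list):
        return any(contains_key(item, key) for item in payload)
    # Scalars (str, int, float, bool, None) cannot hold keys.
    return False
```

In the test this would be applied to the decoded response body, for example `assert not contains_key(response.json(), "credentials_enc")`.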
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v 2>&1

Acceptance criteria:
- `pytest tests/test_cloud.py -v` exits 0
- Output shows all 15 tests PASSED (no xfailed, no FAILED, no ERROR)
- test_invalid_grant_sets_requires_reauth: 503 HTTP response AND DB connection.status == "REQUIRES_REAUTH" (W2 + W3 combined)
- test_credentials_enc_not_exposed: "credentials_enc" not present anywhere in response JSON
- test_admin_cannot_see_credentials: 403 for admin role
- test_cross_user_idor: 404 for cross-user connection access
- `pytest -v --tb=short` (full suite) exits 0 with 0 failures

Done: All 15 test stubs promoted to real passing tests; pytest tests/test_cloud.py exits 0 with all PASSED; full suite exits 0.

<threat_model>

Trust Boundaries

| Boundary | Description |
|----------|-------------|
| UploadFile bytes → cloud backend | File bytes from browser pass through FastAPI to cloud provider; no direct browser-to-cloud |
| document.storage_backend → backend factory | storage_backend field from DB (not user input) determines which backend loads |
| CloudConnectionError → HTTP 503 | Provider rejection must surface as 503, not 500 (stack trace) or silent retry |

STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-06-01 | Spoofing | target_backend form field tampering | mitigate | target_backend validated against VALID_PROVIDERS set; invalid values return 422; CloudConnection load asserts user ownership before use |
| T-05-06-02 | Information Disclosure | CloudConnectionError message in 503 | mitigate | 503 detail = "Cloud connection requires re-authentication. Please reconnect in Settings."; no provider error detail or token info in response |
| T-05-06-03 | Denial of Service | Cloud upload quota bypass | accept | Cloud uploads do not consume MinIO quota (D-11: separate backends); cloud storage quotas are provider-side, not DocuVault's responsibility in v1 |
| T-05-06-04 | Tampering | Test mocks hiding real failures | mitigate | Tests mock at the boundary (SDK calls), not at the function level; behavior assertions check HTTP response codes and DB state, not implementation details |
</threat_model>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10

<success_criteria>

  • POST /api/documents/upload: target_backend routing works for cloud backends; MinIO flow unchanged
  • GET /api/documents/{id}/content: uses get_storage_backend_for_document; CloudConnectionError → 503
  • test_cloud.py Task 2 (unit): test_credential_round_trip, test_ssrf_validation, test_ssrf_link_local, test_factory_returns_correct_backend all PASSED
  • test_cloud.py Task 3 (integration): all 11 integration stubs PASSED including REQUIRES_REAUTH DB assertion
  • test_cloud.py: all 15 tests PASSED; no xfailed
  • pytest -v (full suite): exits 0, 0 failures
</success_criteria>
Create `.planning/phases/05-cloud-storage-backends/05-06-SUMMARY.md` when done