---
phase: 05-cloud-storage-backends
plan: 06
type: execute
wave: 5
depends_on:
- "05-05"
files_modified:
- backend/api/documents.py
- backend/tests/test_cloud.py
autonomous: true
requirements:
- CLOUD-03
- CLOUD-05
- CLOUD-07
must_haves:
  truths:
    - "POST /api/documents/upload detects active folder's backend and routes to cloud backend instead of presigned MinIO URL"
    - "GET /api/documents/{id}/content resolves the correct StorageBackend from document.storage_backend and streams bytes"
    - "invalid_grant during cloud upload/download transitions connection to REQUIRES_REAUTH without 500 error"
    - "All 15 test stubs in test_cloud.py have real assertions replacing pytest.xfail() calls"
    - "pytest tests/test_cloud.py passes with all 15 tests green (no xfailed, no failed)"
  artifacts:
    - path: "backend/api/documents.py"
      provides: "Extended upload + content endpoints supporting cloud backends"
      contains: "get_storage_backend_for_document"
    - path: "backend/tests/test_cloud.py"
      provides: "Full test suite for all Phase 5 requirements"
      contains: "test_credential_round_trip"
  key_links:
    - from: "backend/api/documents.py"
      to: "backend/storage/__init__.py"
      via: "get_storage_backend_for_document"
      pattern: "get_storage_backend_for_document"
    - from: "backend/tests/test_cloud.py"
      to: "backend/api/cloud.py"
      via: "async_client HTTP calls to /api/cloud/* endpoints"
      pattern: "async_client"
---
Wire cloud backends into the document upload and content proxy endpoints, and promote all 15 test stubs to real passing tests.
Purpose: Complete the storage backend integration: uploads are routed to the cloud backend when the active folder belongs to a cloud provider, and downloads are routed through the backend named by document.storage_backend. Then close the Nyquist loop by making all 15 xfail stubs pass.
Output: Extended documents.py upload + content endpoints; fully passing test_cloud.py.
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-05-SUMMARY.md
From backend/storage/__init__.py:
async def get_storage_backend_for_document(document, user, session: AsyncSession) -> StorageBackend
def get_storage_backend() -> StorageBackend -- existing MinIO factory
POST /api/documents/upload: currently uses get_storage_backend() to generate presigned PUT URL
GET /api/documents/{id}/content: currently calls backend.get_object(doc.object_key)
Document: storage_backend (String, "minio" for existing), object_key (Text), folder_id (UUID nullable)
CloudConnection: user_id (UUID), provider (String), status (String)
CloudConnectionOut from api.admin
CloudConnectionError — raised when invalid_grant detected during cloud operation
cloud_connection_factory: async factory for creating CloudConnection rows
mock_google_drive_creds: dict fixture
mock_onedrive_creds: dict fixture
mock_webdav_client: MagicMock fixture
async_client: AsyncClient with db override
db_session: SQLite in-memory session
All 15 test names and their requirement mappings
## Task 1: Extend upload and content-proxy endpoints for cloud backends
backend/api/documents.py
- backend/api/documents.py — current POST /upload and GET /{id}/content implementations
- backend/storage/__init__.py — get_storage_backend_for_document signature
- backend/storage/google_drive_backend.py — CloudConnectionError exception class
- backend/db/models.py — Document.storage_backend, Document.folder_id, CloudConnection
- .planning/phases/05-cloud-storage-backends/05-CONTEXT.md — D-10 (cloud upload via FastAPI), D-14 (no presigned URL for cloud), D-15 (same content endpoint for all backends)
- POST /api/documents/upload: read the target backend from the form field `target_backend` (str, default "minio"). If target_backend != "minio", read the file bytes directly in the request handler (UploadFile.read()), call cloud_backend.put_object(), and save the Document with storage_backend=target_backend. If target_backend == "minio", keep the existing presigned URL flow unmodified
- GET /api/documents/{id}/content: replace direct get_storage_backend() call with get_storage_backend_for_document(document, current_user, session); handles all backends transparently
- On CloudConnectionError from any cloud operation: return HTTP 503 with detail "Cloud connection requires re-authentication. Please reconnect in Settings."
- Existing MinIO upload flow (presigned URL) is NOT modified — D-14 specifies generate_presigned_put_url raises NotImplementedError on cloud backends; upload endpoint detects cloud and uses direct path
- document.storage_backend stored as: "minio", "google_drive", "onedrive", "nextcloud", or "webdav"
- Quota: cloud uploads do NOT use the atomic quota UPDATE — cloud files are not counted against MinIO quota (D-11: they are separate backends)
Read backend/api/documents.py fully before editing to understand current upload + content flow.
Modification 1 — POST /api/documents/upload:
Add optional `target_backend: str = Form("minio")` parameter to the upload endpoint.
If target_backend == "minio": existing presigned URL flow runs unchanged (return {"upload_url": presigned_url, "document_id": str(doc.id)}).
If target_backend in ("google_drive", "onedrive", "nextcloud", "webdav") (any other non-"minio" value is rejected with 422, per T-05-06-01):
1. Read request body file bytes (file: UploadFile)
2. Load CloudConnection for current_user.id + target_backend; 404 if not found or not ACTIVE
3. Decrypt credentials via decrypt_credentials(settings.cloud_creds_key.encode(), str(current_user.id), conn.credentials_enc)
4. Instantiate the correct backend from target_backend
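5. Call object_key = await cloud_backend.put_object(str(current_user.id), str(doc.id), file_bytes, extension, content_type); the document ID has to be generated before this call because the Document row is only created in step 6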
6. Create Document with storage_backend=target_backend, object_key=object_key, size_bytes=len(file_bytes)
7. Return {"document_id": str(doc.id), "storage_backend": target_backend} — no upload_url (cloud upload is synchronous)
Catch CloudConnectionError from put_object → raise HTTPException(503)
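The routing described in Modification 1 can be sketched without FastAPI as a plain async function that returns a status code and a response body. This is only an illustration under stated assumptions: `FakeCloudBackend`, `route_upload`, the fixed `.pdf` extension, and the local `CloudConnectionError` class are hypothetical stand-ins, not the project's real classes.

```python
import asyncio

CLOUD_BACKENDS = frozenset({"google_drive", "onedrive", "nextcloud", "webdav"})
REAUTH_DETAIL = "Cloud connection requires re-authentication. Please reconnect in Settings."


class CloudConnectionError(Exception):
    """Stand-in for storage.google_drive_backend.CloudConnectionError."""


class FakeCloudBackend:
    """Hypothetical in-memory backend exposing the put_object call from step 5."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail

    async def put_object(self, user_id, document_id, data, extension, content_type) -> str:
        if self.fail:
            raise CloudConnectionError("invalid_grant")
        return f"{user_id}/{document_id}{extension}"


async def route_upload(target_backend, file_bytes, cloud_backend, user_id="u1", document_id="d1"):
    """Return (status_code, body) the way the endpoint is described to respond."""
    if target_backend == "minio":
        # Existing flow: the browser uploads to a presigned URL.
        return 200, {"upload_url": "<presigned-put-url>", "document_id": document_id}
    if target_backend not in CLOUD_BACKENDS:
        return 422, {"detail": f"Unknown target_backend: {target_backend}"}
    try:
        await cloud_backend.put_object(
            user_id, document_id, file_bytes, ".pdf", "application/pdf"
        )
    except CloudConnectionError:
        # Provider rejected the credentials: surface 503, never a 500.
        return 503, {"detail": REAUTH_DETAIL}
    # Cloud upload is synchronous, so the response carries no upload_url.
    return 200, {"document_id": document_id, "storage_backend": target_backend}


ok = asyncio.run(route_upload("google_drive", b"%PDF", FakeCloudBackend()))
expired = asyncio.run(route_upload("google_drive", b"%PDF", FakeCloudBackend(fail=True)))
```

In the real endpoint the three return shapes would map to the existing presigned response, `HTTPException(422)` and `HTTPException(503)`, and the new cloud response respectively.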
Modification 2 — GET /api/documents/{id}/content:
Replace: `storage = get_storage_backend()`
With: `storage = await get_storage_backend_for_document(document, current_user, session)`
Import get_storage_backend_for_document from storage module.
Wrap with try/except CloudConnectionError → HTTPException(503, "Cloud connection requires re-authentication. Please reconnect in Settings.")
Add imports at top of documents.py (only if not already present):
from storage import get_storage_backend_for_document
from storage.google_drive_backend import CloudConnectionError
from storage.cloud_utils import decrypt_credentials
from config import settings
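A minimal sketch of the content-proxy change in Modification 2, written framework-free so it runs on its own. `InMemoryBackend`, `ExpiredBackend`, `resolve_backend`, and the async `get_object` signature are assumptions made for illustration; the real lookup is `get_storage_backend_for_document(document, current_user, session)`.

```python
import asyncio
from types import SimpleNamespace

REAUTH_DETAIL = "Cloud connection requires re-authentication. Please reconnect in Settings."


class CloudConnectionError(Exception):
    """Stand-in for storage.google_drive_backend.CloudConnectionError."""


class InMemoryBackend:
    """Hypothetical backend that serves bytes from a dict."""

    def __init__(self, objects) -> None:
        self.objects = dict(objects)

    async def get_object(self, object_key):
        return self.objects[object_key]


class ExpiredBackend:
    """Hypothetical cloud backend whose token has been revoked."""

    async def get_object(self, object_key):
        raise CloudConnectionError("invalid_grant")


async def resolve_backend(document, backends):
    # Stand-in for get_storage_backend_for_document: pick by document.storage_backend.
    return backends[document.storage_backend]


async def get_content(document, backends):
    """Return (status_code, payload); CloudConnectionError maps to 503."""
    try:
        storage = await resolve_backend(document, backends)
        return 200, await storage.get_object(document.object_key)
    except CloudConnectionError:
        return 503, REAUTH_DETAIL


backends = {
    "minio": InMemoryBackend({"u1/d1.pdf": b"minio-bytes"}),
    "google_drive": ExpiredBackend(),
}
minio_doc = SimpleNamespace(storage_backend="minio", object_key="u1/d1.pdf")
drive_doc = SimpleNamespace(storage_backend="google_drive", object_key="drive-file-id")
minio_result = asyncio.run(get_content(minio_doc, backends))
drive_result = asyncio.run(get_content(drive_doc, backends))
```

Wrapping both the lookup and the read in the same `try` matters because either step may be the first to contact the provider.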
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
import ast
with open('api/documents.py') as f:
    tree = ast.parse(f.read())
names = [n.id if isinstance(n, ast.Name) else getattr(n, 'attr', '') for n in ast.walk(tree) if isinstance(n, (ast.Name, ast.Attribute))]
assert 'get_storage_backend_for_document' in names, 'get_storage_backend_for_document is never referenced'
print('documents.py parses and references get_storage_backend_for_document: OK')
" && python -m pytest -v --tb=short 2>&1 | tail -5
- backend/api/documents.py imports get_storage_backend_for_document from storage module
- GET /api/documents/{id}/content resolves its backend via get_storage_backend_for_document instead of calling get_storage_backend() for every document
- POST /api/documents/upload has target_backend parameter and cloud direct-upload path
- CloudConnectionError caught and re-raised as HTTPException(503)
- Existing MinIO upload flow (presigned URL) unchanged for target_backend="minio"
- `pytest -v --tb=short` exits 0, 0 failures
documents.py extended: upload detects cloud backend; content proxy uses get_storage_backend_for_document; CloudConnectionError → 503; existing MinIO flow unchanged
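The automated check for Task 1 inspects the AST of documents.py. A slightly stricter variant, sketched here as an optional alternative, looks for the `from storage import ...` statement and for an actual call. The helper names `imports_name` and `calls_name` and the `sample` source are hypothetical and exist only for this illustration.

```python
import ast


def imports_name(source: str, module: str, name: str) -> bool:
    """True if `source` contains `from <module> import <name>`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module == module:
            if any(alias.name == name for alias in node.names):
                return True
    return False


def calls_name(source: str, name: str) -> bool:
    """True if `source` calls a bare function named `name` somewhere."""
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == name
        ):
            return True
    return False


# Tiny stand-in for api/documents.py so the sketch runs without the repository.
sample = (
    "from storage import get_storage_backend_for_document\n"
    "async def content(document, user, session):\n"
    "    return await get_storage_backend_for_document(document, user, session)\n"
)
has_import = imports_name(sample, "storage", "get_storage_backend_for_document")
has_call = calls_name(sample, "get_storage_backend_for_document")
```

Against the real file, the same helpers would be fed the text of `api/documents.py` instead of `sample`.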
## Task 2: Promote unit test stubs to real tests (cloud_utils.py coverage)
backend/tests/test_cloud.py
- backend/tests/test_cloud.py — current xfail stubs
- backend/storage/cloud_utils.py — validate_cloud_url, encrypt_credentials, decrypt_credentials
- backend/storage/__init__.py — get_storage_backend_for_document
- backend/storage/minio_backend.py — MinIOBackend class
- 4 unit tests promoted; they test cloud_utils.py and the factory — no DB, no HTTP client, no network (W3 split: unit tests only)
- test_credential_round_trip: pure unit test; calls encrypt_credentials + decrypt_credentials; asserts round-trip equals original; asserts ciphertext != plaintext string
- test_ssrf_validation: @pytest.mark.parametrize over (url, should_block) pairs [("http://localhost/dav",True),("http://127.0.0.1/dav",True),("http://169.254.169.254/dav",True),("http://10.0.0.1/dav",True),("http://192.168.1.1/dav",True),("https://nextcloud.example.com/dav",False)]; asserts ValueError is raised when should_block is True (loopback, link-local, and private addresses) and that no exception is raised for the valid public URL
- test_ssrf_link_local: calls validate_cloud_url("http://169.254.169.254/metadata"); asserts ValueError
- test_factory_returns_correct_backend: constructs a mock Document(storage_backend="minio") and mock User; patches get_storage_backend() to return a MagicMock of MinIOBackend; calls get_storage_backend_for_document with a mock AsyncSession; asserts result is the expected backend type
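The parametrized SSRF cases above can be expressed as a data table plus one checking function. The validator below is an illustrative stand-in built on the standard `ipaddress` module, not the project's `validate_cloud_url`, which may apply further checks such as DNS resolution; `validate_cloud_url_sketch` and `check_case` are hypothetical names.

```python
import ipaddress
from urllib.parse import urlparse

# (url, should_block) pairs taken from the behavior spec.
SSRF_CASES = [
    ("http://localhost/dav", True),
    ("http://127.0.0.1/dav", True),
    ("http://169.254.169.254/dav", True),
    ("http://10.0.0.1/dav", True),
    ("http://192.168.1.1/dav", True),
    ("https://nextcloud.example.com/dav", False),
]


def validate_cloud_url_sketch(url: str) -> None:
    """Raise ValueError for loopback, link-local, or private targets."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        raise ValueError(f"blocked host: {host}")
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return  # not an IP literal; this sketch accepts hostnames as-is
    if addr.is_loopback or addr.is_link_local or addr.is_private:
        raise ValueError(f"blocked address: {addr}")


def check_case(url: str, should_block: bool, validate=validate_cloud_url_sketch) -> bool:
    """True when the validator's behavior matches the expectation."""
    try:
        validate(url)
    except ValueError:
        return should_block
    return not should_block


results = [check_case(url, should_block) for url, should_block in SSRF_CASES]
```

In the test module the same list can feed `@pytest.mark.parametrize("url,should_block", SSRF_CASES)`, with `pytest.raises(ValueError)` used for the blocked cases.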
Promote the 4 unit-test stubs in test_cloud.py. These tests have no DB/HTTP dependencies:
1. test_credential_round_trip — no fixtures needed:
from storage.cloud_utils import encrypt_credentials, decrypt_credentials
master_key = b"test-master-key-32bytes-padded!!"
user_id = "550e8400-e29b-41d4-a716-446655440000"
creds = {"access_token": "ya29.xxx", "refresh_token": "1//xxx"}
enc = encrypt_credentials(master_key, user_id, creds)
assert isinstance(enc, str) and "access_token" not in enc
dec = decrypt_credentials(master_key, user_id, enc)
assert dec == creds
2. test_ssrf_validation — @pytest.mark.parametrize:
All private/loopback/link-local URLs raise ValueError; valid public URL passes.
Remove the xfail decorator; add parametrize decorator from behavior spec.
3. test_ssrf_link_local — simple unit test:
from storage.cloud_utils import validate_cloud_url
with pytest.raises(ValueError): validate_cloud_url("http://169.254.169.254/metadata")
4. test_factory_returns_correct_backend — mock-based unit test:
from unittest.mock import MagicMock, AsyncMock, patch
from storage import get_storage_backend_for_document
Mock a Document with storage_backend="minio", a User, and an AsyncSession.
Patch get_storage_backend() to return a MinIOBackend mock.
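Run asyncio.run(get_storage_backend_for_document(mock_doc, mock_user, mock_session)) from a synchronous test function (asyncio.run raises RuntimeError inside a running event loop; if the stub is declared async def, await the call instead).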
Assert result is the patched MinIOBackend.
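The mock-based factory test in item 4 might look roughly like the sketch below. To keep it runnable on its own, the `storage` package is replaced by a `SimpleNamespace` stand-in with a simplified factory; in the real test the patch target would be wherever `get_storage_backend` is actually looked up, which this sketch does not know.

```python
import asyncio
import types
from unittest.mock import AsyncMock, MagicMock, patch


class MinIOBackend:
    """Stand-in for storage.minio_backend.MinIOBackend."""


async def _for_document(document, user, session):
    # Simplified stand-in factory: MinIO documents use get_storage_backend().
    if document.storage_backend == "minio":
        return storage.get_storage_backend()
    raise NotImplementedError(document.storage_backend)


# Stand-in for the real `storage` package so the sketch has no project imports.
storage = types.SimpleNamespace(
    get_storage_backend=lambda: MinIOBackend(),
    get_storage_backend_for_document=_for_document,
)


def test_factory_returns_correct_backend():
    expected = MagicMock(spec=MinIOBackend)
    mock_doc = MagicMock(storage_backend="minio")
    mock_user, mock_session = MagicMock(), AsyncMock()
    with patch.object(storage, "get_storage_backend", return_value=expected):
        result = asyncio.run(
            storage.get_storage_backend_for_document(mock_doc, mock_user, mock_session)
        )
    # The factory must hand back exactly the patched MinIO backend.
    assert result is expected
    assert isinstance(result, MinIOBackend)
```

Using `MagicMock(spec=MinIOBackend)` lets the `isinstance` assertion pass while still avoiding any real MinIO client.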
Remove @pytest.mark.xfail(strict=False) from all 4 stubs once implemented.
Leave the other 11 stubs with xfail decorators (they are promoted in Task 3).
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py::test_credential_round_trip tests/test_cloud.py::test_ssrf_validation tests/test_cloud.py::test_ssrf_link_local tests/test_cloud.py::test_factory_returns_correct_backend -v 2>&1 | tail -10
- test_credential_round_trip, test_ssrf_validation, test_ssrf_link_local, test_factory_returns_correct_backend all PASSED
- test_ssrf_validation is parametrized (multiple params visible in output)
- No xfail decorators on these 4 tests
- Other 11 tests still xfail (not broken by this task)
- `pytest tests/test_cloud.py -v` exits 0
4 unit tests promoted to PASSED; cloud_utils.py coverage established; 11 integration stubs still xfailed
## Task 3: Promote integration test stubs to real passing tests (HTTP endpoint coverage)
backend/tests/test_cloud.py
- backend/tests/test_cloud.py — current xfail stubs (11 remaining after Task 2)
- backend/tests/conftest.py — all fixtures including cloud_connection_factory, mock_google_drive_creds, async_client, db_session
- backend/api/cloud.py — endpoint paths and request/response shapes
- backend/api/admin.py — CloudConnectionOut fields
- backend/db/models.py — CloudConnection, User, Document fields
- .planning/phases/05-cloud-storage-backends/05-VALIDATION.md — test map with requirement → test correspondence
- 11 integration tests promoted; all use async_client, db_session, and/or monkeypatch (W3 split: integration tests only)
- test_credentials_enc_not_exposed: creates CloudConnection via cloud_connection_factory; calls GET /api/cloud/connections with valid auth; asserts "credentials_enc" not in response JSON at any level
- test_cloud_upload_no_presigned: creates CloudConnection; mocks cloud backend put_object; calls POST /api/documents/upload with target_backend="google_drive"; asserts no "upload_url" in response
- test_connection_status_display: creates ACTIVE CloudConnection; calls GET /api/cloud/connections; asserts response item has status == "ACTIVE"
- test_invalid_grant_sets_requires_reauth: creates CloudConnection with status="ACTIVE"; monkey-patches the cloud backend operation to raise CloudConnectionError(reason="invalid_grant"); calls GET /api/documents/{id}/content; asserts 503 response; then re-queries the CloudConnection from DB and asserts connection.status == "REQUIRES_REAUTH" — both HTTP response AND DB state verified (W2 fix)
- test_disconnect_deletes_credentials: creates CloudConnection; calls DELETE /api/cloud/connections/{id}; asserts 204; queries DB to confirm row deleted
- test_admin_cannot_see_credentials: creates admin user + CloudConnection; calls GET /api/cloud/connections with admin auth; asserts 403 response
- test_cross_user_idor: creates two users + CloudConnections; calls DELETE /api/cloud/connections/{user2_connection_id} with user1 auth; asserts 404
- test_connect_google_drive: calls GET /api/cloud/oauth/initiate/google_drive with valid auth; asserts 302 redirect containing "accounts.google.com" in location header
- test_oauth_callback_valid_state: pre-seeds Redis with oauth_state key; mocks google_auth_oauthlib.flow.Flow.fetch_token; calls GET /api/cloud/oauth/callback/google_drive?code=test&state={seed_state}; asserts 302 redirect to /settings?cloud_connected=google_drive
- test_oauth_callback_invalid_state: calls GET /api/cloud/oauth/callback/google_drive?code=x&state=invalid; asserts 400
- test_webdav_connect_validates: calls POST /api/cloud/connections/webdav with localhost URL; asserts 422 (SSRF blocked — validate_cloud_url raises ValueError before health check)
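For test_credentials_enc_not_exposed, "not in response JSON at any level" implies a recursive search rather than a top-level key check. A small helper along these lines could do it; `contains_key` and the sample payloads are hypothetical and only illustrate the idea.

```python
def contains_key(payload, key: str) -> bool:
    """Recursively report whether `key` appears as a dict key anywhere in `payload`."""
    if isinstance(payload, dict):
        return key in payload or any(contains_key(value, key) for value in payload.values())
    if isinstance(payload, list):
        return any(contains_key(item, key) for item in payload)
    return False


# Hypothetical response shapes, only to exercise the helper.
safe = [{"id": "c1", "provider": "google_drive", "status": "ACTIVE"}]
leaky = {"items": [{"id": "c1", "meta": {"credentials_enc": "gAAAA..."}}]}
safe_has_key = contains_key(safe, "credentials_enc")
leaky_has_key = contains_key(leaky, "credentials_enc")
```

In the integration test the assertion would then read `assert not contains_key(response.json(), "credentials_enc")`.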
Promote the 11 remaining xfail stubs in test_cloud.py to real integration tests.
Keep: all 11 test function names, all @pytest.mark.asyncio decorators.
Remove: @pytest.mark.xfail(strict=False) from all 11 stubs.
Add: proper fixture parameters (async_client, db_session, monkeypatch).
Auth helper (add as module-level async def or import from conftest):
async def _create_user_and_token(session, role="user") — creates User row, generates JWT access token.
Mirror pattern from existing test_auth_api.py or test_documents.py.
For Redis tests (test_connect_google_drive, test_oauth_callback_valid_state, test_oauth_callback_invalid_state):
monkeypatch app.state.redis.setex, app.state.redis.get, app.state.redis.delete.
test_oauth_callback_valid_state: pre-seed via monkeypatch return values; mock Flow.fetch_token.
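One way to pre-seed the OAuth state without a Redis server is a small fake whose `get`, `setex`, and `delete` are `AsyncMock` objects backed by a dict. This sketch assumes the app awaits its Redis calls (an asyncio client); the `oauth_state:` key prefix and the `make_fake_redis` helper are hypothetical and should be adjusted to whatever backend/api/cloud.py actually uses.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


def make_fake_redis(seed_state: str, user_id: str) -> SimpleNamespace:
    """Build an awaitable fake exposing get/setex/delete over an in-memory dict."""
    store = {f"oauth_state:{seed_state}": user_id}

    def _get(key):
        return store.get(key)

    def _setex(key, ttl_seconds, value):
        store[key] = value

    def _delete(key):
        store.pop(key, None)

    return SimpleNamespace(
        get=AsyncMock(side_effect=_get),
        setex=AsyncMock(side_effect=_setex),
        delete=AsyncMock(side_effect=_delete),
    )


async def demo():
    redis = make_fake_redis("seed", "user-1")
    found = await redis.get("oauth_state:seed")       # valid state resolves
    missing = await redis.get("oauth_state:invalid")  # unknown state does not
    await redis.delete("oauth_state:seed")            # state is single-use
    after = await redis.get("oauth_state:seed")
    return found, missing, after, redis.delete.await_count


demo_result = asyncio.run(demo())
```

In a test, the fake's methods would be installed with `monkeypatch.setattr` on `app.state.redis`, and `redis.delete.await_count` gives a simple way to assert that the state was consumed.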
For test_invalid_grant_sets_requires_reauth (W2 requirement):
Create a CloudConnection and a Document stored on that backend; monkeypatch the cloud backend operation (the SDK call wrapped by _call_cloud_op) to fail with invalid_grant so that
CloudConnectionError(reason="invalid_grant") is raised; call GET /api/documents/{id}/content;
assert 503; then await session.refresh(connection); assert connection.status == "REQUIRES_REAUTH".
Note: the DB write of REQUIRES_REAUTH must actually be committed by _call_cloud_op, so do not patch get_storage_backend_for_document itself to raise, because that would bypass the write.
The test verifies the real DB state, not just the HTTP response.
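The two-part assertion (HTTP 503 plus persisted status) only holds if the failure is injected below the code that records REQUIRES_REAUTH. The sketch below illustrates that ordering with stand-ins: `call_cloud_op` is a guess at the general shape of `_call_cloud_op`, `ProviderAuthError` is a hypothetical SDK error, and a plain attribute write stands in for the committed DB update.

```python
import asyncio
from types import SimpleNamespace


class CloudConnectionError(Exception):
    """Stand-in for the project's exception; carries the provider reason."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


class ProviderAuthError(Exception):
    """Hypothetical SDK-level error carrying the provider's OAuth error code."""


async def call_cloud_op(connection, op):
    """Assumed shape of _call_cloud_op: record REQUIRES_REAUTH, then raise."""
    try:
        return await op()
    except ProviderAuthError as exc:
        if "invalid_grant" in str(exc):
            connection.status = "REQUIRES_REAUTH"  # real code would commit to the DB
            raise CloudConnectionError(reason="invalid_grant") from exc
        raise


async def get_content(connection, op):
    """Endpoint stand-in: CloudConnectionError becomes a 503."""
    try:
        return 200, await call_cloud_op(connection, op)
    except CloudConnectionError:
        return 503, None


async def failing_sdk_call():
    # Failure injected at the SDK boundary, below call_cloud_op.
    raise ProviderAuthError("invalid_grant: Token has been expired or revoked.")


connection = SimpleNamespace(status="ACTIVE")
status_code, _ = asyncio.run(get_content(connection, failing_sdk_call))
```

Had the failure been injected above `call_cloud_op`, the response would still be 503 but `connection.status` would remain "ACTIVE", which is exactly the gap the W2 fix is meant to catch.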
For SDK mocking: monkeypatch or patch the SDK calls at the module import level.
All tests must pass under SQLite in-memory (non-INTEGRATION mode).
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v 2>&1
- `pytest tests/test_cloud.py -v` exits 0
- Output shows all 15 tests PASSED (no xfailed, no FAILED, no ERROR)
- test_invalid_grant_sets_requires_reauth: 503 HTTP response AND DB connection.status == "REQUIRES_REAUTH" (W2 + W3 combined)
- test_credentials_enc_not_exposed: "credentials_enc" not present anywhere in response JSON
- test_admin_cannot_see_credentials: 403 for admin role
- test_cross_user_idor: 404 for cross-user connection access
- `pytest -v --tb=short` (full suite) exits 0 with 0 failures
All 15 test stubs promoted to real passing tests; pytest tests/test_cloud.py exits 0 with all PASSED; full suite exits 0
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| UploadFile bytes → cloud backend | File bytes from browser pass through FastAPI to cloud provider — no direct browser-to-cloud |
| document.storage_backend → backend factory | storage_backend field from DB (not user input) determines which backend loads |
| CloudConnectionError → HTTP 503 | Provider rejection must surface as 503, not 500 (stack trace) or silent retry |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-06-01 | Spoofing | target_backend form field tampering | mitigate | target_backend validated against VALID_PROVIDERS set; invalid values return 422; CloudConnection load asserts user ownership before use |
| T-05-06-02 | Information Disclosure | CloudConnectionError message in 503 | mitigate | 503 detail = "Cloud connection requires re-authentication. Please reconnect in Settings." — no provider error detail or token info in response |
| T-05-06-03 | Denial of Service | Cloud upload quota bypass | accept | Cloud uploads do not consume MinIO quota (D-11: separate backends); cloud storage quotas are provider-side — not DocuVault's responsibility in v1 |
| T-05-06-04 | Tampering | Test mocks hiding real failures | mitigate | Tests mock at the boundary (SDK calls), not at the function level; behavior assertions check HTTP response codes and DB state, not implementation details |
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
- POST /api/documents/upload: target_backend routing works for cloud backends; MinIO flow unchanged
- GET /api/documents/{id}/content: uses get_storage_backend_for_document; CloudConnectionError → 503
- test_cloud.py Task 2 (unit): test_credential_round_trip, test_ssrf_validation, test_ssrf_link_local, test_factory_returns_correct_backend all PASSED
- test_cloud.py Task 3 (integration): all 11 integration stubs PASSED including REQUIRES_REAUTH DB assertion
- test_cloud.py: all 15 tests PASSED; no xfailed
- pytest -v (full suite): exits 0, 0 failures