docs(05): create phase 5 plan — cloud storage backends (8 plans, 7 waves)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curo1305
2026-05-28 19:43:12 +02:00
parent 141e582eab
commit baa5bed7e2
9 changed files with 2677 additions and 2 deletions
+22 -2
@@ -188,7 +188,27 @@ Before any phase is marked complete, all three gates must pass:
4. A user can disconnect a cloud backend; credentials are permanently deleted from the DB and a subsequent attempt to use that backend returns an appropriate error — no orphaned data remains
5. An admin API response for a user's cloud connections returns only `provider, display_name, connected_at, status` — the `credentials_enc` column is never present in any serialized response
**Plans**: TBD
**Plans**: 8 plans
**Wave 1** — Test scaffold + dependencies
- [ ] 05-01-PLAN.md — Wave 0 xfail stubs, conftest cloud fixtures, requirements.txt packages, config.py settings
**Wave 2** — Shared utilities (parallel)
- [ ] 05-02-PLAN.md — cloud_utils.py (SSRF + HKDF), cloud_cache.py (TTLCache), storage factory extension
**Wave 3** — Cloud backends
- [ ] 05-03-PLAN.md — GoogleDriveBackend + OneDriveBackend (all 7 StorageBackend methods)
- [ ] 05-04-PLAN.md — NextcloudBackend + WebDAVBackend (all 7 StorageBackend methods)
**Wave 4** — Cloud API
- [ ] 05-05-PLAN.md — All /api/cloud/* endpoints + /api/users/me/default-storage + main.py router registration
**Wave 5** — Document routing + full test suite
- [ ] 05-06-PLAN.md — Upload/content proxy cloud routing + all 15 tests promoted to passing
**Wave 6** — Frontend settings UI
- [ ] 05-07-PLAN.md — cloudConnections store + API client + SettingsView 3-tab + SettingsCloudTab + CloudCredentialModal
**Wave 7** — Frontend sidebar (human checkpoint)
- [ ] 05-08-PLAN.md — AppSidebar cloud section + CloudProviderTreeItem + CloudFolderTreeItem + human checkpoint
**Phase gates (must pass before Phase 5 is complete):**
- [ ] `pytest -v` — zero failures; SSRF prevention on WebDAV/Nextcloud user-supplied URLs; credential encryption/decryption round-trip; admin response never exposes `credentials_enc`; OAuth invalid_grant handling
@@ -207,4 +227,4 @@ Before any phase is marked complete, all three gates must pass:
| 2. Users & Authentication | 5/5 | Complete | 2026-05-22 |
| 3. Document Migration & Multi-User Isolation | 5/5 | Complete | 2026-05-25 |
| 4. Folders, Sharing, Quotas & Document UX | 9/9 | Complete | 2026-05-28 |
| 5. Cloud Storage Backends | 0/? | Not started | - |
| 5. Cloud Storage Backends | 0/8 | In Progress | - |
@@ -0,0 +1,287 @@
---
phase: 05-cloud-storage-backends
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - backend/requirements.txt
  - backend/config.py
  - backend/tests/test_cloud.py
  - backend/tests/conftest.py
autonomous: true
requirements:
  - CLOUD-01
  - CLOUD-02
  - CLOUD-03
  - CLOUD-04
  - CLOUD-05
  - CLOUD-06
  - CLOUD-07
must_haves:
  truths:
    - "All 15 Phase 5 test stubs exist in test_cloud.py and xfail with strict=False"
    - "conftest.py has mock_google_drive_creds, mock_onedrive_creds, mock_webdav_client, cloud_connection_factory fixtures"
    - "requirements.txt includes all 6 new packages with correct version pins"
    - "config.py has CLOUD_CREDS_KEY, GOOGLE_CLIENT_ID/SECRET, ONEDRIVE_CLIENT_ID/SECRET/TENANT_ID, BACKEND_URL settings"
    - "pytest -v passes with zero failures after Wave 0 (stubs xfail, not fail)"
  artifacts:
    - path: "backend/tests/test_cloud.py"
      provides: "All Phase 5 xfail test stubs"
      contains: "test_credential_round_trip"
    - path: "backend/tests/conftest.py"
      provides: "cloud_connection_factory and mock fixtures"
      contains: "cloud_connection_factory"
    - path: "backend/requirements.txt"
      provides: "New package dependencies"
      contains: "cryptography"
    - path: "backend/config.py"
      provides: "Phase 5 settings"
      contains: "cloud_creds_key"
  key_links:
    - from: "backend/tests/test_cloud.py"
      to: "backend/tests/conftest.py"
      via: "fixture injection"
      pattern: "cloud_connection_factory"
---
<objective>
Wave 0 Nyquist scaffold: create all Phase 5 test stubs, conftest fixtures, new package dependencies, and config settings before any implementation begins.
Purpose: Establish the Nyquist validation scaffolding so every subsequent plan has a test to turn green. Per-phase gate — all stubs must xfail (strict=False), never fail.
Output: test_cloud.py with 15 xfail stubs, updated conftest.py with 4 new fixtures, requirements.txt with 6 new packages, config.py with Phase 5 env vars.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-VALIDATION.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
</context>
<interfaces>
<!-- From backend/tests/conftest.py — existing fixture patterns -->
From backend/tests/conftest.py:
- db_session: async SQLite in-memory session fixture (pytest_asyncio.fixture)
- async_client: AsyncClient with db_session override (pytest_asyncio.fixture)
- live_services_available: session-scoped, checks ports 5432/9000/6379
From backend/db/models.py:
- CloudConnection: id (UUID), user_id (UUID FK), provider (String), display_name (Text),
credentials_enc (Text), status (String, default="ACTIVE"), connected_at (TIMESTAMP)
- User: id (UUID), handle (String), email (String), role (String), is_active (Boolean),
default_storage_backend (String, default="minio")
From backend/api/admin.py:
- CloudConnectionOut: id (str), provider (str), display_name (str), status (str),
connected_at (datetime) — credentials_enc absent by omission (SEC-08)
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Add Phase 5 packages to requirements.txt and settings to config.py</name>
<files>backend/requirements.txt, backend/config.py</files>
<read_first>
- backend/requirements.txt — current packages and version format
- backend/config.py — SettingsConfigDict pattern, existing field declarations
</read_first>
<behavior>
- requirements.txt contains all 6 new packages with their exact version pins
- config.py Settings class has: cloud_creds_key (str, default "CHANGEME-32-bytes-padded!!"), google_client_id (str, default ""), google_client_secret (str, default ""), onedrive_client_id (str, default ""), onedrive_client_secret (str, default ""), onedrive_tenant_id (str, default "common"), backend_url (str, default "http://localhost:8000")
- All new settings have empty-string or safe defaults so the app boots without cloud credentials configured
</behavior>
<action>
Append to backend/requirements.txt (per D-02 research decisions, all versions VERIFIED on PyPI):
cryptography>=41.0.0, google-auth-oauthlib>=1.3.1, google-api-python-client>=2.196.0,
msal>=1.36.0, webdavclient3>=3.14.7, cachetools>=5.3.0.
In backend/config.py, inside the Settings class add a "# Cloud Storage (Phase 5)" comment block
followed by these fields:
- cloud_creds_key: str = "CHANGEME-32-bytes-padded!!" (CLOUD_CREDS_KEY env var — master key for HKDF)
- google_client_id: str = "" (GOOGLE_CLIENT_ID)
- google_client_secret: str = "" (GOOGLE_CLIENT_SECRET)
- onedrive_client_id: str = "" (ONEDRIVE_CLIENT_ID)
- onedrive_client_secret: str = "" (ONEDRIVE_CLIENT_SECRET)
- onedrive_tenant_id: str = "common" (ONEDRIVE_TENANT_ID — "common" works for personal + org accounts)
- backend_url: str = "http://localhost:8000" (BACKEND_URL — used to construct OAuth callback URLs)
Add CLOUD_CREDS_KEY, GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, ONEDRIVE_CLIENT_ID,
ONEDRIVE_CLIENT_SECRET, ONEDRIVE_TENANT_ID, and BACKEND_URL entries to .env.example
(create the file if it does not exist; otherwise append to it).
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && grep -c "cryptography" requirements.txt && grep -c "google-auth-oauthlib" requirements.txt && grep -c "msal" requirements.txt && grep -c "webdavclient3" requirements.txt && grep -c "cachetools" requirements.txt && python -c "from config import settings; print(settings.cloud_creds_key)"</automated>
</verify>
<acceptance_criteria>
- backend/requirements.txt contains lines matching: cryptography>=41.0.0, google-auth-oauthlib>=1.3.1, google-api-python-client>=2.196.0, msal>=1.36.0, webdavclient3>=3.14.7, cachetools>=5.3.0
- backend/config.py contains `cloud_creds_key: str` and `google_client_id: str` and `backend_url: str`
- `python -c "from config import settings; print(settings.cloud_creds_key)"` prints without ImportError
</acceptance_criteria>
<done>requirements.txt has all 6 Phase 5 package lines; config.py imports and Settings loads without error; all 7 new cloud settings accessible via settings.{field_name}</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Create test_cloud.py with all 15 xfail stubs</name>
<files>backend/tests/test_cloud.py</files>
<read_first>
- backend/tests/test_folders.py — xfail stub pattern (pytest.mark.xfail(strict=False), single-line body)
- .planning/phases/05-cloud-storage-backends/05-VALIDATION.md — exact test names for all 15 stubs
- backend/db/models.py — CloudConnection model fields
</read_first>
<behavior>
- File exists at backend/tests/test_cloud.py
- Contains all 15 test stubs from the VALIDATION.md per-task verification map
- All stubs decorated with @pytest.mark.xfail(strict=False, reason="not implemented yet")
- Body of each stub is only: pytest.xfail("not implemented yet") — no assertions
- pytest tests/test_cloud.py -v exits 0 with all tests xfailed (not failed)
</behavior>
<action>
Create backend/tests/test_cloud.py with a module docstring and the following stubs, each
decorated with @pytest.mark.xfail(strict=False, reason="not implemented yet"):
From CLOUD-01:
- test_connect_google_drive
- test_oauth_callback_valid_state
- test_oauth_callback_invalid_state
- test_webdav_connect_validates
From CLOUD-02:
- test_credential_round_trip
- test_credentials_enc_not_exposed
From CLOUD-03:
- test_cloud_upload_no_presigned
From CLOUD-04:
- test_connection_status_display
From CLOUD-05:
- test_invalid_grant_sets_requires_reauth
From CLOUD-06:
- test_disconnect_deletes_credentials
From CLOUD-07:
- test_factory_returns_correct_backend
From D-17 SSRF:
- test_ssrf_validation (parametrized with @pytest.mark.parametrize over RFC-1918, loopback, link-local, valid URLs)
- test_ssrf_link_local
From SEC/IDOR:
- test_admin_cannot_see_credentials
- test_cross_user_idor
Import pytest at the top. Add `pytestmark = pytest.mark.asyncio` at module level.
Each test function takes no arguments for now; the fixtures created in Task 3 are
injected later, when the stubs are promoted to real tests.
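The stub pattern described above might look like the following. This is a minimal sketch showing only two of the 15 stubs; it assumes the stubs are written as `async def` to match the module-level asyncio mark, which the plan does not state explicitly.

```python
"""Phase 5 cloud storage test stubs (sketch: two of the 15 stubs)."""
import pytest

# Module-level mark, as requested by the plan.
pytestmark = pytest.mark.asyncio


@pytest.mark.xfail(strict=False, reason="not implemented yet")
async def test_credential_round_trip():
    # Single-line body: the stub xfails instead of asserting anything.
    pytest.xfail("not implemented yet")


@pytest.mark.xfail(strict=False, reason="not implemented yet")
async def test_ssrf_validation():
    # Assumption: parametrization is added when the stub is promoted.
    pytest.xfail("not implemented yet")
```

With `strict=False`, a stub that unexpectedly passes is reported as XPASS rather than as a failure, so the suite keeps exiting 0 while stubs are promoted one by one.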
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- backend/tests/test_cloud.py exists
- `pytest tests/test_cloud.py -v` exits with code 0 (xfailed is not failure)
- Output contains "xfailed" for all 15 stubs
- No test shows status "FAILED" or "ERROR"
- test_ssrf_validation is present (parametrize decorator may be skipped until implementation — single stub is fine)
</acceptance_criteria>
<done>pytest tests/test_cloud.py exits 0; all 15 stubs xfailed; no failures or collection errors</done>
</task>
<task type="auto" tdd="true">
<name>Task 3: Add cloud fixtures to conftest.py</name>
<files>backend/tests/conftest.py</files>
<read_first>
- backend/tests/conftest.py — existing fixture patterns (db_session, async_client, live_services_available)
- backend/db/models.py — CloudConnection fields (id, user_id, provider, display_name, credentials_enc, status, connected_at)
</read_first>
<behavior>
- conftest.py gains 4 new fixtures without breaking any existing fixtures
- mock_google_drive_creds returns a dict with keys: access_token, refresh_token, expiry, token_uri, client_id, client_secret
- mock_onedrive_creds returns a dict with keys: access_token, refresh_token, expires_at
- mock_webdav_client is a MagicMock with upload_to, download_from, list, check methods mocked (no real connection)
- cloud_connection_factory is a callable fixture factory that creates CloudConnection ORM rows in the db_session for arbitrary provider/status values
- pytest -v with the existing test suite exits 0 after adding these fixtures
</behavior>
<action>
Append to backend/tests/conftest.py (after the existing fixtures, do NOT modify any existing code):
1. mock_google_drive_creds fixture (scope="function"): returns a dict
{"access_token": "ya29.test_access", "refresh_token": "1//test_refresh",
"expiry": "2099-12-31T23:59:59", "token_uri": "https://oauth2.googleapis.com/token",
"client_id": "test_client_id", "client_secret": "test_client_secret"}
2. mock_onedrive_creds fixture (scope="function"): returns a dict
{"access_token": "test_ms_access", "refresh_token": "test_ms_refresh",
"expires_at": "2099-12-31T23:59:59"}
3. mock_webdav_client fixture (scope="function"): returns a MagicMock with
.upload_to, .download_from, .list, .check all set to MagicMock(return_value=None).
Import MagicMock from unittest.mock.
4. cloud_connection_factory fixture (scope="function"): a factory function that
accepts (session, user_id, provider="google_drive", status="ACTIVE",
display_name=None, credentials_enc="fake_encrypted_creds") and creates +
flushes a CloudConnection row. Returns the CloudConnection instance.
The factory accepts db_session as a pytest fixture dependency via the fixture
mechanism (fixture returning an inner async function that takes session as first arg).
Use pytest_asyncio.fixture decorator. The factory's inner function should be async.
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest -v --co -q 2>&1 | grep "ERROR" | wc -l</automated>
</verify>
<acceptance_criteria>
- backend/tests/conftest.py contains `def cloud_connection_factory`
- backend/tests/conftest.py contains `def mock_google_drive_creds`
- backend/tests/conftest.py contains `def mock_onedrive_creds`
- backend/tests/conftest.py contains `def mock_webdav_client`
- `pytest -v --co -q` collection phase produces 0 ERROR lines
- `pytest tests/test_cloud.py -v` still exits 0 with all stubs xfailed
</acceptance_criteria>
<done>4 new fixtures in conftest.py; collection error count = 0; existing test suite unaffected</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| test code → production code | Tests import production modules; config loading must not fail when cloud creds are absent |
| requirements.txt → PyPI | Package names and versions must match PyPI exactly; wrong names install typosquats |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-01-01 | Tampering | requirements.txt package names | mitigate | All 6 packages verified via slopcheck [OK] in RESEARCH.md Package Legitimacy Audit — no [SLOP] or [SUS] verdict |
| T-05-01-02 | Information Disclosure | config.py cloud_creds_key default | mitigate | Default is "CHANGEME-32-bytes-padded!!", clearly a placeholder; production requires an env var override. Settings raises no error when the default is left in place, but keys derived from it by the HKDF in cloud_utils are predictable and protect nothing without a real master key |
| T-05-01-SC | Tampering | npm/pip/cargo installs | mitigate | All 6 new packages are [OK] per RESEARCH.md slopcheck audit; no blocking human checkpoint required |
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- pytest tests/test_cloud.py exits 0; all 15 stubs show xfailed
- pytest -v (full suite) exits 0 with zero failures
- requirements.txt contains all 6 new package lines
- config.py Settings loads without error; cloud_creds_key, google_client_id, backend_url all accessible
- conftest.py has 4 new fixtures: mock_google_drive_creds, mock_onedrive_creds, mock_webdav_client, cloud_connection_factory
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-01-SUMMARY.md` when done
</output>
@@ -0,0 +1,279 @@
---
phase: 05-cloud-storage-backends
plan: 02
type: execute
wave: 2
depends_on:
  - "05-01"
files_modified:
  - backend/storage/cloud_utils.py
  - backend/services/cloud_cache.py
  - backend/storage/__init__.py
autonomous: true
requirements:
  - CLOUD-02
  - CLOUD-07
must_haves:
  truths:
    - "validate_cloud_url() blocks all RFC-1918, loopback, and link-local addresses"
    - "encrypt_credentials / decrypt_credentials produce a correct round-trip for any dict"
    - "get_storage_backend_for_document() factory returns the correct backend type from document.storage_backend"
    - "TTLCache singleton is module-level in cloud_cache.py with maxsize=1000, ttl=60"
  artifacts:
    - path: "backend/storage/cloud_utils.py"
      provides: "SSRF validation + HKDF credential encryption"
      contains: "def validate_cloud_url"
    - path: "backend/services/cloud_cache.py"
      provides: "TTLCache singleton for cloud folder listings"
      contains: "get_cloud_folders_cached"
    - path: "backend/storage/__init__.py"
      provides: "Extended factory for cloud backends"
      contains: "get_storage_backend_for_document"
  key_links:
    - from: "backend/storage/cloud_utils.py"
      to: "backend/config.py"
      via: "settings.cloud_creds_key"
      pattern: "cloud_creds_key"
    - from: "backend/storage/__init__.py"
      to: "backend/storage/cloud_utils.py"
      via: "decrypt_credentials import"
      pattern: "decrypt_credentials"
---
<objective>
Create the shared utilities layer for Phase 5: SSRF-safe URL validation, HKDF+Fernet credential encryption/decryption, TTLCache for folder listings, and the extended storage backend factory.
Purpose: All cloud backends and API handlers depend on these primitives. Establishing them before the backends prevents duplication and ensures security invariants are enforced in one place.
Output: cloud_utils.py (validate_cloud_url, encrypt_credentials, decrypt_credentials), cloud_cache.py (TTLCache singleton), updated storage/__init__.py (get_storage_backend_for_document factory).
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-01-SUMMARY.md
</context>
<interfaces>
<!-- From backend/storage/__init__.py — current factory -->
From backend/storage/__init__.py:
def get_storage_backend() -> StorageBackend:
"""Returns MinIOBackend configured from settings."""
From backend/db/models.py:
Document: storage_backend (String, nullable=False, default="minio"), user_id (UUID nullable)
CloudConnection: id (UUID), user_id (UUID FK), provider (String), credentials_enc (Text),
status (String), connected_at (TIMESTAMP)
User: id (UUID), default_storage_backend (String, default="minio")
From backend/config.py (after Plan 01):
settings.cloud_creds_key: str
settings.minio_endpoint, minio_access_key, minio_secret_key, minio_bucket, minio_public_endpoint
From backend/storage/minio_backend.py:
class MinIOBackend(StorageBackend): -- reference asyncio.to_thread() pattern
RESEARCH.md Pattern 6: SSRF validation using ipaddress + socket.getaddrinfo.
RESEARCH.md Pattern 2: HKDF+Fernet — fresh HKDF instance per call (AlreadyFinalized pitfall).
RESEARCH.md Pattern 8: TTLCache thread safety — threading.Lock required for concurrent access.
RESEARCH.md Pattern 9: get_storage_backend_for_document factory extension.
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Create cloud_utils.py — SSRF validation + HKDF credential encryption</name>
<files>backend/storage/cloud_utils.py</files>
<read_first>
- backend/storage/base.py — StorageBackend ABC, 7 method signatures
- backend/config.py — settings.cloud_creds_key field name
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 2 (HKDF+Fernet) and Pattern 6 (SSRF)
</read_first>
<behavior>
- validate_cloud_url(url: str) -> None raises ValueError for: localhost, 127.0.0.0/8, 169.254.0.0/16, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, ::1/128, fc00::/7
- validate_cloud_url resolves DNS via socket.getaddrinfo before checking IP (anti-DNS-rebinding: resolves hostname to IP then checks IP against blocked networks)
- validate_cloud_url raises ValueError for non-http/https schemes
- validate_cloud_url raises ValueError for URLs with no hostname
- _derive_fernet_key(master_key: bytes, user_id: str) -> Fernet: creates a fresh HKDF instance on every call (never reuses); uses algorithm=hashes.SHA256(), length=32, salt=user_id.encode("utf-8"), info=b"cloud-credentials"
- encrypt_credentials(master_key: bytes, user_id: str, credentials: dict) -> str: returns Fernet-encrypted JSON string (not plaintext)
- decrypt_credentials(master_key: bytes, user_id: str, credentials_enc: str) -> dict: returns original dict
- Round-trip: decrypt_credentials(master_key, uid, encrypt_credentials(master_key, uid, creds)) == creds
</behavior>
<action>
Create backend/storage/cloud_utils.py with module docstring explaining SSRF prevention and HKDF pattern.
Implement validate_cloud_url(url: str) -> None:
- Import: ipaddress, socket, urllib.parse.urlparse
- Parse URL; reject non-http/https schemes; reject missing hostname
- Define BLOCKED_NETS list: ip_network("127.0.0.0/8"), ip_network("169.254.0.0/16"),
ip_network("10.0.0.0/8"), ip_network("172.16.0.0/12"), ip_network("192.168.0.0/16"),
ip_network("::1/128"), ip_network("fc00::/7")
- Also explicitly block hostname == "localhost" string before IP resolution
- Try ipaddress.ip_address(hostname) — if that fails (not a raw IP), use
socket.getaddrinfo(hostname, None)[0][4][0] to resolve; wrap socket.gaierror
- Check resolved IP against each BLOCKED_NETS entry using addr in net
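The validation steps above can be sketched with the standard library alone. This is an illustration, not the final cloud_utils.py: it deviates from the plan by checking every address returned by `socket.getaddrinfo` rather than only the first, and `is_url_allowed` is a hypothetical convenience wrapper that does not appear in the plan.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Networks listed in the plan: loopback, link-local, RFC-1918, IPv6 loopback and ULA.
BLOCKED_NETS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "169.254.0.0/16", "10.0.0.0/8", "172.16.0.0/12",
    "192.168.0.0/16", "::1/128", "fc00::/7",
)]


def validate_cloud_url(url: str) -> None:
    """Raise ValueError if the URL is not http(s) or points at a blocked network."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("Only http and https URLs are allowed")
    hostname = parsed.hostname
    if not hostname:
        raise ValueError("URL has no hostname")
    if hostname.lower() == "localhost":
        raise ValueError("localhost is not allowed")
    try:
        # Raw IP literal: no DNS lookup needed.
        candidates = [str(ipaddress.ip_address(hostname))]
    except ValueError:
        try:
            infos = socket.getaddrinfo(hostname, None)
        except socket.gaierror as exc:
            raise ValueError(f"Cannot resolve host: {hostname}") from exc
        # Assumption: check all resolved addresses, not just the first one.
        candidates = [info[4][0] for info in infos]
    for raw in candidates:
        addr = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 scope id
        if any(addr in net for net in BLOCKED_NETS):
            raise ValueError(f"Address {addr} is in a blocked network")


def is_url_allowed(url: str) -> bool:
    """Hypothetical helper: boolean wrapper around validate_cloud_url."""
    try:
        validate_cloud_url(url)
    except ValueError:
        return False
    return True
```

Checking every resolved address is slightly stricter than the `[0][4][0]` lookup named in the plan; it closes the case where a hostname resolves to both a public and an internal address.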
Implement _derive_fernet_key(master_key: bytes, user_id: str) -> Fernet:
- Import: base64, cryptography.hazmat.primitives.hashes, cryptography.hazmat.primitives.kdf.hkdf.HKDF, cryptography.fernet.Fernet
- Create new HKDF(...) instance each call — do NOT cache or store the instance
- Call hkdf.derive(master_key) → 32 raw bytes
- Return Fernet(base64.urlsafe_b64encode(raw_key))
Implement encrypt_credentials(master_key: bytes, user_id: str, credentials: dict) -> str:
- import json inside function body (or at top)
- Call _derive_fernet_key to get a Fernet instance
- Return f.encrypt(json.dumps(credentials).encode("utf-8")).decode("utf-8")
Implement decrypt_credentials(master_key: bytes, user_id: str, credentials_enc: str) -> dict:
- Call _derive_fernet_key to get a Fernet instance
- Return json.loads(f.decrypt(credentials_enc.encode("utf-8")))
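To illustrate why the per-user salt matters, the key derivation step can be sketched with the standard library. This is not the planned implementation, which uses the `cryptography` package's HKDF and Fernet classes; the sketch follows the HKDF construction from RFC 5869 using `hmac`, and `hkdf_sha256` and `derive_fernet_key_bytes` are hypothetical names introduced for illustration.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step, one 32-byte block per iteration
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fernet_key_bytes(master_key: bytes, user_id: str) -> bytes:
    """Derive a per-user key and encode it as url-safe base64 (the form Fernet keys take)."""
    raw_key = hkdf_sha256(
        master_key,
        salt=user_id.encode("utf-8"),  # per-user salt, as in the plan
        info=b"cloud-credentials",     # context string, as in the plan
        length=32,
    )
    return base64.urlsafe_b64encode(raw_key)
```

Because the user ID is the salt, the same master key yields a different derived key for each user, so a ciphertext copied from one user's row would not decrypt under another user's key.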
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.cloud_utils import validate_cloud_url, encrypt_credentials, decrypt_credentials
import pytest
# SSRF check
try:
    validate_cloud_url('http://127.0.0.1/dav')
    print('FAIL: loopback should raise')
except ValueError:
    print('OK: loopback blocked')
try:
    validate_cloud_url('http://10.0.0.1/dav')
    print('FAIL: RFC-1918 should raise')
except ValueError:
    print('OK: RFC-1918 blocked')
# Round-trip
mk = b'test-master-key-32bytes-padded!!'
uid = '550e8400-e29b-41d4-a716-446655440000'
creds = {'access_token': 'ya29.xxx', 'refresh_token': '1//xxx'}
enc = encrypt_credentials(mk, uid, creds)
assert enc != str(creds)
dec = decrypt_credentials(mk, uid, enc)
assert dec == creds, f'Round-trip failed: {dec}'
print('OK: encryption round-trip')
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/cloud_utils.py contains def validate_cloud_url, def encrypt_credentials, def decrypt_credentials, def _derive_fernet_key
- validate_cloud_url("http://127.0.0.1/dav") raises ValueError
- validate_cloud_url("http://10.0.0.1/dav") raises ValueError
- validate_cloud_url("http://169.254.169.254/dav") raises ValueError
- validate_cloud_url("http://192.168.1.1/dav") raises ValueError
- validate_cloud_url("http://localhost/dav") raises ValueError
- Encryption round-trip: decrypt_credentials(key, uid, encrypt_credentials(key, uid, creds)) == creds
- "access_token" plaintext does NOT appear in the encrypted string
</acceptance_criteria>
<done>cloud_utils.py created; SSRF validation blocks all 5 network categories; HKDF round-trip verified via python -c invocation</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Create cloud_cache.py and extend storage factory</name>
<files>backend/services/cloud_cache.py, backend/storage/__init__.py</files>
<read_first>
- backend/storage/__init__.py — current get_storage_backend() factory
- backend/storage/base.py — StorageBackend ABC
- backend/storage/minio_backend.py — MinIOBackend constructor signature
- backend/db/models.py — CloudConnection, Document, User model fields
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 8 (TTLCache), Pattern 9 (factory extension)
</read_first>
<behavior>
- backend/services/cloud_cache.py exports a module-level _folder_cache = TTLCache(maxsize=1000, ttl=60) and a threading.Lock()
- get_cloud_folders_cached(user_id: str, provider: str, folder_id: str, fetch_fn: Callable[[], Awaitable[list]]) is an async function that checks the cache before calling and awaiting fetch_fn
- get_storage_backend_for_document(document, user, session) is an async function added to backend/storage/__init__.py that returns MinIOBackend for storage_backend=="minio" and raises HTTPException(503) for unknown or inactive cloud connections
- existing get_storage_backend() function in __init__.py is NOT modified (existing callers unaffected)
- get_storage_backend_for_document raises HTTPException(503, detail="Cloud connection not found or inactive") when CloudConnection is missing or status != "ACTIVE"
</behavior>
<action>
Create backend/services/cloud_cache.py:
- Import: threading, cachetools.TTLCache, typing.Callable, typing.Awaitable
- Module-level: _folder_cache: TTLCache = TTLCache(maxsize=1000, ttl=60)
- Module-level: _folder_cache_lock = threading.Lock()
- async function get_cloud_folders_cached(user_id: str, provider: str, folder_id: str, fetch_fn) -> list:
    cache_key = f"{user_id}:{provider}:{folder_id}"
    with _folder_cache_lock: check if cache_key in _folder_cache; return cached if found
    result = await fetch_fn() # called OUTSIDE the lock to not block event loop
    with _folder_cache_lock: store result in cache
    return result
- Function invalidate_provider_cache(user_id: str, provider: str) -> None: while holding the lock,
  collects all keys in _folder_cache starting with f"{user_id}:{provider}:" into a list, then deletes
  them (the cache must not be mutated while it is being iterated)
Extend backend/storage/__init__.py (add after existing get_storage_backend()):
- Import at top of file: select from sqlalchemy, HTTPException from fastapi, AsyncSession from sqlalchemy.ext.asyncio, Optional from typing
- Import: from db.models import CloudConnection, Document, User
- Import: from config import settings
- Import: from storage.cloud_utils import decrypt_credentials
- Add async function get_storage_backend_for_document(document, user, session: AsyncSession) -> StorageBackend:
    If document.storage_backend == "minio": return get_storage_backend() (existing factory)
    Otherwise: query CloudConnection where user_id=user.id AND provider=document.storage_backend AND status="ACTIVE"
    If not found: raise HTTPException(status_code=503, detail="Cloud connection not found or inactive")
    Decrypt credentials: master_key = settings.cloud_creds_key.encode(); credentials = decrypt_credentials(master_key, str(user.id), conn.credentials_enc)
    If provider == "google_drive": import GoogleDriveBackend; return GoogleDriveBackend(credentials)
    Elif provider == "onedrive": import OneDriveBackend; return OneDriveBackend(credentials)
    Elif provider in ("nextcloud", "webdav"): import WebDAVBackend; return WebDAVBackend(credentials["server_url"], credentials["username"], credentials["password"])
    Else: raise ValueError(f"Unknown storage backend: {document.storage_backend}")
Use lazy imports (inside the function) for cloud backends to avoid circular imports at module load time.
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from services.cloud_cache import get_cloud_folders_cached, _folder_cache, _folder_cache_lock, invalidate_provider_cache
from storage import get_storage_backend, get_storage_backend_for_document
print('cloud_cache imports OK')
print('factory extension imports OK')
print(f'TTLCache maxsize={_folder_cache.maxsize}, ttl={_folder_cache.ttl}')
"</automated>
</verify>
<acceptance_criteria>
- backend/services/cloud_cache.py exists and exports _folder_cache (TTLCache), _folder_cache_lock (Lock), get_cloud_folders_cached (async), invalidate_provider_cache
- _folder_cache.maxsize == 1000 and _folder_cache.ttl == 60
- backend/storage/__init__.py exports get_storage_backend_for_document (async function)
- `from storage import get_storage_backend_for_document` imports without error
- Existing `from storage import get_storage_backend` still works (no regression)
- `python -m pytest -v --tb=short` passes with 0 failures (no import regressions)
</acceptance_criteria>
<done>cloud_cache.py created with TTLCache singleton and cache/invalidate helpers; storage/__init__.py has get_storage_backend_for_document; full pytest suite passes</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| user-supplied URL → validate_cloud_url | Untrusted URL must be checked against SSRF blocklist before any HTTP call |
| credentials dict → Fernet ciphertext | Credentials must never appear in plaintext after this layer |
| DNS resolution → IP check | DNS-based SSRF bypass: hostname resolves to internal IP after validation |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-02-01 | Tampering | validate_cloud_url — DNS resolution | mitigate | socket.getaddrinfo resolves hostname to IP before network check; validate_cloud_url called immediately before each request (not only at connect-time) per D-17; DNS rebinding window is minimized |
| T-05-02-02 | Information Disclosure | _derive_fernet_key — HKDF instance reuse | mitigate | New HKDF(...) instance created on every _derive_fernet_key call; AlreadyFinalized pitfall (RESEARCH.md Pitfall 3) prevented by construction |
| T-05-02-03 | Information Disclosure | cloud_creds_key default value | mitigate | Default "CHANGEME-32-bytes-padded!!" is clearly a placeholder; production deployment requires CLOUD_CREDS_KEY env var; docstring on Settings field documents the requirement |
| T-05-02-04 | Elevation of Privilege | get_storage_backend_for_document — cross-user | mitigate | Function receives user object from get_regular_user dep; CloudConnection query includes user_id=user.id filter; cross-user access impossible via this function |
| T-05-02-SC | Tampering | cachetools package install | mitigate | cachetools verified [OK] in RESEARCH.md slopcheck audit |
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- cloud_utils.py: validate_cloud_url blocks RFC-1918/loopback/link-local; HKDF round-trip correct
- cloud_cache.py: TTLCache(maxsize=1000, ttl=60) with thread-safe lock; get_cloud_folders_cached works
- storage/__init__.py: get_storage_backend_for_document added alongside existing get_storage_backend()
- pytest -v exits 0, 0 failures; test_cloud.py still all xfailed
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md` when done
</output>
@@ -0,0 +1,400 @@
---
phase: 05-cloud-storage-backends
plan: 03
type: execute
wave: 3
depends_on:
- "05-02"
files_modified:
- backend/storage/google_drive_backend.py
- backend/storage/onedrive_backend.py
autonomous: true
requirements:
- CLOUD-01
- CLOUD-05
- CLOUD-07
must_haves:
truths:
- "GoogleDriveBackend implements all 7 StorageBackend abstract methods"
- "OneDriveBackend implements all 7 StorageBackend abstract methods"
- "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both cloud backends (D-14)"
- "All sync SDK calls wrapped in asyncio.to_thread() — event loop never blocked"
- "On-demand token refresh: 401/token-expiry error triggers transparent refresh; invalid_grant sets REQUIRES_REAUTH"
- "Google OAuth Flow uses access_type='offline', prompt='consent' (Pitfall 1 prevention)"
- "OneDrive uses resumable upload sessions (createUploadSession) for all files (Pitfall 6 prevention)"
artifacts:
- path: "backend/storage/google_drive_backend.py"
provides: "Google Drive v3 StorageBackend implementation"
contains: "class GoogleDriveBackend"
- path: "backend/storage/onedrive_backend.py"
provides: "Microsoft Graph / OneDrive StorageBackend implementation"
contains: "class OneDriveBackend"
key_links:
- from: "backend/storage/google_drive_backend.py"
to: "backend/storage/cloud_utils.py"
via: "decrypt_credentials used by factory caller"
pattern: "GoogleDriveBackend.__init__"
- from: "backend/storage/onedrive_backend.py"
to: "backend/storage/cloud_utils.py"
via: "decrypt_credentials used by factory caller"
pattern: "OneDriveBackend.__init__"
---
<objective>
Implement GoogleDriveBackend and OneDriveBackend — the two OAuth-based cloud StorageBackend concrete classes.
Purpose: These backends handle Google Drive v3 and Microsoft Graph file operations. Both use async-wrapped sync SDKs, on-demand token refresh, and handle the invalid_grant → REQUIRES_REAUTH transition per D-05/D-06.
Output: google_drive_backend.py and onedrive_backend.py, each implementing all 7 StorageBackend methods.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md
</context>
<interfaces>
<!-- From backend/storage/base.py — StorageBackend ABC (all 7 methods) -->
From backend/storage/base.py:
class StorageBackend(ABC):
async def put_object(self, user_id: str, document_id: str, file_bytes: bytes, extension: str, content_type: str) -> str: ...
async def get_object(self, object_key: str) -> bytes: ...
async def delete_object(self, object_key: str) -> None: ...
async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str: ...
async def health_check(self) -> bool: ...
async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str: ...
async def stat_object(self, object_key: str) -> int: ...
<!-- From RESEARCH.md Pattern 3 — Google Drive OAuth Flow -->
Google Drive credential dict keys: access_token, refresh_token, expiry (ISO string), token_uri, client_id, client_secret
google_auth_oauthlib: Flow.from_client_config, flow.authorization_url(access_type="offline", prompt="consent")
google-api-python-client: googleapiclient.discovery.build("drive", "v3", credentials=creds)
service.files().create(body={...}, media_body=MediaIoBaseUpload(buf, mimetype=content_type)).execute()
service.files().get(fileId=key, fields="id,name,size").execute()
service.files().delete(fileId=key).execute()
GoogleDrive object_key = file_id returned by files().create()
<!-- From RESEARCH.md Pattern 4 — OneDrive MSAL Flow -->
OneDrive credential dict keys: access_token, refresh_token, expires_at (ISO string)
msal.ConfidentialClientApplication(client_id, client_credential=client_secret, authority=f"https://login.microsoftonline.com/{tenant_id}")
app.acquire_token_by_refresh_token(refresh_token, scopes=["Files.ReadWrite"])  # MSAL Python raises ValueError for reserved scopes (openid, profile, offline_access) and adds them itself
Microsoft Graph: POST /me/drive/root:/{path}:/createUploadSession, then PUT chunks to uploadUrl
Microsoft Graph: GET /me/drive/items/{item_id}/content — streams bytes
Microsoft Graph: DELETE /me/drive/items/{item_id}
OneDrive object_key = item_id from upload response
<!-- From RESEARCH.md Pattern 10 — On-demand token refresh -->
Custom exception: CloudConnectionError (raised when invalid_grant detected)
On 401 / token-expiry: refresh token, update credentials_enc in conn, retry once
On invalid_grant: set conn.status = "REQUIRES_REAUTH", raise CloudConnectionError
Both backends need session + conn parameters for the refresh/update path (passed by the API layer caller)
</interfaces>
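The refresh rules above (refresh on 401, retry once, surface invalid_grant) can be sketched as a small provider-agnostic helper. TokenExpiredError and call_with_refresh are hypothetical names used only for illustration. Persisting the new credentials_enc and setting conn.status is left to the caller, as the interface notes describe.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical stand-in for a provider 401 / token-expiry error."""


class CloudConnectionError(Exception):
    """Raised when the provider rejects the refresh token (invalid_grant)."""


async def call_with_refresh(
    operation: Callable[[], Awaitable[T]],
    refresh: Callable[[], Awaitable[Optional[dict]]],
) -> T:
    """Run operation; on token expiry, refresh and retry exactly once."""
    try:
        return await operation()
    except TokenExpiredError:
        new_credentials = await refresh()
        if new_credentials is None:
            # refresh() returning None models the invalid_grant case.
            raise CloudConnectionError("connection requires re-authentication")
        # A second expiry is not caught here, so it propagates to the caller.
        return await operation()
```

Retrying only once prevents a refresh loop when the provider keeps returning 401 for a reason other than expiry.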
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Implement GoogleDriveBackend</name>
<files>backend/storage/google_drive_backend.py</files>
<read_first>
- backend/storage/base.py — exact signatures for all 7 abstract methods
- backend/storage/minio_backend.py — asyncio.to_thread() wrapping pattern, __init__ style
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 3, Pattern 7 (on-demand refresh), Pitfall 1, Pitfall 7
- backend/storage/cloud_utils.py — encrypt_credentials, decrypt_credentials signatures (for refresh path)
</read_first>
<behavior>
- GoogleDriveBackend.__init__(self, credentials: dict) stores credentials dict; builds google.oauth2.credentials.Credentials from it
- put_object: creates Drive file via service.files().create() wrapped in asyncio.to_thread(); returns Google Drive file_id as object_key
- get_object: downloads file bytes via service.files().get_media(fileId=key) wrapped in asyncio.to_thread(); returns bytes
- delete_object: calls service.files().delete(fileId=key) wrapped in asyncio.to_thread(); no-op if file not found (catch HttpError 404)
- presigned_get_url: raises NotImplementedError("Google Drive backend does not support presigned URLs")
- generate_presigned_put_url: raises NotImplementedError("Google Drive backend does not support presigned put URLs")
- stat_object: calls service.files().get(fileId=key, fields="size") wrapped in asyncio.to_thread(); returns int(metadata.get("size", 0))
- health_check: tries files().list(pageSize=1) wrapped in asyncio.to_thread(); returns True/False
- All sync googleapiclient calls wrapped in asyncio.to_thread() (Pitfall 7)
- On-demand token refresh: _is_token_expired(e) detects googleapiclient.errors.HttpError with status 401; _refresh_google_creds(credentials) calls creds.refresh(google.auth.transport.requests.Request()); returns the updated credentials dict, or None when the refresh fails with invalid_grant (raised as google.auth.exceptions.RefreshError)
- CloudConnectionError exception class defined in this module for invalid_grant signaling
</behavior>
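The asyncio.to_thread() rule (Pitfall 7) amounts to the following shape. blocking_get_metadata is a hypothetical stand-in for a synchronous call such as service.files().get(...).execute(), not real googleapiclient code.

```python
import asyncio
import threading


def blocking_get_metadata(file_id: str) -> dict:
    """Hypothetical stand-in for a synchronous SDK call that blocks on network I/O."""
    return {"id": file_id, "size": "2048", "thread_id": threading.get_ident()}


async def stat_object(file_id: str) -> int:
    # The blocking call runs in a worker thread; the event loop stays free.
    metadata = await asyncio.to_thread(blocking_get_metadata, file_id)
    # Drive reports size as a string, so convert explicitly.
    return int(metadata.get("size", 0))
```

Passing the function and its arguments separately, as here, avoids a lambda. A lambda is still useful when the call is a chained expression such as request.execute().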
<action>
Create backend/storage/google_drive_backend.py with:
Module docstring explaining Google Drive v3 backend, asyncio.to_thread() requirement, and D-14 NotImplementedError rationale.
from __future__ import annotations
import asyncio, io, uuid
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from googleapiclient.http import MediaIoBaseUpload, MediaIoBaseDownload
from google.oauth2.credentials import Credentials
from google.auth.transport.requests import Request
from storage.base import StorageBackend
class CloudConnectionError(Exception): pass
class GoogleDriveBackend(StorageBackend):
SCOPES = ["https://www.googleapis.com/auth/drive.file"]
def __init__(self, credentials: dict) -> None:
self._creds_dict = credentials
self._creds = self._dict_to_google_creds(credentials)
def _dict_to_google_creds(self, d: dict) -> Credentials:
# Build google.oauth2.credentials.Credentials from stored dict
# d keys: access_token, refresh_token, expiry (ISO str), token_uri, client_id, client_secret
import datetime
creds = Credentials(
token=d["access_token"],
refresh_token=d.get("refresh_token"),
token_uri=d.get("token_uri", "https://oauth2.googleapis.com/token"),
client_id=d.get("client_id"),
client_secret=d.get("client_secret"),
)
if d.get("expiry"):
creds.expiry = datetime.datetime.fromisoformat(d["expiry"])
return creds
def _get_service(self):
return build("drive", "v3", credentials=self._creds, cache_discovery=False)
async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
# Wrap the sync file create in asyncio.to_thread
# file_metadata: name = f"{document_id}{extension}" (provider-side name)
# Returns Drive file_id as object_key (not a path — D-02: cloud object_key = provider native ID)
async def get_object(self, object_key: str) -> bytes:
# Use MediaIoBaseDownload to stream bytes into BytesIO, return bytes
async def delete_object(self, object_key: str) -> None:
# Catch HttpError 404 silently; re-raise other errors
async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
raise NotImplementedError("Google Drive backend does not support presigned URLs — use get_object() for streaming")
async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
raise NotImplementedError("Google Drive backend does not support presigned put URLs — use put_object() for direct upload")
async def stat_object(self, object_key: str) -> int:
# service.files().get(fileId=object_key, fields="size").execute()
# Return int(metadata.get("size", 0))
async def health_check(self) -> bool:
# Try files().list(pageSize=1); return True/False
All concrete method bodies must be fully implemented (not just stubs).
Each sync call must be wrapped in asyncio.to_thread(lambda: ...) or asyncio.to_thread(fn, arg).
</action>
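One detail in _dict_to_google_creds worth a sketch: to my knowledge google-auth compares Credentials.expiry against a naive UTC datetime, so a stored ISO string carrying an offset should be normalised before assignment. parse_expiry_naive_utc is a hypothetical helper, and whether the stored strings ever carry an offset is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_expiry_naive_utc(value: Optional[str]) -> Optional[datetime]:
    """Parse a stored ISO expiry string into a naive datetime expressed in UTC."""
    if not value:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is not None:
        # Convert to UTC, then drop tzinfo so comparisons with naive UTC work.
        parsed = parsed.astimezone(timezone.utc).replace(tzinfo=None)
    return parsed
```

A string without an offset is returned unchanged and is treated as already being in UTC.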
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.google_drive_backend import GoogleDriveBackend, CloudConnectionError
import inspect, asyncio
# Verify all 7 methods are coroutines
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
assert inspect.iscoroutinefunction(getattr(GoogleDriveBackend, method)), f'{method} not async'
# Verify NotImplementedError for presigned methods
backend = GoogleDriveBackend({'access_token':'x','refresh_token':'y','token_uri':'https://oauth2.googleapis.com/token','client_id':'c','client_secret':'s'})
async def check():
try:
await backend.presigned_get_url('key')
print('FAIL: should raise NotImplementedError')
except NotImplementedError:
print('OK: presigned_get_url raises NotImplementedError')
try:
await backend.generate_presigned_put_url('key')
print('FAIL: should raise NotImplementedError')
except NotImplementedError:
print('OK: generate_presigned_put_url raises NotImplementedError')
asyncio.run(check())
print('All 7 methods are coroutines: OK')
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/google_drive_backend.py exists with class GoogleDriveBackend
- All 7 methods are async (inspect.iscoroutinefunction returns True)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- CloudConnectionError class defined and importable from this module
- Import succeeds: `from storage.google_drive_backend import GoogleDriveBackend, CloudConnectionError`
- `pytest -v --tb=short` exits 0 (no import regressions)
</acceptance_criteria>
<done>GoogleDriveBackend created with all 7 methods; NotImplementedError on presigned methods; CloudConnectionError defined; pytest passes</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Implement OneDriveBackend</name>
<files>backend/storage/onedrive_backend.py</files>
<read_first>
- backend/storage/base.py — all 7 method signatures
- backend/storage/google_drive_backend.py — pattern reference (asyncio.to_thread, CloudConnectionError)
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 4 (MSAL), Pitfall 6 (resumable upload), Assumption A3 (invalid_grant in result["error"])
- backend/config.py — settings.onedrive_client_id, onedrive_client_secret, onedrive_tenant_id
</read_first>
<behavior>
- OneDriveBackend.__init__(self, credentials: dict) stores credentials dict (access_token, refresh_token, expires_at)
- put_object: uses Microsoft Graph createUploadSession + chunked PUT (10 MB chunks) for ALL files (Pitfall 6 — no 4 MB limit); returns OneDrive item_id as object_key
- get_object: GET https://graph.microsoft.com/v1.0/me/drive/items/{item_id}/content via httpx.get with Authorization bearer; returns bytes
- delete_object: DELETE https://graph.microsoft.com/v1.0/me/drive/items/{item_id}; catch 404 silently
- presigned_get_url: raises NotImplementedError
- generate_presigned_put_url: raises NotImplementedError
- stat_object: GET /me/drive/items/{item_id}?$select=size; return int(response["size"])
- health_check: GET /me/drive?$select=id; return True/False
- _refresh_token(self) -> dict | None: calls msal.ConfidentialClientApplication.acquire_token_by_refresh_token() using self._credentials["refresh_token"]; returns new credentials dict or None if result.get("error") == "invalid_grant"
- All sync msal calls wrapped in asyncio.to_thread(); httpx calls are natively async (use async with httpx.AsyncClient() as client and await each request)
- CHUNK_SIZE = 10 * 1024 * 1024 (10 MB, above Graph's 4 MB limit)
</behavior>
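The chunked PUT step can be sketched as a generator of byte ranges and Content-Range header values. Microsoft Graph requires each fragment of an upload session to be a multiple of 320 KiB; 10 MiB is exactly 32 such units. iter_upload_ranges is a hypothetical helper name.

```python
from typing import Iterator, Tuple

CHUNK_SIZE = 10 * 1024 * 1024          # 10 MiB, as specified in this plan
GRAPH_CHUNK_MULTIPLE = 320 * 1024      # Graph upload-session fragment granularity


def iter_upload_ranges(
    total_size: int, chunk_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end_inclusive, content_range_header) for each fragment."""
    if chunk_size <= 0 or chunk_size % GRAPH_CHUNK_MULTIPLE:
        raise ValueError("chunk_size must be a positive multiple of 320 KiB")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1
```

Each fragment is then sent as file_bytes[start:end + 1] with the yielded header. An empty file yields no ranges, so a zero-byte upload needs separate handling.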
<action>
Create backend/storage/onedrive_backend.py with:
Module docstring explaining OneDrive/Microsoft Graph backend, resumable upload requirement (Pitfall 6), and asyncio.to_thread pattern.
from __future__ import annotations
import asyncio, io, uuid, datetime
import httpx
import msal
from config import settings
from storage.base import StorageBackend
from storage.google_drive_backend import CloudConnectionError # reuse same exception
GRAPH_BASE = "https://graph.microsoft.com/v1.0"
CHUNK_SIZE = 10 * 1024 * 1024 # 10 MB — above Graph's 4 MB simple upload limit
class OneDriveBackend(StorageBackend):
def __init__(self, credentials: dict) -> None:
self._credentials = credentials # {"access_token": ..., "refresh_token": ..., "expires_at": ...}
def _auth_headers(self) -> dict:
return {"Authorization": f"Bearer {self._credentials['access_token']}"}
async def _ensure_valid_token(self) -> None:
# Check if access_token is expired (expires_at < now + 60s buffer)
# If expired, call _refresh_token(); update self._credentials
# If refresh returns None → raise CloudConnectionError("OneDrive connection requires re-authentication")
async def _refresh_token(self) -> dict | None:
# Wrap msal call in asyncio.to_thread
# Create ConfidentialClientApplication with settings.onedrive_client_id, onedrive_client_secret, authority
# Call acquire_token_by_refresh_token(self._credentials["refresh_token"], scopes=["Files.ReadWrite"]); do not pass offline_access, MSAL Python rejects reserved scopes with ValueError
# Return updated dict or None if result.get("error") == "invalid_grant"
async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
# 1. Ensure valid token
# 2. POST {GRAPH_BASE}/me/drive/root:/{user_id}/{document_id}{extension}:/createUploadSession
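# 3. PUT file_bytes to uploadUrl in CHUNK_SIZE chunks, each with a Content-Range header; the uploadUrl is pre-authenticated, so send no Authorization header on these PUTs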
# 4. Return item_id from final upload response
async def get_object(self, object_key: str) -> bytes:
await self._ensure_valid_token()
async with httpx.AsyncClient() as client:
r = await client.get(f"{GRAPH_BASE}/me/drive/items/{object_key}/content",
headers=self._auth_headers(), follow_redirects=True)
r.raise_for_status()
return r.content
async def delete_object(self, object_key: str) -> None:
await self._ensure_valid_token()
async with httpx.AsyncClient() as client:
r = await client.delete(f"{GRAPH_BASE}/me/drive/items/{object_key}",
headers=self._auth_headers())
if r.status_code not in (204, 404):
r.raise_for_status()
async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
raise NotImplementedError("OneDrive backend does not support presigned URLs — use get_object() for streaming")
async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
raise NotImplementedError("OneDrive backend does not support presigned put URLs — use put_object() for direct upload")
async def stat_object(self, object_key: str) -> int:
await self._ensure_valid_token()
async with httpx.AsyncClient() as client:
r = await client.get(f"{GRAPH_BASE}/me/drive/items/{object_key}",
params={"$select": "size"}, headers=self._auth_headers())
r.raise_for_status()
return int(r.json().get("size", 0))
async def health_check(self) -> bool:
try:
await self._ensure_valid_token()
async with httpx.AsyncClient() as client:
r = await client.get(f"{GRAPH_BASE}/me/drive", params={"$select": "id"},
headers=self._auth_headers())
return r.is_success
except Exception:
return False
All methods fully implemented. _ensure_valid_token and _refresh_token handle the
invalid_grant → CloudConnectionError path per D-06.
</action>
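A sketch of how _refresh_token might turn an MSAL result into the stored credentials dict. MSAL returns a plain dict: on success it carries access_token and expires_in (seconds), on failure an error key (Assumption A3). credentials_from_msal_result is a hypothetical helper. It writes expires_at as an offset-aware ISO string, which is an assumption; the verify script uses a naive one.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def credentials_from_msal_result(
    result: dict, previous: dict, now: Optional[datetime] = None
) -> Optional[dict]:
    """Map an MSAL token result to a credentials dict; None signals invalid_grant."""
    if result.get("error") == "invalid_grant":
        return None
    if "access_token" not in result:
        # Any other failure is surfaced rather than treated as re-auth.
        raise RuntimeError(f"token refresh failed: {result.get('error')}")
    now = now or datetime.now(timezone.utc)
    expires_at = now + timedelta(seconds=int(result.get("expires_in", 0)))
    return {
        "access_token": result["access_token"],
        # Providers may omit a rotated refresh token; keep the old one then.
        "refresh_token": result.get("refresh_token", previous["refresh_token"]),
        "expires_at": expires_at.isoformat(),
    }
```

Separating invalid_grant from other errors keeps a transient outage from flipping the connection to REQUIRES_REAUTH.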
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.onedrive_backend import OneDriveBackend, CHUNK_SIZE
from storage.google_drive_backend import CloudConnectionError
import inspect
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
assert inspect.iscoroutinefunction(getattr(OneDriveBackend, method)), f'{method} not async'
assert CHUNK_SIZE == 10 * 1024 * 1024, f'CHUNK_SIZE should be 10MB, got {CHUNK_SIZE}'
print('All methods async: OK')
print(f'CHUNK_SIZE = {CHUNK_SIZE} bytes: OK')
import asyncio
backend = OneDriveBackend({'access_token':'x','refresh_token':'y','expires_at':'2099-01-01T00:00:00'})
async def check():
try: await backend.presigned_get_url('key')
except NotImplementedError: print('presigned_get_url NotImplementedError: OK')
try: await backend.generate_presigned_put_url('key')
except NotImplementedError: print('generate_presigned_put_url NotImplementedError: OK')
asyncio.run(check())
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/onedrive_backend.py exists with class OneDriveBackend
- All 7 methods are async coroutines
- CHUNK_SIZE = 10 * 1024 * 1024 (10 MB)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- CloudConnectionError imported from google_drive_backend (shared exception type)
- Import succeeds: `from storage.onedrive_backend import OneDriveBackend`
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>OneDriveBackend created with all 7 methods; resumable upload uses CHUNK_SIZE=10MB; NotImplementedError on presigned methods; pytest passes</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| GoogleDriveBackend → Google APIs | Outbound to googleapis.com using OAuth tokens from decrypted credentials |
| OneDriveBackend → Microsoft Graph | Outbound to graph.microsoft.com using MSAL-managed tokens |
| invalid_grant response → connection status | Provider error must be surfaced as REQUIRES_REAUTH, not silently swallowed |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-03-01 | Elevation of Privilege | GoogleDriveBackend — token in credentials dict | mitigate | Credentials dict never logged; decryption only in factory; tokens only in memory; no serialization path back to API response |
| T-05-03-02 | Spoofing | OneDriveBackend — invalid_grant detection | mitigate | result.get("error") == "invalid_grant" raises CloudConnectionError → API layer sets REQUIRES_REAUTH; per D-06, no silent failure |
| T-05-03-03 | Denial of Service | OneDriveBackend — 10MB chunked upload | accept | 10 MB chunks are within Microsoft Graph's recommended range; no larger chunks that could cause memory pressure |
| T-05-03-04 | Information Disclosure | GoogleDriveBackend — file names in Drive | accept | Drive file is named {document_id}{extension} — no human filename in provider storage (aligns with D-11 spirit) |
| T-05-03-05 | Tampering | cache_discovery=False in Google Drive build() | mitigate | Disables Google's JSON discovery cache written to /tmp; prevents directory traversal via cached discovery docs |
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- GoogleDriveBackend: all 7 methods async; presigned methods raise NotImplementedError; CloudConnectionError defined
- OneDriveBackend: all 7 methods async; CHUNK_SIZE=10MB; presigned methods raise NotImplementedError; CloudConnectionError imported
- pytest -v exits 0, 0 failures; test_cloud.py still all xfailed
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md` when done
</output>
@@ -0,0 +1,363 @@
---
phase: 05-cloud-storage-backends
plan: 04
type: execute
wave: 3
depends_on:
- "05-02"
files_modified:
- backend/storage/nextcloud_backend.py
- backend/storage/webdav_backend.py
autonomous: true
requirements:
- CLOUD-01
- CLOUD-07
must_haves:
truths:
- "NextcloudBackend implements all 7 StorageBackend abstract methods"
- "WebDAVBackend implements all 7 StorageBackend abstract methods"
- "validate_cloud_url() called inside WebDAVBackend and NextcloudBackend before every outbound WebDAV request"
- "All sync webdavclient3 calls wrapped in asyncio.to_thread()"
- "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both WebDAV backends"
- "health_check uses lightweight PROPFIND or check() call to validate connectivity without storing unverified credentials"
artifacts:
- path: "backend/storage/nextcloud_backend.py"
provides: "Nextcloud WebDAV StorageBackend"
contains: "class NextcloudBackend"
- path: "backend/storage/webdav_backend.py"
provides: "Generic WebDAV StorageBackend"
contains: "class WebDAVBackend"
key_links:
- from: "backend/storage/nextcloud_backend.py"
to: "backend/storage/cloud_utils.py"
via: "validate_cloud_url called before every outbound request"
pattern: "validate_cloud_url"
- from: "backend/storage/webdav_backend.py"
to: "backend/storage/cloud_utils.py"
via: "validate_cloud_url called before every outbound request"
pattern: "validate_cloud_url"
---
<objective>
Implement NextcloudBackend and WebDAVBackend — the two credential-based (non-OAuth) cloud StorageBackend concrete classes.
Purpose: These backends handle Nextcloud and generic WebDAV servers using HTTP Basic Auth. SSRF prevention via validate_cloud_url() is mandatory before every outbound request. All sync webdavclient3 calls are wrapped in asyncio.to_thread() per the MinIOBackend pattern.
Output: nextcloud_backend.py and webdav_backend.py, each implementing all 7 StorageBackend methods.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md
</context>
<interfaces>
<!-- From backend/storage/base.py -->
From backend/storage/base.py:
class StorageBackend(ABC):
async def put_object(user_id, document_id, file_bytes, extension, content_type) -> str
async def get_object(object_key: str) -> bytes
async def delete_object(object_key: str) -> None
async def presigned_get_url(object_key: str, expires_minutes: int = 60) -> str
async def health_check() -> bool
async def generate_presigned_put_url(object_key: str, expires_minutes: int = 15) -> str
async def stat_object(object_key: str) -> int
<!-- From RESEARCH.md Pattern 5 — webdavclient3 -->
webdavclient3 Client options: {"webdav_hostname": server_url, "webdav_login": username, "webdav_password": password}
All webdavclient3 calls are synchronous — MUST wrap in asyncio.to_thread()
Method names to verify: client.upload_to(buf, remote_path), client.download_from(buf, remote_path)
client.list(remote_dir), client.info(remote_path) returns dict with "size" key
client.check(remote_path) returns bool — used for health_check
client.clean(remote_path) — delete
ASSUMPTION A1: verify upload_to/download_from method names against installed package during implementation
<!-- From RESEARCH.md Pattern 6 — SSRF prevention -->
validate_cloud_url(url: str) -> None — raises ValueError if URL targets private/internal address
Must be called: (1) at connect-time, (2) before every outbound WebDAV request
<!-- From RESEARCH.md Pitfall 2 — Nextcloud path encoding -->
Use urllib.parse.quote() on path segments for Nextcloud compatibility with non-ASCII filenames
<!-- Object key scheme for WebDAV -->
object_key = WebDAV path: "docuvault/{user_id}/{document_id}{extension}"
CloudConnection credentials dict: {"server_url": str, "username": str, "password": str}
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Implement WebDAVBackend</name>
<files>backend/storage/webdav_backend.py</files>
<read_first>
- backend/storage/base.py — all 7 method signatures
- backend/storage/minio_backend.py — asyncio.to_thread() wrapping pattern
- backend/storage/cloud_utils.py — validate_cloud_url signature
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 5 (webdavclient3), Pitfall 2 (path encoding), A1 (assumed method names)
</read_first>
<behavior>
- WebDAVBackend.__init__(self, server_url: str, username: str, password: str) creates webdavclient3 Client
- validate_cloud_url(server_url) called in __init__ before constructing the client (SSRF guard at construct time)
- put_object: constructs object_key = f"docuvault/{user_id}/{document_id}{extension}"; percent-encodes path segments; uploads via asyncio.to_thread; returns object_key
- get_object: downloads to BytesIO via asyncio.to_thread; returns bytes
- delete_object: deletes via asyncio.to_thread; treats a missing remote file as a no-op (catch webdav3's not-found exception, e.g. RemoteResourceNotFound); all other errors propagate
- presigned_get_url: raises NotImplementedError
- generate_presigned_put_url: raises NotImplementedError
- stat_object: calls asyncio.to_thread for client.info(object_key); returns int(info.get("size", 0))
- health_check: calls asyncio.to_thread for client.check("/"); returns True/False
- SSRF validation called before every asyncio.to_thread call: validate_cloud_url(self._server_url)
- Uses urllib.parse.quote on non-docuvault path segments (Pitfall 2)
</behavior>
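One way to make "validate before every asyncio.to_thread call" hard to forget is to route every client call through a single helper. GuardedCaller is a hypothetical name; the validator argument stands in for validate_cloud_url so the sketch stays self-contained.

```python
import asyncio
from typing import Any, Callable


class GuardedCaller:
    """Hypothetical helper: validate the target URL, then run a blocking call off-loop."""

    def __init__(self, server_url: str, validator: Callable[[str], None]) -> None:
        self._server_url = server_url
        self._validator = validator  # e.g. validate_cloud_url from cloud_utils

    async def call(self, fn: Callable[..., Any], *args: Any) -> Any:
        # Validation runs first; if it raises ValueError, fn is never invoked.
        self._validator(self._server_url)
        return await asyncio.to_thread(fn, *args)
```

With this shape, each backend method reduces to something like await self._guard.call(self._client.info, object_key), and the SSRF check cannot be skipped by accident.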
<action>
Create backend/storage/webdav_backend.py with:
Module docstring explaining WebDAV backend, SSRF validation requirement per D-17, and Pitfall 2 (path encoding).
from __future__ import annotations
import asyncio, io, urllib.parse
from webdav3.client import Client
from storage.base import StorageBackend
from storage.cloud_utils import validate_cloud_url
class WebDAVBackend(StorageBackend):
def __init__(self, server_url: str, username: str, password: str) -> None:
validate_cloud_url(server_url) # SSRF guard at construct time
self._server_url = server_url
options = {
"webdav_hostname": server_url,
"webdav_login": username,
"webdav_password": password,
}
self._client = Client(options)
def _make_path(self, user_id: str, document_id: str, extension: str) -> str:
# Construct path with percent-encoding for Nextcloud/WebDAV compatibility (Pitfall 2)
encoded_uid = urllib.parse.quote(str(user_id), safe="")
encoded_did = urllib.parse.quote(str(document_id), safe="")
return f"docuvault/{encoded_uid}/{encoded_did}{extension}"
async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
validate_cloud_url(self._server_url) # re-validate before every request (D-17)
object_key = self._make_path(user_id, document_id, extension)
buf = io.BytesIO(file_bytes)
# Ensure parent directory exists: client.mkdir("docuvault/{user_id}/") wrapped in asyncio.to_thread
# Then: await asyncio.to_thread(self._client.upload_to, buf, object_key)
# If upload_to method name incorrect, verify against webdavclient3 docs and use correct name
return object_key
async def get_object(self, object_key: str) -> bytes:
validate_cloud_url(self._server_url)
buf = io.BytesIO()
await asyncio.to_thread(self._client.download_from, buf, object_key)
return buf.getvalue()
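async def delete_object(self, object_key: str) -> None:
from webdav3.exceptions import RemoteResourceNotFound # verify name against the installed webdavclient3; may move to module level
validate_cloud_url(self._server_url)
try:
await asyncio.to_thread(self._client.clean, object_key)
except RemoteResourceNotFound:
pass # No-op if the file is already gone; auth and network errors propagate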
async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
raise NotImplementedError("WebDAV backend does not support presigned URLs")
async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
raise NotImplementedError("WebDAV backend does not support presigned put URLs")
async def stat_object(self, object_key: str) -> int:
validate_cloud_url(self._server_url)
info = await asyncio.to_thread(self._client.info, object_key)
return int(info.get("size") or 0) # "size" may be present but None (e.g. for collections)
async def health_check(self) -> bool:
try:
validate_cloud_url(self._server_url)
result = await asyncio.to_thread(self._client.check, "/")
return bool(result)
except Exception:
return False
IMPORTANT: During implementation, verify the webdavclient3 method names by running:
python -c "from webdav3.client import Client; print([m for m in dir(Client) if not m.startswith('_')])"
and use the correct method names. The RESEARCH.md marks upload_to/download_from as [ASSUMED].
Correct method names if different (e.g., may be upload_sync, download_sync, or upload/download).
</action>
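The "ensure parent directory exists" step in put_object is left as a comment above. A possible shape, assuming the client exposes check(remote_path) and mkdir(remote_path) as webdavclient3's Client does, is sketched below. DirClient, parent_dirs, and ensure_parent_dirs are hypothetical names.

```python
from typing import List, Protocol


class DirClient(Protocol):
    """The two blocking client methods this sketch relies on."""

    def check(self, remote_path: str) -> bool: ...
    def mkdir(self, remote_path: str) -> None: ...


def parent_dirs(object_key: str) -> List[str]:
    """Return every ancestor directory of object_key, shallowest first."""
    parts = object_key.strip("/").split("/")[:-1]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def ensure_parent_dirs(client: DirClient, object_key: str) -> None:
    """Create missing ancestor directories one level at a time (blocking)."""
    for directory in parent_dirs(object_key):
        if not client.check(directory):
            client.mkdir(directory)
```

Because both client calls block, the whole function would be run as await asyncio.to_thread(ensure_parent_dirs, self._client, object_key) after the SSRF check.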
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.webdav_backend import WebDAVBackend
import inspect
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
assert inspect.iscoroutinefunction(getattr(WebDAVBackend, method)), f'{method} not async'
# SSRF guard: connecting to localhost should raise ValueError
try:
WebDAVBackend('http://localhost/dav', 'user', 'pass')
print('FAIL: should raise ValueError for localhost')
except ValueError:
print('OK: SSRF blocked in __init__')
print('All methods async: OK')
import asyncio
backend = WebDAVBackend.__new__(WebDAVBackend)
backend._server_url = 'https://example.com/dav' # bypass __init__ for method check
async def check():
try: await backend.presigned_get_url('k')
except NotImplementedError: print('presigned_get_url NotImplementedError: OK')
try: await backend.generate_presigned_put_url('k')
except NotImplementedError: print('generate_presigned_put_url NotImplementedError: OK')
asyncio.run(check())
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/webdav_backend.py exists with class WebDAVBackend
- All 7 methods are async coroutines
- WebDAVBackend("http://127.0.0.1/dav", "u", "p") raises ValueError (SSRF guard in __init__)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- validate_cloud_url imported and called in __init__ and before every asyncio.to_thread call
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>WebDAVBackend created; SSRF validation in __init__ and before each request; all 7 methods async; pytest passes</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Implement NextcloudBackend</name>
<files>backend/storage/nextcloud_backend.py</files>
<read_first>
- backend/storage/webdav_backend.py — WebDAVBackend implementation (NextcloudBackend extends it)
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Open Question 2 (Nextcloud folder listing path convention), Pitfall 2 (path encoding)
- backend/storage/cloud_utils.py — validate_cloud_url
</read_first>
<behavior>
- NextcloudBackend subclasses WebDAVBackend — inherits all 7 methods; only overrides what differs
- NextcloudBackend stores the username for folder listing path construction (Nextcloud WebDAV path: /remote.php/dav/files/{username}/)
- SSRF validation inherited from WebDAVBackend parent class
- list_folder(folder_path: str) -> list[dict] method added for cloud folder listing via PROPFIND (used by API)
- list_folder returns list of dicts with keys: id (str path), name (str), is_dir (bool), size (int)
- get_object and put_object inherited from WebDAVBackend
- health_check overrides parent to use PROPFIND on the Nextcloud root path
</behavior>
<action>
Create backend/storage/nextcloud_backend.py with:
Module docstring explaining Nextcloud extends WebDAVBackend; Nextcloud WebDAV base path convention.
from __future__ import annotations
import asyncio, urllib.parse
from storage.webdav_backend import WebDAVBackend
from storage.cloud_utils import validate_cloud_url
class NextcloudBackend(WebDAVBackend):
"""Nextcloud storage backend — extends WebDAVBackend with Nextcloud-specific path handling.
The server_url should be the full WebDAV root:
https://nc.example.com/remote.php/dav/files/{username}/
"""
def __init__(self, server_url: str, username: str, password: str) -> None:
super().__init__(server_url, username, password)
self._username = username
async def list_folder(self, folder_path: str = "") -> list[dict]:
"""List folder contents at folder_path relative to WebDAV root.
Returns a list of dicts: [{"id": str, "name": str, "is_dir": bool, "size": int}, ...]
Used by GET /api/cloud/folders/nextcloud/{folder_id} endpoint.
"""
validate_cloud_url(self._server_url)
# List the folder using client.list() which returns a list of file names
# For each item, call client.info() to get size and type
# Wrap each client call in asyncio.to_thread
# Return structured list
async def health_check(self) -> bool:
try:
validate_cloud_url(self._server_url)
# Use client.check("") or client.list("") to verify connectivity to root
result = await asyncio.to_thread(self._client.check, "")
return bool(result)
except Exception:
return False
NextcloudBackend inherits put_object, get_object, delete_object, presigned_get_url,
generate_presigned_put_url, and stat_object from WebDAVBackend.
The list_folder method is extra (not in ABC) and used exclusively by the cloud folder
listing API endpoint.
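A minimal sketch of the list_folder result shape described above, under stated assumptions: `FakeClient` and `to_entry` are hypothetical names, and the raw record keys (`path`, `name`, `isdir`, `size`) are an assumption about what the synchronous WebDAV client returns when asked for metadata. It is an illustration of the mapping, not the final implementation.

```python
import asyncio
from typing import Any


def to_entry(raw: dict[str, Any]) -> dict[str, Any]:
    """Map one raw listing record to the {"id", "name", "is_dir", "size"} shape."""
    path = str(raw.get("path") or "")
    # Fall back to the last path segment when the record carries no name.
    name = raw.get("name") or path.rstrip("/").rsplit("/", 1)[-1]
    is_dir = bool(raw.get("isdir"))
    # Directories report size 0; file sizes may arrive as strings.
    size = 0 if is_dir else int(raw.get("size") or 0)
    return {"id": path, "name": str(name), "is_dir": is_dir, "size": size}


class FakeClient:
    """Hypothetical stand-in for the synchronous WebDAV client."""

    def list(self, remote_path: str, get_info: bool = False) -> list[dict[str, Any]]:
        return [
            {"path": "/dav/docs/", "name": None, "isdir": True, "size": None},
            {"path": "/dav/a.pdf", "name": "a.pdf", "isdir": False, "size": "2048"},
        ]


async def list_folder(client: Any, folder_path: str = "") -> list[dict[str, Any]]:
    # The blocking listing call runs in a worker thread so the event loop stays free.
    raw_items = await asyncio.to_thread(client.list, folder_path, True)
    return [to_entry(item) for item in raw_items]


entries = asyncio.run(list_folder(FakeClient()))
```

If the client can return metadata with the listing itself, one call per folder is enough and the per-item info round trip can be avoided; whether that holds depends on the client library version in use.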
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.nextcloud_backend import NextcloudBackend
from storage.webdav_backend import WebDAVBackend
import inspect
# Verify subclass
assert issubclass(NextcloudBackend, WebDAVBackend), 'NextcloudBackend must subclass WebDAVBackend'
# Verify all 7 methods async
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
assert inspect.iscoroutinefunction(getattr(NextcloudBackend, method)), f'{method} not async'
# Verify list_folder added
assert hasattr(NextcloudBackend, 'list_folder'), 'list_folder missing'
assert inspect.iscoroutinefunction(NextcloudBackend.list_folder), 'list_folder not async'
print('NextcloudBackend is WebDAVBackend subclass: OK')
print('All 7 StorageBackend methods async: OK')
print('list_folder method present and async: OK')
# SSRF guard inherited
try:
NextcloudBackend('http://10.0.0.1/dav', 'user', 'pass')
print('FAIL: SSRF should be blocked')
except ValueError:
print('SSRF guard inherited: OK')
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/nextcloud_backend.py exists with class NextcloudBackend
- issubclass(NextcloudBackend, WebDAVBackend) is True
- All 7 StorageBackend methods are async (inherited or overridden)
- list_folder async method added beyond the ABC contract
- SSRF guard inherited from WebDAVBackend.__init__: NextcloudBackend("http://10.0.0.1/dav", ...) raises ValueError
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>NextcloudBackend created as WebDAVBackend subclass; list_folder added; SSRF guard inherited; pytest passes</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| user-supplied server_url → WebDAV client | Server URL must be validated for SSRF before Client construction and before each request |
| webdavclient3 sync calls → event loop | All sync SDK calls must be in asyncio.to_thread() to prevent event loop blocking |
| WebDAV credentials → encrypted storage | Credentials flow from encrypted DB via factory into backend constructor — never logged |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-04-01 | Tampering | WebDAVBackend — SSRF via server_url | mitigate | validate_cloud_url(server_url) in __init__ AND before every asyncio.to_thread call; D-17 requires both points |
| T-05-04-02 | Tampering | DNS rebinding on WebDAV requests | mitigate | validate_cloud_url called before each request (not only at connect-time); documented defense-in-depth via network egress firewall (RESEARCH.md Pitfall 5) |
| T-05-04-03 | Information Disclosure | WebDAV path includes user_id/document_id | accept | object_key = "docuvault/{user_id}/{document_id}{ext}" — no human filename; acceptable for single-user WebDAV servers |
| T-05-04-04 | Denial of Service | Nextcloud list_folder fetching info per item | accept | TTLCache (Plan 02) prevents repeated list_folder calls within 60s; per-item info call is provider overhead only |
| T-05-04-05 | Tampering | webdavclient3 path traversal via object_key | mitigate | put_object constructs object_key from user_id and document_id (both UUID values); get_object/delete_object receive object_key from DB (not from user input directly) — no raw user path injection |
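For orientation, a hedged sketch of the kind of check the SSRF mitigations in this register rely on. The real validate_cloud_url lives in cloud_utils.py and may differ; the function name `validate_cloud_url_sketch`, the allowed schemes, and the "reject anything not globally routable" rule are all assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_cloud_url_sketch(url: str) -> None:
    """Raise ValueError unless every address the host resolves to is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError("URL must be http(s) and include a host")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {parts.hostname}") from exc
    for info in infos:
        # Drop any IPv6 scope suffix ("%eth0") before parsing the address.
        addr = ipaddress.ip_address(info[4][0].split("%", 1)[0])
        if not addr.is_global:
            raise ValueError(f"{parts.hostname} resolves to non-public address {addr}")
```

A check like this narrows but does not close the DNS rebinding window, because the HTTP client resolves the name again when it connects. That is why the register pairs per-request validation with a network egress firewall.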
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- WebDAVBackend: all 7 methods async; validate_cloud_url in __init__ and before each request; presigned methods raise NotImplementedError
- NextcloudBackend: subclass of WebDAVBackend; list_folder method added; SSRF guard inherited
- pytest -v exits 0, 0 failures; test_cloud.py still all xfailed
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md` when done
</output>
@@ -0,0 +1,315 @@
---
phase: 05-cloud-storage-backends
plan: 05
type: execute
wave: 4
depends_on:
- "05-03"
- "05-04"
files_modified:
- backend/api/cloud.py
- backend/main.py
autonomous: true
requirements:
- CLOUD-01
- CLOUD-02
- CLOUD-03
- CLOUD-04
- CLOUD-05
- CLOUD-06
must_haves:
truths:
- "GET /api/cloud/oauth/initiate/{provider} redirects to provider OAuth URL; state token in Redis with 30-min TTL"
- "GET /api/cloud/oauth/callback/{provider} validates state, exchanges code, encrypts credentials, saves CloudConnection, redirects to /settings?cloud_connected={provider}"
- "POST /api/cloud/connections/webdav validates URL (SSRF), tests connection (PROPFIND), encrypts + saves credentials"
- "GET /api/cloud/connections returns CloudConnectionOut list — no credentials_enc"
- "DELETE /api/cloud/connections/{id} deletes credentials_enc row; subsequent use returns 503"
- "GET /api/cloud/folders/{provider}/{folder_id} returns lazy-loaded folder listing (TTL-cached)"
- "PATCH /api/users/me/default-storage updates users.default_storage_backend"
- "All endpoints use get_regular_user dep — admin blocked (403)"
- "OAuth callback invalid state returns 400; invalid provider returns 400"
- "write_audit_log called on connect, disconnect, and REQUIRES_REAUTH transitions"
artifacts:
- path: "backend/api/cloud.py"
provides: "All /api/cloud/* endpoints + /api/users/me/default-storage"
contains: "router = APIRouter"
- path: "backend/main.py"
provides: "cloud router registered"
contains: "cloud_router"
key_links:
- from: "backend/api/cloud.py"
to: "backend/storage/cloud_utils.py"
via: "encrypt_credentials / decrypt_credentials"
pattern: "encrypt_credentials"
- from: "backend/api/cloud.py"
to: "backend/api/admin.py"
via: "CloudConnectionOut Pydantic model import"
pattern: "CloudConnectionOut"
- from: "backend/api/cloud.py"
to: "backend/services/audit.py"
via: "write_audit_log on connect/disconnect"
pattern: "write_audit_log"
---
<objective>
Create backend/api/cloud.py with all cloud connection management endpoints and register it in main.py.
Purpose: This plan implements the complete cloud backend API surface: OAuth initiation, OAuth callback, WebDAV connect, list connections, disconnect, folder listing, and default-storage selection.
Output: backend/api/cloud.py with 6 cloud endpoints + 1 PATCH endpoint (7 total); main.py updated to register the router.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-03-SUMMARY.md
@.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md
</context>
<interfaces>
<!-- From backend/api/admin.py -->
From backend/api/admin.py:
class CloudConnectionOut(BaseModel):
id: str
provider: str
display_name: str
status: str
connected_at: datetime
model_config = {"from_attributes": True}
<!-- From backend/deps/auth.py -->
From backend/deps/auth.py:
async def get_regular_user(credentials, session) -> User: -- raises 403 for admin, 401 for invalid token
From backend/services/audit.py:
async def write_audit_log(session, event_type, user_id, actor_id, resource_id, ip_address, metadata_=None) -> None
From backend/db/models.py:
CloudConnection: id (UUID), user_id (UUID), provider (String), display_name (Text),
credentials_enc (Text), status (String, default="ACTIVE"), connected_at (TIMESTAMP)
User: id (UUID), default_storage_backend (String, default="minio")
From backend/config.py (after Plan 01):
settings.cloud_creds_key: str
settings.google_client_id, google_client_secret: str
settings.onedrive_client_id, onedrive_client_secret, onedrive_tenant_id: str
settings.backend_url: str (used in OAuth callback redirect_uri)
From backend/storage/cloud_utils.py:
def encrypt_credentials(master_key: bytes, user_id: str, credentials: dict) -> str
def decrypt_credentials(master_key: bytes, user_id: str, credentials_enc: str) -> dict
def validate_cloud_url(url: str) -> None
From RESEARCH.md Pattern 3: Google Drive OAuth — Flow.from_client_config, access_type="offline", prompt="consent"
From RESEARCH.md Pattern 4: OneDrive OAuth — msal.ConfidentialClientApplication, acquire_token_by_authorization_code
From RESEARCH.md Pattern 7: OAuth state in Redis — key "oauth_state:{state_token}", TTL 1800, single-use delete
From backend/storage/nextcloud_backend.py: NextcloudBackend.list_folder() -> list[dict]
From backend/services/cloud_cache.py: get_cloud_folders_cached(user_id, provider, folder_id, fetch_fn)
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Create cloud.py with OAuth + WebDAV connect + connection management endpoints</name>
<files>backend/api/cloud.py</files>
<read_first>
- backend/api/admin.py — CloudConnectionOut pattern, _user_to_dict whitelist style, write_audit_log usage
- backend/api/auth.py — Redis state pattern (oauth_state-like keys), rate limiting pattern
- backend/deps/auth.py — get_regular_user signature
- backend/db/models.py — CloudConnection, User model fields
- backend/config.py — new Phase 5 settings fields
- .planning/phases/05-cloud-storage-backends/05-CONTEXT.md — D-03, D-04, D-06, D-17, D-18, D-19 decisions
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 3 (Google OAuth), Pattern 4 (MSAL), Pattern 7 (Redis state)
</read_first>
<behavior>
- GET /api/cloud/oauth/initiate/{provider}: accepts provider in {"google_drive", "onedrive"}; generates state_token = secrets.token_urlsafe(32); stores "oauth_state:{state_token}" in Redis with value str(current_user.id), TTL 1800; builds authorization_url; returns HTTP 302 redirect to authorization_url
- GET /api/cloud/oauth/callback/{provider}: reads state and code query params; looks up Redis key "oauth_state:{state}"; if missing, returns 400; deletes Redis key (single-use); exchanges code for tokens; encrypts credentials; upserts CloudConnection (match on user_id + provider); sets status=ACTIVE; calls write_audit_log(event_type="cloud.connected"); returns 302 redirect to {settings.frontend_url}/settings?cloud_connected={provider}
- On any exception in callback: returns 302 redirect to {settings.frontend_url}/settings?cloud_error={url-encoded error message}
- POST /api/cloud/connections/webdav: Pydantic body with server_url (HttpUrl), username (str), password (str), provider (Literal["nextcloud", "webdav"]); calls validate_cloud_url(server_url) → 422 on ValueError; instantiates WebDAVBackend/NextcloudBackend; calls backend.health_check() wrapped in try/except → 422 if False/exception; encrypts credentials; upserts CloudConnection; calls write_audit_log(event_type="cloud.connected"); returns CloudConnectionOut
- GET /api/cloud/connections: selects all CloudConnection where user_id=current_user.id; returns {"items": [CloudConnectionOut, ...]}; credentials_enc never in response
- DELETE /api/cloud/connections/{id}: loads CloudConnection; asserts connection.user_id == current_user.id (returns 404 if mismatch — prevents ID enumeration per D-19); deletes row; calls write_audit_log(event_type="cloud.disconnected"); returns 204
- PATCH /api/users/me/default-storage: body {"backend": str}; updates User.default_storage_backend; returns {"default_storage_backend": new_value}
- ALL endpoints: Depends(get_regular_user) — admin blocked (D-18, D-19)
- ALL endpoints: cross-user access returns 404 not 403 (prevents ID enumeration)
</behavior>
<action>
Create backend/api/cloud.py with module docstring listing all endpoints and security invariants.
Imports: asyncio, secrets, uuid, urllib.parse; from typing import Literal; from fastapi import APIRouter, Depends, HTTPException, Request, status; from fastapi.responses import RedirectResponse; from pydantic import BaseModel, HttpUrl; from sqlalchemy import select; from sqlalchemy.ext.asyncio import AsyncSession
From project modules:
from api.admin import CloudConnectionOut
from config import settings
from db.models import CloudConnection, User
from deps.auth import get_regular_user
from deps.db import get_db
from services.audit import write_audit_log
from storage.cloud_utils import encrypt_credentials, decrypt_credentials, validate_cloud_url
VALID_OAUTH_PROVIDERS = {"google_drive", "onedrive"}
VALID_WEBDAV_PROVIDERS = {"nextcloud", "webdav"}
router = APIRouter(prefix="/api/cloud", tags=["cloud"])
users_router = APIRouter(prefix="/api/users", tags=["users"])
Pydantic request models:
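class WebDAVConnectRequest(BaseModel): server_url: HttpUrl; username: str; password: str; provider: Literal["nextcloud", "webdav"] (pass str(body.server_url) to validate_cloud_url and to the backend constructors)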
class DefaultStorageRequest(BaseModel): backend: str
Implement all 6 cloud endpoints + 1 users/me/default-storage endpoint per the behavior spec above.
For Google Drive OAuth initiate/callback:
from google_auth_oauthlib.flow import Flow (lazy import inside handler)
Flow.from_client_config with client_id=settings.google_client_id, client_secret=settings.google_client_secret
Scopes: ["https://www.googleapis.com/auth/drive.file"]
redirect_uri = f"{settings.backend_url}/api/cloud/oauth/callback/google_drive"
flow.authorization_url(access_type="offline", prompt="consent", state=state_token)
At callback: flow.fetch_token(code=code); store access_token, refresh_token, expiry, token_uri, client_id, client_secret
For OneDrive OAuth initiate/callback:
import msal (lazy import inside handler)
msal.ConfidentialClientApplication(settings.onedrive_client_id, client_credential=settings.onedrive_client_secret, authority=f"https://login.microsoftonline.com/{settings.onedrive_tenant_id}")
app.get_authorization_request_url(scopes=["Files.ReadWrite","offline_access"], redirect_uri=..., state=state_token)
At callback: app.acquire_token_by_authorization_code(code, scopes=..., redirect_uri=...)
Wrap msal calls in asyncio.to_thread()
For WebDAV/Nextcloud connect:
from storage.webdav_backend import WebDAVBackend
from storage.nextcloud_backend import NextcloudBackend
Instantiate with try/except ValueError → HTTPException(422)
await backend.health_check() directly (it is already a coroutine, so do not wrap it in asyncio.to_thread); on False or exception → HTTPException(422, "Connection test failed — check server URL and credentials")
Upsert logic for CloudConnection:
SELECT where user_id=current_user.id AND provider=provider
If exists: update credentials_enc + status=ACTIVE; if not exists: INSERT
display_name = human-readable from provider: {"google_drive": "Google Drive", "onedrive": "OneDrive", "nextcloud": "Nextcloud", "webdav": "WebDAV server"}
write_audit_log calls:
cloud.connected: user_id=current_user.id, actor_id=current_user.id, resource_id=conn.id, metadata_={"provider": provider}
cloud.disconnected: same pattern
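The two redirect outcomes of the OAuth callback (success with cloud_connected, failure with cloud_error) can be sketched with the standard library alone. `settings_redirect` is a hypothetical helper name, and the example host is a placeholder.

```python
from typing import Optional
from urllib.parse import urlencode


def settings_redirect(
    frontend_url: str,
    *,
    provider: Optional[str] = None,
    error: Optional[str] = None,
) -> str:
    """Build the /settings redirect target for exactly one outcome."""
    if (provider is None) == (error is None):
        raise ValueError("pass exactly one of provider or error")
    if error is None:
        params = {"cloud_connected": provider}
    else:
        params = {"cloud_error": error}
    # urlencode percent-encodes the value, so the message cannot inject extra parameters.
    return f"{frontend_url.rstrip('/')}/settings?{urlencode(params)}"
```

Passing a short fixed message rather than the raw exception text keeps the redirect in line with the accepted risk T-05-05-07, which assumes no PII or secret value reaches the query string.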
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from api.cloud import router, users_router
print('cloud router imports OK')
print('Routes:')
for route in router.routes:
print(f' {route.methods} {route.path}')
for route in users_router.routes:
print(f' {route.methods} {route.path}')
"</automated>
</verify>
<acceptance_criteria>
- backend/api/cloud.py exists and imports without error
- router has routes: GET /oauth/initiate/{provider}, GET /oauth/callback/{provider}, POST /connections/webdav, GET /connections, DELETE /connections/{id}, GET /folders/{provider}/{folder_id}
- users_router has route: PATCH /me/default-storage
- All handlers have `Depends(get_regular_user)` in their signature
- CloudConnectionOut imported from api.admin — not redefined
- credentials_enc column never referenced in any response serialization (only in CloudConnection ORM SELECT for encrypt/decrypt ops)
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>cloud.py created with all 7 endpoints; all use get_regular_user dep; CloudConnectionOut from admin module; pytest passes</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Register cloud router in main.py + add folder listing endpoint</name>
<files>backend/main.py, backend/api/cloud.py</files>
<read_first>
- backend/main.py — existing router registrations pattern (app.include_router)
- backend/api/cloud.py — router and users_router objects created in Task 1
- backend/services/cloud_cache.py — get_cloud_folders_cached signature
- backend/storage/nextcloud_backend.py — NextcloudBackend.list_folder signature
- backend/storage/webdav_backend.py — WebDAVBackend.list_folder (may not exist — use generic approach)
</read_first>
<behavior>
- GET /api/cloud/folders/{provider}/{folder_id} endpoint added to cloud.py router: loads CloudConnection, decrypts credentials, instantiates backend, calls backend-specific list method via get_cloud_folders_cached; returns {"items": [...]} where each item has id, name, is_dir, size
- main.py includes both cloud router and users_router from api.cloud
- Router registrations appended after the existing router includes (documents, topics, auth, admin, folders, audit, shares)
- Existing test suite passes after router registration
</behavior>
<action>
In backend/api/cloud.py, add the folder listing endpoint to the router (if not already added in Task 1):
GET /api/cloud/folders/{provider}/{folder_id} implementation:
- Load CloudConnection for current_user.id + provider; 404 if not found or status != ACTIVE
- Decrypt credentials
- Build fetch_fn as a nested async def closure (Python has no async lambda) that calls backend.list_folder(folder_id or root path)
- For provider "google_drive": use Drive service.files().list(q=f"'{folder_id}' in parents", fields="files(id,name,mimeType,size)") with backslashes and single quotes in folder_id escaped before interpolation; convert to standard format
- For provider "onedrive": GET /me/drive/items/{folder_id}/children; convert to standard format
- For provider in ("nextcloud", "webdav"): instantiate NextcloudBackend; call list_folder(folder_id)
- Wrap in get_cloud_folders_cached(str(current_user.id), provider, folder_id, fetch_fn)
- Return {"items": [{"id":..., "name":..., "is_dir":bool, "size":int}, ...]}
In backend/main.py:
- Add imports: from api.cloud import router as cloud_router, users_router as cloud_users_router
- Add app.include_router(cloud_router) and app.include_router(cloud_users_router) after the existing router includes
- The existing routers (documents, topics, auth, admin, folders, audit, shares) must remain unchanged
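A hedged sketch of the "convert to standard format" step for the two OAuth providers. The helper names are hypothetical. The field assumptions are that Drive marks folders with the `application/vnd.google-apps.folder` MIME type and returns `size` as a string, and that a OneDrive item is a folder when it carries a `folder` facet. Reporting size 0 for directories is a choice made here for consistency, not something the plan mandates.

```python
from typing import Any

DRIVE_FOLDER_MIME = "application/vnd.google-apps.folder"


def drive_parent_query(folder_id: str) -> str:
    """Build the Drive `q` filter, escaping characters that would break the quoting."""
    escaped = folder_id.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}' in parents"


def drive_item_to_entry(item: dict[str, Any]) -> dict[str, Any]:
    is_dir = item.get("mimeType") == DRIVE_FOLDER_MIME
    size = 0 if is_dir else int(item.get("size", 0))
    return {"id": item["id"], "name": item["name"], "is_dir": is_dir, "size": size}


def onedrive_item_to_entry(item: dict[str, Any]) -> dict[str, Any]:
    is_dir = "folder" in item
    size = 0 if is_dir else int(item.get("size", 0))
    return {"id": item["id"], "name": item["name"], "is_dir": is_dir, "size": size}
```

Keeping the conversion in small pure functions makes the response shape testable without mocking either provider SDK.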
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from main import app
cloud_routes = [r.path for r in app.routes if hasattr(r, 'path') and '/api/cloud' in r.path]
default_storage = [r.path for r in app.routes if hasattr(r, 'path') and 'default-storage' in r.path]
print('Cloud routes registered:', cloud_routes)
print('Default storage route:', default_storage)
assert len(cloud_routes) >= 6, f'Expected 6+ cloud routes, got {len(cloud_routes)}'
" && python -m pytest -v --tb=short 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- main.py imports and includes cloud_router and cloud_users_router
- `from main import app; [r.path for r in app.routes]` includes paths matching /api/cloud/ and /api/users/me/default-storage
- At least 6 cloud routes registered (initiate, callback, webdav, connections GET, connections DELETE, folders)
- `pytest -v --tb=short` exits 0, 0 failures
- Existing routes (documents, auth, admin, folders, shares) still reachable
</acceptance_criteria>
<done>Both cloud routers registered in main.py; all cloud routes visible in app.routes; full pytest suite passes</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| OAuth callback → user session | state parameter validates callback belongs to the initiating user |
| API request → CloudConnection row | connection.user_id == current_user.id assertion prevents IDOR |
| WebDAV credentials → validation | credentials only stored after successful health_check() |
| API response → CloudConnectionOut | credentials_enc excluded by CloudConnectionOut whitelist |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-05-01 | Tampering | OAuth callback CSRF | mitigate | secrets.token_urlsafe(32) state token stored in Redis; validated at callback; single-use deletion after validation (D-04) |
| T-05-05-02 | Elevation of Privilege | OAuth callback state token leak | mitigate | Redis TTL 1800s (30 min); key deleted after single use; state token travels only in the authorization and callback redirect URLs and is never included in an API response body or log |
| T-05-05-03 | Information Disclosure | CloudConnectionOut in API responses | mitigate | CloudConnectionOut imported from admin.py — exact same whitelist; credentials_enc absent by omission (SEC-08) |
| T-05-05-04 | Information Disclosure | Cloud connection ID enumeration | mitigate | DELETE /connections/{id} returns 404 for wrong-owner connections — same pattern as documents and shares (T-04-04-02) |
| T-05-05-05 | Tampering | WebDAV server_url SSRF | mitigate | validate_cloud_url called before WebDAVBackend/NextcloudBackend instantiation; also called in __init__ and before each request (D-17 defense-in-depth) |
| T-05-05-06 | Spoofing | Admin access to cloud endpoints | mitigate | get_regular_user raises 403 for admin role on all cloud endpoints (D-18) |
| T-05-05-07 | Information Disclosure | OAuth error message in redirect URL | accept | Error message in ?cloud_error= is URL-encoded and displayed to the authenticated user only; no PII or secret value included |
| T-05-05-08 | Information Disclosure | write_audit_log metadata for cloud.connected | mitigate | Audit metadata_ = {"provider": provider} only — no credentials, no tokens, no plaintext password (aligns with document audit whitelist pattern) |
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- cloud.py: all 7 endpoints implemented; all use get_regular_user dep; cross-user returns 404; write_audit_log on connect/disconnect
- main.py: both routers registered; all routes visible in app.routes
- pytest -v exits 0, 0 failures
- test_cloud.py stubs remain xfailed in this plan (they are promoted to real assertions in Plan 06); the endpoints needed by test_credentials_enc_not_exposed, test_connection_status_display, test_disconnect_deletes_credentials, test_ssrf_validation, test_cross_user_idor, test_admin_cannot_see_credentials now exist
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-05-SUMMARY.md` when done
</output>
@@ -0,0 +1,267 @@
---
phase: 05-cloud-storage-backends
plan: 06
type: execute
wave: 5
depends_on:
- "05-05"
files_modified:
- backend/api/documents.py
- backend/tests/test_cloud.py
autonomous: true
requirements:
- CLOUD-03
- CLOUD-05
- CLOUD-07
must_haves:
truths:
- "POST /api/documents/upload detects active folder's backend and routes to cloud backend instead of presigned MinIO URL"
- "GET /api/documents/{id}/content resolves the correct StorageBackend from document.storage_backend and streams bytes"
- "invalid_grant during cloud upload/download transitions connection to REQUIRES_REAUTH without 500 error"
- "All 15 test stubs in test_cloud.py have real assertions replacing pytest.xfail() calls"
- "pytest tests/test_cloud.py passes with all 15 tests green (no xfailed, no failed)"
artifacts:
- path: "backend/api/documents.py"
provides: "Extended upload + content endpoints supporting cloud backends"
contains: "get_storage_backend_for_document"
- path: "backend/tests/test_cloud.py"
provides: "Full test suite for all Phase 5 requirements"
contains: "test_credential_round_trip"
key_links:
- from: "backend/api/documents.py"
to: "backend/storage/__init__.py"
via: "get_storage_backend_for_document"
pattern: "get_storage_backend_for_document"
- from: "backend/tests/test_cloud.py"
to: "backend/api/cloud.py"
via: "async_client HTTP calls to /api/cloud/* endpoints"
pattern: "async_client"
---
<objective>
Wire cloud backends into the document upload and content proxy endpoints, and promote all 15 test stubs to real passing tests.
Purpose: Complete the storage backend integration — uploads routed to cloud when the active folder is a cloud provider, downloads routed through the correct backend per document.storage_backend. Then close the Nyquist loop by making all 15 xfail stubs pass.
Output: Extended documents.py upload + content endpoints; fully passing test_cloud.py.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-05-SUMMARY.md
</context>
<interfaces>
<!-- From backend/storage/__init__.py (after Plan 02) -->
From backend/storage/__init__.py:
async def get_storage_backend_for_document(document, user, session: AsyncSession) -> StorageBackend
def get_storage_backend() -> StorageBackend -- existing MinIO factory
<!-- From backend/api/documents.py — current upload endpoint shape -->
POST /api/documents/upload: currently uses get_storage_backend() to generate presigned PUT URL
GET /api/documents/{id}/content: currently calls backend.get_object(doc.object_key)
<!-- From backend/db/models.py -->
Document: storage_backend (String, "minio" for existing), object_key (Text), folder_id (UUID nullable)
CloudConnection: user_id (UUID), provider (String), status (String)
<!-- From backend/api/cloud.py (after Plan 05) -->
CloudConnectionOut from api.admin
<!-- From backend/storage/google_drive_backend.py -->
CloudConnectionError — raised when invalid_grant detected during cloud operation
<!-- From backend/tests/conftest.py (after Plan 01) -->
cloud_connection_factory: async factory for creating CloudConnection rows
mock_google_drive_creds: dict fixture
mock_onedrive_creds: dict fixture
mock_webdav_client: MagicMock fixture
async_client: AsyncClient with db override
db_session: SQLite in-memory session
<!-- From .planning/phases/05-cloud-storage-backends/05-VALIDATION.md -->
All 15 test names and their requirement mappings
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Extend upload and content-proxy endpoints for cloud backends</name>
<files>backend/api/documents.py</files>
<read_first>
- backend/api/documents.py — current POST /upload and GET /{id}/content implementations
- backend/storage/__init__.py — get_storage_backend_for_document signature
- backend/storage/google_drive_backend.py — CloudConnectionError exception class
- backend/db/models.py — Document.storage_backend, Document.folder_id, CloudConnection
- .planning/phases/05-cloud-storage-backends/05-CONTEXT.md — D-10 (cloud upload via FastAPI), D-14 (no presigned URL for cloud), D-15 (same content endpoint for all backends)
</read_first>
<behavior>
- POST /api/documents/upload: detect target backend from request body field `target_backend` (str, default "minio"); if target_backend != "minio", read file bytes directly in the request handler (UploadFile.read()), call cloud_backend.put_object(), save Document with storage_backend=target_backend; if target_backend == "minio" keep existing presigned URL flow unmodified
- GET /api/documents/{id}/content: replace direct get_storage_backend() call with get_storage_backend_for_document(document, current_user, session); handles all backends transparently
- On CloudConnectionError from any cloud operation: return HTTP 503 with detail "Cloud connection requires re-authentication. Please reconnect in Settings."
- Existing MinIO upload flow (presigned URL) is NOT modified — D-14 specifies generate_presigned_put_url raises NotImplementedError on cloud backends; upload endpoint detects cloud and uses direct path
- document.storage_backend stored as: "minio", "google_drive", "onedrive", "nextcloud", or "webdav"
- Quota: cloud uploads do NOT use the atomic quota UPDATE — cloud files are not counted against MinIO quota (D-11: they are separate backends)
</behavior>
<action>
Read backend/api/documents.py fully before editing to understand current upload + content flow.
Modification 1 — POST /api/documents/upload:
Add optional `target_backend: str = Form("minio")` parameter to the upload endpoint.
If target_backend == "minio": existing presigned URL flow runs unchanged (return {"upload_url": presigned_url, "document_id": str(doc.id)}).
If target_backend in ("google_drive", "onedrive", "nextcloud", "webdav"):
1. Read request body file bytes (file: UploadFile)
2. Load CloudConnection for current_user.id + target_backend; 404 if not found or not ACTIVE
3. Decrypt credentials via decrypt_credentials(settings.cloud_creds_key.encode(), str(current_user.id), conn.credentials_enc)
4. Instantiate the correct backend from target_backend
5. Generate document_id = uuid.uuid4() before uploading, then call object_key = await cloud_backend.put_object(str(current_user.id), str(document_id), file_bytes, extension, content_type)
6. Create Document with id=document_id, storage_backend=target_backend, object_key=object_key, size_bytes=len(file_bytes)
7. Return {"document_id": str(document_id), "storage_backend": target_backend}; no upload_url is returned because the cloud upload is synchronous
Catch CloudConnectionError from put_object → raise HTTPException(503)
Modification 2 — GET /api/documents/{id}/content:
Replace: `storage = get_storage_backend()`
With: `storage = await get_storage_backend_for_document(document, current_user, session)`
Import get_storage_backend_for_document from storage module.
Wrap with try/except CloudConnectionError → HTTPException(503, "Cloud connection requires re-authentication. Please reconnect in Settings.")
Add imports at top of documents.py (only if not already present):
from storage import get_storage_backend_for_document
from storage.google_drive_backend import CloudConnectionError
from storage.cloud_utils import decrypt_credentials
from config import settings
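The cloud branch of the upload handler can be sketched without FastAPI or a database. Everything here is a stand-in: `FakeCloudBackend`, the local `CloudConnectionError`, and the `(status, body)` return value are hypothetical and only illustrate the control flow (allocate the document id first, upload, map a re-auth failure to 503).

```python
import asyncio
import uuid
from typing import Any


class CloudConnectionError(Exception):
    """Local stand-in for the exception raised by the cloud backend module."""


class FakeCloudBackend:
    """Hypothetical backend exposing only put_object."""

    def __init__(self, fail: bool = False) -> None:
        self._fail = fail

    async def put_object(
        self, user_id: str, document_id: str, data: bytes, extension: str, content_type: str
    ) -> str:
        if self._fail:
            raise CloudConnectionError("invalid_grant")
        return f"docuvault/{user_id}/{document_id}{extension}"


async def cloud_upload(
    backend: Any, user_id: str, target_backend: str, data: bytes, extension: str, content_type: str
) -> tuple[int, dict[str, Any]]:
    # The id is allocated before the upload because the object key embeds it.
    document_id = uuid.uuid4()
    try:
        object_key = await backend.put_object(
            user_id, str(document_id), data, extension, content_type
        )
    except CloudConnectionError:
        detail = "Cloud connection requires re-authentication. Please reconnect in Settings."
        return 503, {"detail": detail}
    return 200, {
        "document_id": str(document_id),
        "storage_backend": target_backend,
        "object_key": object_key,
        "size_bytes": len(data),
    }
```

In the real handler the 503 would be raised as an HTTPException and the Document row would be created after put_object succeeds; the sketch returns plain tuples to stay framework-free.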
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
import ast, sys
with open('api/documents.py') as f:
tree = ast.parse(f.read())
names = [n.id if isinstance(n, ast.Name) else getattr(n, 'attr', '') for n in ast.walk(tree) if isinstance(n, (ast.Name, ast.Attribute))]
assert 'get_storage_backend_for_document' in names, 'get_storage_backend_for_document is not referenced in api/documents.py'
print('documents.py parses without error: OK')
" && python -m pytest -v --tb=short 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- backend/api/documents.py imports get_storage_backend_for_document from storage module
- GET /api/documents/{id}/content uses get_storage_backend_for_document (not bare get_storage_backend() for all docs)
- POST /api/documents/upload has target_backend parameter and cloud direct-upload path
- CloudConnectionError caught and re-raised as HTTPException(503)
- Existing MinIO upload flow (presigned URL) unchanged for target_backend="minio"
- `pytest -v --tb=short` exits 0, 0 failures
</acceptance_criteria>
<done>documents.py extended: upload detects cloud backend; content proxy uses get_storage_backend_for_document; CloudConnectionError → 503; existing MinIO flow unchanged</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Promote all 15 xfail stubs to real passing tests</name>
<files>backend/tests/test_cloud.py</files>
<read_first>
- backend/tests/test_cloud.py — current 15 xfail stubs
- backend/tests/conftest.py — all fixtures including cloud_connection_factory, mock_google_drive_creds, async_client, db_session
- backend/api/cloud.py — endpoint paths and request/response shapes
- backend/api/admin.py — CloudConnectionOut fields
- backend/storage/cloud_utils.py — validate_cloud_url, encrypt_credentials, decrypt_credentials
- .planning/phases/05-cloud-storage-backends/05-VALIDATION.md — test map with requirement → test correspondence
- backend/db/models.py — CloudConnection, User, Document fields
</read_first>
<behavior>
- All 15 tests pass (no xfailed, no failed) after implementation
- test_credential_round_trip: pure unit test; calls encrypt_credentials + decrypt_credentials; asserts round-trip equals original; asserts ciphertext != plaintext
- test_credentials_enc_not_exposed: creates CloudConnection via cloud_connection_factory; calls GET /api/cloud/connections with valid auth; asserts "credentials_enc" not in response JSON at any level
- test_cloud_upload_no_presigned: creates CloudConnection; mocks cloud backend put_object; calls POST /api/documents/upload with target_backend="google_drive"; asserts no "upload_url" in response
- test_connection_status_display: creates ACTIVE CloudConnection; calls GET /api/cloud/connections; asserts response item has status == "ACTIVE"
- test_invalid_grant_sets_requires_reauth: creates CloudConnection; monkey-patches get_storage_backend_for_document to raise CloudConnectionError; calls GET /api/documents/{id}/content; asserts 503 response; the REQUIRES_REAUTH status transition happens inside the cloud backend and is left to the integration test (see the action notes for this task)
- test_disconnect_deletes_credentials: creates CloudConnection; calls DELETE /api/cloud/connections/{id}; asserts 204; queries DB to confirm row deleted
- test_factory_returns_correct_backend: calls get_storage_backend_for_document with mock Document(storage_backend="minio"); asserts isinstance(result, MinIOBackend)
- test_ssrf_validation: parametrized over RFC-1918, loopback, link-local, valid URL inputs; asserts ValueError raised for private IPs; no exception for valid public URL
- test_ssrf_link_local: calls validate_cloud_url("http://169.254.169.254/metadata"); asserts ValueError
- test_admin_cannot_see_credentials: creates admin user + CloudConnection; calls GET /api/cloud/connections with admin auth; asserts 403 response
- test_cross_user_idor: creates two users + CloudConnections; calls DELETE /api/cloud/connections/{user2_connection_id} with user1 auth; asserts 404
- test_connect_google_drive: calls GET /api/cloud/oauth/initiate/google_drive with valid auth; asserts 302 redirect containing "accounts.google.com" in location header; asserts the mocked Redis setex was called with a key starting with "oauth_state:"
- test_oauth_callback_valid_state: pre-seeds Redis with oauth_state key; mocks google_auth_oauthlib.flow.Flow.fetch_token; calls GET /api/cloud/oauth/callback/google_drive?code=test&state={seed_state}; asserts 302 redirect to /settings?cloud_connected=google_drive
- test_oauth_callback_invalid_state: calls GET /api/cloud/oauth/callback/google_drive?code=x&state=invalid; asserts 400
- test_webdav_connect_validates: mocks WebDAVBackend health_check to return False; calls POST /api/cloud/connections/webdav with localhost URL; asserts 422 (SSRF blocked before health check)
For tests requiring auth: use helper to create User rows and generate access tokens (pattern from test_auth_api.py or test_documents.py).
For tests requiring Redis: use monkeypatch to mock app.state.redis.setex, get, delete.
For tests requiring cloud SDKs: monkeypatch/MagicMock the SDK calls — no real network calls in tests.
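Illustrative sketch of the checks test_ssrf_validation and test_ssrf_link_local exercise. Assumptions: the real validate_cloud_url in cloud_utils.py may differ; the resolver parameter and the _resolve helper are hypothetical additions so the tests can inject a fake DNS lookup and stay off the network.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def _resolve(host):
    """Default resolver: every address the hostname resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def validate_cloud_url(url, resolver=_resolve):
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("unsupported scheme")
    host = parsed.hostname
    if not host:
        raise ValueError("missing host")
    try:
        # IP literals are checked directly.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        # Hostnames are resolved first so a public name cannot hide a private target.
        addresses = [ipaddress.ip_address(a) for a in resolver(host)]
    if not addresses:
        raise ValueError("host did not resolve")
    for addr in addresses:
        blocked = (
            addr.is_private
            or addr.is_loopback
            or addr.is_link_local
            or addr.is_reserved
            or addr.is_multicast
            or addr.is_unspecified
        )
        if blocked:
            raise ValueError(f"blocked address: {addr}")
    return url
```

A parametrized test can pass a dictionary-backed resolver, so "localhost" maps to 127.0.0.1 without any real DNS query.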
</behavior>
<action>
Rewrite backend/tests/test_cloud.py, replacing each pytest.xfail("not implemented yet") stub body with a real test implementation.
Keep: all 15 test function names, all @pytest.mark.asyncio decorators, pytestmark = pytest.mark.asyncio.
Remove: @pytest.mark.xfail(strict=False) decorators from all stubs once each is implemented.
Add: proper fixture parameters to each test function (async_client, db_session, monkeypatch, etc.).
Auth helper (add as a local conftest helper or module-level fixture):
async def _create_user_and_token(session, role="user") — creates User row, generates JWT access token
(Mirror pattern from existing test_auth_api.py or test_documents.py)
For test_credential_round_trip: no fixtures needed (pure unit test).
For test_ssrf_validation: parametrize with @pytest.mark.parametrize.
For tests needing cloud API: use async_client fixture.
For tests needing Redis: monkeypatch app.state.redis.
Important: tests must pass under SQLite in-memory (non-INTEGRATION mode). Cloud SDK calls must be mocked (no real network calls). OAuth state tests mock Redis.
When implementing test_invalid_grant_sets_requires_reauth: focus on the 503 response assertion (the backend routing returning 503 when CloudConnectionError is raised). The REQUIRES_REAUTH DB update happens inside the cloud backend during the operation — for unit testing, verify the 503 response is returned and trust the integration test to verify the DB state.
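Illustrative helper for the "not in response JSON at any level" assertion in test_credentials_enc_not_exposed. Assumption: contains_key is a hypothetical name; the check walks dict keys recursively rather than searching the raw response text.

```python
def contains_key(payload, key):
    """Return True if `key` appears as a dict key anywhere inside a parsed JSON payload."""
    if isinstance(payload, dict):
        return key in payload or any(contains_key(v, key) for v in payload.values())
    if isinstance(payload, list):
        return any(contains_key(item, key) for item in payload)
    # Scalars (str, int, None, ...) cannot contain keys.
    return False
```

In the test this would be used as `assert not contains_key(response.json(), "credentials_enc")`.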
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v 2>&1</automated>
</verify>
<acceptance_criteria>
- `pytest tests/test_cloud.py -v` exits 0
- Output shows all 15 tests PASSED (no xfailed, no FAILED, no ERROR)
- test_credential_round_trip: no xfail decorator; passes with round-trip assertion
- test_ssrf_validation: parametrized; all params pass
- test_credentials_enc_not_exposed: "credentials_enc" not present anywhere in response JSON
- test_admin_cannot_see_credentials: 403 for admin role
- test_cross_user_idor: 404 for cross-user connection access
- `pytest -v --tb=short` (full suite) exits 0 with 0 failures
</acceptance_criteria>
<done>All 15 test stubs promoted to real passing tests; pytest tests/test_cloud.py exits 0 with all PASSED; full suite exits 0</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| UploadFile bytes → cloud backend | File bytes from browser pass through FastAPI to cloud provider — no direct browser-to-cloud |
| document.storage_backend → backend factory | storage_backend field from DB (not user input) determines which backend loads |
| CloudConnectionError → HTTP 503 | Provider rejection must surface as 503, not 500 (stack trace) or silent retry |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-06-01 | Spoofing | target_backend form field tampering | mitigate | target_backend validated against VALID_PROVIDERS set; invalid values return 422; CloudConnection load asserts user ownership before use |
| T-05-06-02 | Information Disclosure | CloudConnectionError message in 503 | mitigate | 503 detail = "Cloud connection requires re-authentication. Please reconnect in Settings." — no provider error detail or token info in response |
| T-05-06-03 | Denial of Service | Cloud upload quota bypass | accept | Cloud uploads do not consume MinIO quota (D-11: separate backends); cloud storage quotas are provider-side — not DocuVault's responsibility in v1 |
| T-05-06-04 | Tampering | Test mocks hiding real failures | mitigate | Tests mock at the boundary (SDK calls), not at the function level; behavior assertions check HTTP response codes and DB state, not implementation details |
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- POST /api/documents/upload: target_backend routing works for cloud backends; MinIO flow unchanged
- GET /api/documents/{id}/content: uses get_storage_backend_for_document; CloudConnectionError → 503
- test_cloud.py: all 15 tests PASSED; no xfailed
- pytest -v (full suite): exits 0, 0 failures
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-06-SUMMARY.md` when done
</output>
@@ -0,0 +1,392 @@
---
phase: 05-cloud-storage-backends
plan: 07
type: execute
wave: 6
depends_on:
- "05-06"
files_modified:
- frontend/src/stores/cloudConnections.js
- frontend/src/api/client.js
- frontend/src/views/SettingsView.vue
- frontend/src/components/settings/SettingsPreferencesTab.vue
- frontend/src/components/settings/SettingsAiTab.vue
- frontend/src/components/settings/SettingsCloudTab.vue
- frontend/src/components/cloud/CloudCredentialModal.vue
autonomous: true
requirements:
- CLOUD-01
- CLOUD-03
- CLOUD-04
- CLOUD-05
- CLOUD-06
must_haves:
truths:
- "SettingsView has a 3-tab layout (Preferences, AI Configuration, Cloud Storage)"
- "Cloud Storage tab shows all 4 providers with status badges (ACTIVE, REQUIRES_REAUTH, ERROR, not_connected)"
- "Connect Google Drive / OneDrive triggers a redirect to the OAuth initiation endpoint"
- "Connect Nextcloud / WebDAV opens CloudCredentialModal with server URL, username, and auth method toggle"
- "Remove {provider} button disconnects the connection via DELETE /api/cloud/connections/{id}"
- "OAuth redirect success/error handled in onMounted via ?cloud_connected= and ?cloud_error= query params"
- "Success toast auto-dismisses in 5 seconds; error banner persists until dismissed"
- "cloudConnectionsStore: connections, loading, error state; fetchConnections, disconnect, disconnectAll actions"
artifacts:
- path: "frontend/src/stores/cloudConnections.js"
provides: "Pinia store for cloud connections state"
contains: "useCloudConnectionsStore"
- path: "frontend/src/api/client.js"
provides: "Cloud API client functions"
contains: "listCloudConnections"
- path: "frontend/src/views/SettingsView.vue"
provides: "3-tab settings view with OAuth callback handling"
contains: "activeTab"
- path: "frontend/src/components/settings/SettingsCloudTab.vue"
provides: "Cloud provider card list with status badges and action buttons"
contains: "CloudCredentialModal"
- path: "frontend/src/components/cloud/CloudCredentialModal.vue"
provides: "WebDAV/Nextcloud credential input modal"
contains: "authMethod"
key_links:
- from: "frontend/src/components/settings/SettingsCloudTab.vue"
to: "frontend/src/stores/cloudConnections.js"
via: "useCloudConnectionsStore()"
pattern: "useCloudConnectionsStore"
- from: "frontend/src/views/SettingsView.vue"
to: "frontend/src/stores/cloudConnections.js"
via: "fetchConnections on tab switch to cloud"
pattern: "fetchConnections"
---
<objective>
Build the frontend cloud storage UI: Pinia store, API client functions, SettingsView 3-tab conversion, SettingsCloudTab provider cards, and CloudCredentialModal.
Purpose: Complete the user-facing cloud storage management experience — connect, view status, and disconnect providers from SettingsView.
Output: cloudConnections.js store, API client additions, SettingsView tab conversion, 3 settings tab components, CloudCredentialModal.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-UI-SPEC.md
@.planning/phases/05-cloud-storage-backends/05-06-SUMMARY.md
</context>
<interfaces>
<!-- From frontend/src/api/client.js — existing API pattern -->
From frontend/src/api/client.js:
async function request(path, options = {}) — handles auth headers, 401 retry
All API functions call request(path, options) and return the JSON payload
<!-- From frontend/src/stores/folders.js — Pinia store pattern -->
defineStore('folders', () => {
const folders = ref([])
const loading = ref(false)
const error = ref(null)
async function fetchFolders() { loading.value=true; ... }
return { folders, loading, error, fetchFolders, ... }
})
<!-- From frontend/src/views/SettingsView.vue — current content to preserve -->
Current SettingsView: flat layout with AI config section + Document Preferences section (pdf_open_mode radios)
After conversion: 3-tab layout; "preferences" tab has pdf_open_mode; "ai" tab has AI config text; "cloud" tab is new
<!-- From 05-UI-SPEC.md — exact component specs -->
Tab strip: copy AdminView pattern verbatim (px-4 py-2 text-sm font-semibold border-b-2)
Provider rows: divide-y divide-gray-100 inside bg-white border border-gray-200 rounded-xl p-6
Status badge: bg-green-100 text-green-700 (ACTIVE), bg-yellow-100 text-yellow-800 (REQUIRES_REAUTH),
bg-red-100 text-red-700 (ERROR), bg-gray-100 text-gray-600 (not_connected)
Action buttons per status: per UI-SPEC table
OAuth success toast: fixed top-4 right-4 z-50; auto-dismiss 5000ms
Error banner: mb-6 inline inside cloud tab content; persistent
<!-- From 05-UI-SPEC.md — CloudCredentialModal -->
Overlay: fixed inset-0 bg-gray-900 bg-opacity-40 z-40 flex items-center justify-center p-4
Panel: bg-white rounded-xl shadow-xl w-full max-w-md p-6
Fields: Server URL, Username, auth method radio (app_password default), Password/App password
Cancel label: "Keep current settings"
Save button label: "Connect {providerLabel}"
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Create cloudConnections Pinia store and API client additions</name>
<files>frontend/src/stores/cloudConnections.js, frontend/src/api/client.js</files>
<read_first>
- frontend/src/stores/folders.js — Pinia store structure (defineStore composition API pattern)
- frontend/src/api/client.js — existing API function patterns, request() helper
- frontend/src/stores/auth.js — how stores handle loading/error state
- .planning/phases/05-cloud-storage-backends/05-UI-SPEC.md — cloudConnections store state contract
</read_first>
<behavior>
- frontend/src/stores/cloudConnections.js exports useCloudConnectionsStore with: connections (ref []), loading (ref false), error (ref null); fetchConnections(), disconnect(id), disconnectAll() actions
- fetchConnections calls GET /api/cloud/connections; sets connections from response.items
- disconnect(id) calls DELETE /api/cloud/connections/{id}; removes connection from connections array
- disconnectAll() calls disconnect(id) for each connection serially; clears connections on completion
- frontend/src/api/client.js gains: listCloudConnections(), disconnectCloud(id), connectWebDav(provider, serverUrl, username, password), updateDefaultStorage(backend)
- API functions follow existing pattern: each returns request(...) directly (no extra .then wrapper is needed)
- GET /api/cloud/oauth/initiate/{provider} is a redirect — frontend navigates via window.location.href (not a fetch call); no API client function needed for OAuth initiation
</behavior>
<action>
Create frontend/src/stores/cloudConnections.js following the folders.js defineStore composition pattern:
import { defineStore } from 'pinia'
import { ref } from 'vue'
import * as api from '../api/client.js'
export const useCloudConnectionsStore = defineStore('cloudConnections', () => {
const connections = ref([])
const loading = ref(false)
const error = ref(null)
async function fetchConnections() {
loading.value = true; error.value = null
try {
const data = await api.listCloudConnections()
connections.value = data.items ?? []
} catch (e) { error.value = e.message || 'Failed to load cloud connections' }
finally { loading.value = false }
}
async function disconnect(id) {
// Errors propagate to the caller, which decides how to surface them
await api.disconnectCloud(id)
connections.value = connections.value.filter(c => c.id !== id)
}
async function disconnectAll() {
const ids = connections.value.map(c => c.id)
for (const id of ids) await disconnect(id)
connections.value = []
}
return { connections, loading, error, fetchConnections, disconnect, disconnectAll }
})
Append to frontend/src/api/client.js (add after the existing adminListAuditLog function):
// ── Cloud Storage ─────────────────────────────────────────────────────────
export function listCloudConnections() {
return request('/api/cloud/connections')
}
export function disconnectCloud(id) {
return request(`/api/cloud/connections/${id}`, { method: 'DELETE' })
}
export function connectWebDav(provider, serverUrl, username, password) {
return request('/api/cloud/connections/webdav', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ provider, server_url: serverUrl, username, password }),
})
}
export function updateDefaultStorage(backend) {
return request('/api/users/me/default-storage', {
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ backend }),
})
}
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/frontend && node -e "
const fs = require('fs');
const store = fs.readFileSync('src/stores/cloudConnections.js', 'utf8');
['useCloudConnectionsStore','fetchConnections','disconnect','disconnectAll','connections','loading','error'].forEach(name => {
if (!store.includes(name)) throw new Error('Missing: ' + name);
console.log('OK: ' + name);
});
const api = fs.readFileSync('src/api/client.js', 'utf8');
['listCloudConnections','disconnectCloud','connectWebDav','updateDefaultStorage'].forEach(name => {
if (!api.includes(name)) throw new Error('Missing from api/client.js: ' + name);
console.log('OK api: ' + name);
});
"</automated>
</verify>
<acceptance_criteria>
- frontend/src/stores/cloudConnections.js exists with useCloudConnectionsStore
- Store exports: connections (ref), loading (ref), error (ref), fetchConnections, disconnect, disconnectAll
- frontend/src/api/client.js contains listCloudConnections, disconnectCloud, connectWebDav, updateDefaultStorage
- No modifications to existing API functions (folders, auth, etc.)
</acceptance_criteria>
<done>cloudConnections.js store created; 4 new API functions appended to client.js; existing API functions untouched</done>
</task>
<task type="auto">
<name>Task 2: Convert SettingsView to 3-tab layout and create all settings + cloud components</name>
<files>
frontend/src/views/SettingsView.vue,
frontend/src/components/settings/SettingsPreferencesTab.vue,
frontend/src/components/settings/SettingsAiTab.vue,
frontend/src/components/settings/SettingsCloudTab.vue,
frontend/src/components/cloud/CloudCredentialModal.vue
</files>
<read_first>
- frontend/src/views/SettingsView.vue — current content (preferences + AI sections) to preserve
- frontend/src/views/AdminView.vue — tab strip pattern to copy verbatim
- .planning/phases/05-cloud-storage-backends/05-UI-SPEC.md — all Surface 1-4 specs (tab structure, provider rows, status badges, action buttons, modal)
- frontend/src/stores/cloudConnections.js — useCloudConnectionsStore (Task 1)
- frontend/src/api/client.js — connectWebDav function
- frontend/src/components/ui/ — check for ConfirmBlock component for disconnect confirmation
</read_first>
<behavior>
SettingsView.vue:
- Converts to 3-tab layout: tabs = [{id:'preferences', label:'Preferences'}, {id:'ai', label:'AI Configuration'}, {id:'cloud', label:'Cloud Storage'}]
- activeTab defaults to 'preferences'
- onMounted: reads window.location.search for ?cloud_connected={provider} and ?cloud_error={message}; if found: sets activeTab='cloud', stores the value in oauthSuccessProvider or oauthError respectively, then clears query params via router.replace({path:'/settings'})
- oauthSuccessProvider ref: null; auto-clears after 5000ms via setTimeout
- oauthError ref: null; dismissed via X button
- Renders SettingsPreferencesTab, SettingsAiTab, SettingsCloudTab as tab content
- OAuth success toast: fixed top-4 right-4 z-50 (per UI-SPEC Surface 3 exact markup)
- Error banner: inline above section card when oauthError is set (per UI-SPEC Surface 3 exact markup)
SettingsPreferencesTab.vue:
- Extracted from current SettingsView: the pdf_open_mode radio section
- Maintains existing pdfOpenMode ref, watch, onMounted behavior
- Template: bg-white border border-gray-200 rounded-xl p-6 wrapper; same radios, save feedback text
SettingsAiTab.vue:
- Extracted from current SettingsView: the "AI configuration" section
- Template: bg-white border border-gray-200 rounded-xl p-6; same copy ("AI provider and model are managed by your administrator.")
SettingsCloudTab.vue:
- Imports useCloudConnectionsStore; calls fetchConnections() in onMounted
- PROVIDERS constant: [{key:'google_drive', label:'Google Drive', iconColor:'text-blue-500'}, {key:'onedrive', label:'OneDrive', iconColor:'text-sky-500'}, {key:'nextcloud', label:'Nextcloud', iconColor:'text-orange-500'}, {key:'webdav', label:'WebDAV server', iconColor:'text-gray-500'}]
- For each provider: renders row with icon + provider name + StatusBadge + action buttons
- Status badge: inline pill span with classes per UI-SPEC status badge table
- Action buttons per status: exact labels from UI-SPEC Copywriting Contract
- "Connect {provider}" for OAuth providers: window.location.href = `/api/cloud/oauth/initiate/${provider.key}`
- "Connect {provider}" for WebDAV/Nextcloud: opens CloudCredentialModal with showModal=true, activeProvider=provider
- "Remove {provider}": calls store.disconnect(connection.id) with inline ConfirmBlock confirm pattern
- "Reconnect {provider}": same as "Connect {provider}"
- REQUIRES_REAUTH inline banner: per UI-SPEC Surface 2 exact markup (bg-yellow-50 border border-yellow-200)
- "Disconnect all cloud storage" link at bottom: shown only when at least one connection is ACTIVE or ERROR
- Disconnect all ConfirmBlock: message, confirm label "Disconnect all", cancel "Keep all connected"
CloudCredentialModal.vue:
- Props: show (Boolean), provider (Object: {key, label})
- Emits: close, connected
- Fields: serverUrl, username, authMethod (ref 'app_password'), password
- On submit: calls api.connectWebDav(provider.key, serverUrl, username, password); emits 'connected'; closes
- On error: shows connectError message (per UI-SPEC Surface 4)
- Cancel label: "Keep current settings"
- Save label: "Connect {provider.label}"
- Escape key and overlay click close the modal (unless saving=true)
- All Tailwind classes and layout per UI-SPEC Surface 4 exact specifications
</behavior>
<action>
Create directories: frontend/src/components/settings/ and frontend/src/components/cloud/ if they don't exist.
1. Create frontend/src/components/settings/SettingsPreferencesTab.vue:
Extract the pdf_open_mode section from current SettingsView.vue.
Keep the `<script setup>` with pdfOpenMode ref, watch, onMounted, api imports.
Wrap template in a section div with same bg-white border classes.
2. Create frontend/src/components/settings/SettingsAiTab.vue:
Extract the AI configuration section from current SettingsView.vue.
Static content (no script logic needed).
3. Create frontend/src/components/settings/SettingsCloudTab.vue:
Full provider list component per UI-SPEC Surface 2 specification.
Use useCloudConnectionsStore for connections data.
PROVIDERS array defined as a local constant.
statusBadgeClasses(status) helper function mapping status to Tailwind classes (a plain function, since a computed cannot take arguments).
connectionFor(providerKey) helper function returning the matching connection or null.
All action button logic per behavior spec above.
4. Create frontend/src/components/cloud/CloudCredentialModal.vue:
Full modal per UI-SPEC Surface 4 specification.
Teleport to body or fixed positioning.
@keydown.escape.window handler to close modal.
Overlay click handler to close (when not saving).
5. Rewrite frontend/src/views/SettingsView.vue:
New 3-tab layout. Import and render the 3 tab components.
Read AdminView.vue tab strip implementation and copy the pattern verbatim.
Add oauthSuccessProvider and oauthError state + toast/banner markup per UI-SPEC Surface 3.
Preserve: p-8 max-w-3xl mx-auto wrapper, h2 heading, description paragraph.
Check existing components: look for ConfirmBlock in frontend/src/components/ui/ — if present, use it for disconnect confirmation dialogs. If not present, implement inline confirmation pattern.
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/frontend && node -e "
const fs = require('fs');
const files = [
'src/views/SettingsView.vue',
'src/components/settings/SettingsPreferencesTab.vue',
'src/components/settings/SettingsAiTab.vue',
'src/components/settings/SettingsCloudTab.vue',
'src/components/cloud/CloudCredentialModal.vue',
];
files.forEach(f => {
if (!fs.existsSync(f)) throw new Error('Missing: ' + f);
const content = fs.readFileSync(f, 'utf8');
console.log('EXISTS OK: ' + f + ' (' + content.length + ' chars)');
});
const settings = fs.readFileSync('src/views/SettingsView.vue', 'utf8');
if (!settings.includes('activeTab')) throw new Error('SettingsView missing activeTab');
if (!settings.includes('SettingsPreferencesTab')) throw new Error('SettingsView missing tab component');
if (!settings.includes('SettingsCloudTab')) throw new Error('SettingsView missing CloudTab');
console.log('SettingsView tab conversion: OK');
const cloud = fs.readFileSync('src/components/settings/SettingsCloudTab.vue', 'utf8');
if (!cloud.includes('google_drive')) throw new Error('SettingsCloudTab missing google_drive provider');
if (!cloud.includes('CloudCredentialModal')) throw new Error('SettingsCloudTab missing CloudCredentialModal');
console.log('SettingsCloudTab providers and modal: OK');
" && npm --prefix /Users/nik/Documents/Progamming/document_scanner/frontend run build 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- All 5 new/modified files exist
- SettingsView.vue contains activeTab ref, SettingsPreferencesTab, SettingsAiTab, SettingsCloudTab imports and rendering
- SettingsView.vue contains oauthSuccessProvider ref and success toast markup (fixed top-4 right-4)
- SettingsView.vue contains oauthError ref and error banner markup
- SettingsCloudTab.vue contains all 4 provider keys: google_drive, onedrive, nextcloud, webdav
- SettingsCloudTab.vue uses useCloudConnectionsStore
- CloudCredentialModal.vue contains authMethod ref and auth method radio group
- `npm run build` (Vite build) exits 0 without errors
</acceptance_criteria>
<done>5 files created/modified; 3-tab SettingsView with OAuth handling; SettingsCloudTab with 4 providers; CloudCredentialModal; Vite build passes</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| browser → /api/cloud/oauth/initiate | window.location.href redirect — OAuth tokens never touch JavaScript |
| ?cloud_error= query param → display | URL-decoded error message displayed to user; must not execute as HTML |
| WebDAV credentials → POST /api/cloud/connections/webdav | Credentials sent over HTTPS only; Vue template auto-escaping prevents XSS |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-07-01 | Information Disclosure | OAuth tokens in browser JavaScript | mitigate | OAuth initiation uses window.location.href redirect — FastAPI handles code exchange; tokens never land in frontend (D-03) |
| T-05-07-02 | XSS | ?cloud_error= decoded and displayed | mitigate | Vue template auto-escaping ({{ oauthError }}) prevents HTML injection; no v-html used |
| T-05-07-03 | Information Disclosure | WebDAV password in component state | accept | Password lives in ref() only during modal interaction; cleared on close/submit; never persisted in localStorage |
| T-05-07-04 | Information Disclosure | connection.credentials_enc in store | mitigate | CloudConnectionOut from API never includes credentials_enc; store.connections holds only safe fields |
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/frontend && npm run build 2>&1 | tail -5
</verification>
<success_criteria>
- cloudConnections.js store: connections, loading, error, fetchConnections, disconnect, disconnectAll
- client.js: listCloudConnections, disconnectCloud, connectWebDav, updateDefaultStorage added
- SettingsView.vue: 3-tab layout; OAuth success/error handling; tab strip matches AdminView pattern
- SettingsCloudTab.vue: all 4 providers; status badges; action buttons per status; REQUIRES_REAUTH banner; disconnect all
- CloudCredentialModal.vue: server URL + username + auth method toggle + password; correct cancel/save labels
- Vite build exits 0
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-07-SUMMARY.md` when done
</output>
@@ -0,0 +1,352 @@
---
phase: 05-cloud-storage-backends
plan: 08
type: execute
wave: 7
depends_on:
- "05-07"
files_modified:
- frontend/src/components/layout/AppSidebar.vue
- frontend/src/components/cloud/CloudProviderTreeItem.vue
- frontend/src/components/cloud/CloudFolderTreeItem.vue
autonomous: false
requirements:
- CLOUD-03
- CLOUD-04
user_setup:
- service: google_oauth_app
why: "Google Drive OAuth integration requires a GCP app with OAuth credentials"
env_vars:
- name: GOOGLE_CLIENT_ID
source: "GCP Console → APIs & Services → Credentials → OAuth 2.0 Client IDs"
- name: GOOGLE_CLIENT_SECRET
source: "GCP Console → APIs & Services → Credentials → OAuth 2.0 Client IDs → client secret"
dashboard_config:
- task: "Enable Google Drive API"
location: "GCP Console → APIs & Services → Enable APIs → Google Drive API"
- task: "Add redirect URI"
location: "GCP Console → OAuth 2.0 Client → Authorized redirect URIs → add: {BACKEND_URL}/api/cloud/oauth/callback/google_drive"
- service: onedrive_app_registration
why: "OneDrive OAuth requires an Azure App Registration"
env_vars:
- name: ONEDRIVE_CLIENT_ID
source: "Azure Portal → App registrations → {app} → Application (client) ID"
- name: ONEDRIVE_CLIENT_SECRET
source: "Azure Portal → App registrations → {app} → Certificates & secrets → New client secret"
- name: ONEDRIVE_TENANT_ID
source: "Azure Portal → App registrations → {app} → Directory (tenant) ID (or use 'common')"
dashboard_config:
- task: "Register application"
location: "Azure Portal → Microsoft Entra ID (formerly Azure Active Directory) → App registrations → New registration"
- task: "Add redirect URI"
location: "Azure Portal → App registrations → {app} → Authentication → Add redirect URI → {BACKEND_URL}/api/cloud/oauth/callback/onedrive"
- task: "Add Files.ReadWrite and offline_access API permissions"
location: "Azure Portal → App registrations → {app} → API permissions → Add permission → Microsoft Graph"
- service: cloud_creds_key
why: "HKDF master key for encrypting cloud credentials; must carry 32 random bytes (64 hex characters as generated below)"
env_vars:
- name: CLOUD_CREDS_KEY
source: "Generate with: python -c \"import secrets; print(secrets.token_hex(32))\""
must_haves:
truths:
- "AppSidebar has a 'Cloud Storage' collapsible section below Folders, above Topics"
- "Each ACTIVE cloud connection appears as a CloudProviderTreeItem in the sidebar"
- "Expanding a cloud provider node lazy-loads the first level of cloud folders via GET /api/cloud/folders/{provider}/root"
- "CloudFolderTreeItem renders nested cloud sub-folders with lazy-load expand"
- "Cloud nodes are shown in the sidebar only for ACTIVE connections; REQUIRES_REAUTH and ERROR connections are hidden"
- "Human checkpoint: user verifies cloud section appears in sidebar and can expand a provider node"
artifacts:
- path: "frontend/src/components/layout/AppSidebar.vue"
provides: "Sidebar with Cloud Storage collapsible section"
contains: "CloudProviderTreeItem"
- path: "frontend/src/components/cloud/CloudProviderTreeItem.vue"
provides: "Provider root node in sidebar tree"
contains: "providerIconColor"
- path: "frontend/src/components/cloud/CloudFolderTreeItem.vue"
provides: "Cloud sub-folder node"
contains: "CloudFolderTreeItem"
key_links:
- from: "frontend/src/components/layout/AppSidebar.vue"
to: "frontend/src/stores/cloudConnections.js"
via: "useCloudConnectionsStore for activeCloudConnections"
pattern: "useCloudConnectionsStore"
- from: "frontend/src/components/cloud/CloudProviderTreeItem.vue"
to: "frontend/src/api/client.js"
via: "GET /api/cloud/folders/{provider}/{folder_id}"
pattern: "getCloudFolders"
---
<objective>
Add the Cloud Storage section to AppSidebar and create the CloudProviderTreeItem and CloudFolderTreeItem components for lazy-loading cloud folder trees.
Purpose: Complete the sidebar integration so users can navigate cloud storage alongside local folders. Human checkpoint verifies the UI renders correctly.
Output: AppSidebar extended with cloud section; CloudProviderTreeItem; CloudFolderTreeItem.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-UI-SPEC.md
@.planning/phases/05-cloud-storage-backends/05-07-SUMMARY.md
</context>
<interfaces>
<!-- From frontend/src/components/layout/AppSidebar.vue — existing structure -->
From AppSidebar.vue: existing Folders collapsible section (foldersExpanded, FolderTreeItem pattern)
Pattern: <div class="mt-3"> wrapping a collapsible section header + toggle + content
<!-- From frontend/src/components/folders/FolderTreeItem.vue (Phase 4) -->
FolderTreeItem: depth prop, toggle expand on arrow click, navigate on name click
Pattern: paddingLeft = depth * 12 + 'px', rotate-90 class on expand, lazy-load children
<!-- From 05-UI-SPEC.md — Surface 5: Cloud Provider Nodes -->
Exact markup for cloud section header, cloudExpanded toggle, CloudProviderTreeItem rendering
providerIconColor: google_drive=text-blue-500, onedrive=text-sky-500, nextcloud=text-orange-500, webdav=text-gray-500
depth * 12 px left padding formula
<!-- From frontend/src/stores/cloudConnections.js (Plan 07) -->
useCloudConnectionsStore: connections (ref[]), fetchConnections()
activeCloudConnections = connections.filter(c => c.status === "ACTIVE")
<!-- New API function to add to client.js -->
getCloudFolders(provider, folderId): GET /api/cloud/folders/{provider}/{folderId}
Returns: { items: [{id, name, is_dir, size}, ...] }
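
A minimal sketch of the two pure helpers implied by the Surface 5 notes above, written as plain functions so they can run outside a component. The names `PROVIDER_ICON_COLORS` and `indentStyle`, and the gray fallback for an unrecognized provider, are assumptions made for illustration; the UI-SPEC remains the source of truth for the markup.

```javascript
// Tailwind color class per provider, as listed in the Surface 5 notes.
const PROVIDER_ICON_COLORS = new Map([
  ['google_drive', 'text-blue-500'],
  ['onedrive', 'text-sky-500'],
  ['nextcloud', 'text-orange-500'],
  ['webdav', 'text-gray-500'],
]);

// Assumption: an unrecognized provider falls back to the neutral gray class.
function providerIconColor(provider) {
  return PROVIDER_ICON_COLORS.get(provider) ?? 'text-gray-500';
}

// Inline style object for a tree row: depth * 12 px of left padding.
function indentStyle(depth) {
  return { paddingLeft: `${depth * 12}px` };
}
```

In a component, `providerIconColor` would typically be wrapped in a computed property keyed on `connection.provider`, and `indentStyle(depth)` bound via `:style`.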
</interfaces>
<tasks>
<task type="auto">
<name>Task 1: Create CloudProviderTreeItem, CloudFolderTreeItem, and add API function</name>
<files>
frontend/src/components/cloud/CloudProviderTreeItem.vue,
frontend/src/components/cloud/CloudFolderTreeItem.vue,
frontend/src/api/client.js
</files>
<read_first>
- frontend/src/components/folders/FolderTreeItem.vue — lazy-load tree item pattern (expand toggle, depth padding, children loading)
- frontend/src/api/client.js — request() pattern for new getCloudFolders function
- .planning/phases/05-cloud-storage-backends/05-UI-SPEC.md — Surface 5 exact component markup
</read_first>
<behavior>
CloudProviderTreeItem.vue:
- Props: connection (Object: {id, provider, display_name, status}), depth (Number, default 1)
- Local state: expanded (ref false), children (ref []), loading (ref false), loadError (ref false)
- On toggle expand: if !expanded and children.length==0, fetch via api.getCloudFolders(connection.provider, 'root'); set loading during fetch; set loadError on error
- On retry click (load error state): re-fetch children
- providerIconColor computed from connection.provider (map per UI-SPEC)
- navigate to cloud folder root on name click — emit 'navigate' or use router.push('/cloud/{provider}/root')
- Renders CloudFolderTreeItem for each child
CloudFolderTreeItem.vue:
- Props: folder (Object: {id, name, is_dir, size}), provider (String), depth (Number)
- Local state: expanded (ref false), children (ref []), loading (ref false), loadError (ref false)
- Only renders expand arrow if folder.is_dir === true
- On toggle expand: fetch api.getCloudFolders(provider, folder.id); same loading/error pattern
- Indentation: depth * 12 px
- Navigate to /cloud/{provider}/{folder.id} on click (router.push)
Add to frontend/src/api/client.js (append after disconnectCloud/connectWebDav):
export function getCloudFolders(provider, folderId) {
return request(`/api/cloud/folders/${provider}/${folderId}`)
}
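
The expand, loading, and retry behavior shared by both components can be sketched as a small framework-agnostic state model. This is an illustration under stated assumptions, not the component itself: `createLazyNode` is a hypothetical name, `fetchFolders` stands in for `api.getCloudFolders` and is injected so the logic runs standalone, and in the real components each field would be a Vue ref. The `loaded` flag is an addition of this sketch; the behavior above keys off `children.length==0`, which would refetch a genuinely empty folder on every expand.

```javascript
// Framework-agnostic model of the lazy-load tree node state.
function createLazyNode(fetchFolders, provider, folderId) {
  const state = {
    expanded: false,
    children: [],
    loaded: false, // assumption: tracks "fetched at least once"
    loading: false,
    loadError: false,
  };

  // Fetch children, tracking loading and error flags around the request.
  async function load() {
    state.loading = true;
    state.loadError = false;
    try {
      const response = await fetchFolders(provider, folderId);
      state.children = response.items;
      state.loaded = true;
    } catch {
      state.loadError = true;
    } finally {
      state.loading = false;
    }
  }

  // Expanding fetches children only the first time; collapsing never fetches.
  async function toggle() {
    state.expanded = !state.expanded;
    if (state.expanded && !state.loaded && !state.loading) {
      await load();
    }
  }

  // Retry re-runs the same fetch after a failed load.
  return { state, toggle, retry: load };
}
```

For a provider node the call would be `createLazyNode(fetchFolders, connection.provider, 'root')`; for a sub-folder, the `folder.id` from the parent listing takes the place of `'root'`.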
</behavior>
<action>
First: append getCloudFolders to frontend/src/api/client.js at the end of the cloud section (after the existing cloud functions such as disconnectCloud, connectWebDav, and updateDefaultStorage).
Create frontend/src/components/cloud/CloudProviderTreeItem.vue following the UI-SPEC Surface 5 exact markup:
- Template mirrors FolderTreeItem structure: expand arrow button + name button
- Expand arrow: svg chevron, rotate-90 when expanded
- Provider name button: uses providerIconColor, active/hover classes per UI-SPEC
- Loading state: text-xs text-gray-400 "Loading…" at pl-12
- Error state: text-xs text-red-500 "Failed to load — tap to retry" with @click=retry
- Children loop: CloudFolderTreeItem for each child in children
- paddingLeft style: `${depth * 12}px`
Create frontend/src/components/cloud/CloudFolderTreeItem.vue:
- Simpler than CloudProviderTreeItem: folder icon (text-gray-400), name, expand arrow if is_dir
- Same loading/error pattern as CloudProviderTreeItem
- Navigate via router.push on name click
- Recursively renders CloudFolderTreeItem for nested children
- paddingLeft style: `${depth * 12}px`
Both components follow whichever API style FolderTreeItem.vue uses (Options API or Composition API with script setup), so they stay consistent with the existing Phase 4 components.
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/frontend && node -e "
const fs = require('fs');
['src/components/cloud/CloudProviderTreeItem.vue',
'src/components/cloud/CloudFolderTreeItem.vue'].forEach(f => {
if (!fs.existsSync(f)) throw new Error('Missing: ' + f);
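const c = fs.readFileSync(f, 'utf8');
if (!c.includes('CloudFolderTreeItem')) throw new Error('Missing CloudFolderTreeItem usage in: ' + f);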
console.log('EXISTS OK:', f);
});
const api = fs.readFileSync('src/api/client.js', 'utf8');
if (!api.includes('getCloudFolders')) throw new Error('Missing getCloudFolders in client.js');
console.log('OK: getCloudFolders in client.js');
" && npm --prefix /Users/nik/Documents/Progamming/document_scanner/frontend run build 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- CloudProviderTreeItem.vue exists; contains providerIconColor logic and CloudFolderTreeItem usage
- CloudFolderTreeItem.vue exists; contains expand arrow, loading state, and recursive CloudFolderTreeItem
- client.js contains getCloudFolders function calling /api/cloud/folders/{provider}/{folderId}
- `npm run build` exits 0
</acceptance_criteria>
<done>Both cloud tree components created; getCloudFolders added to API client; Vite build passes</done>
</task>
<task type="auto">
<name>Task 2: Add Cloud Storage section to AppSidebar</name>
<files>frontend/src/components/layout/AppSidebar.vue</files>
<read_first>
- frontend/src/components/layout/AppSidebar.vue — current structure; find the Folders section and Topics section; insert cloud section between them
- frontend/src/stores/cloudConnections.js — useCloudConnectionsStore (created in Plan 07)
- frontend/src/components/cloud/CloudProviderTreeItem.vue — component to render
- .planning/phases/05-cloud-storage-backends/05-UI-SPEC.md — Surface 5 exact AppSidebar markup
</read_first>
<behavior>
- AppSidebar gains a "Cloud Storage" collapsible section placed after the Folders section closing div and before the Topics section
- Section uses cloudExpanded ref (default true — expanded by default for discoverability)
- Section header: cloud icon (text-sky-500) + "Cloud Storage" label — clicking navigates to /settings (plain href="/settings")
- Expand/collapse chevron: same pattern as Folders section
- When expanded: renders one CloudProviderTreeItem per ACTIVE connection
- When no ACTIVE connections: "No cloud storage connected" text at pl-7 text-xs text-gray-400
- While loading: "Loading…" text at pl-7 text-xs text-gray-400
- useCloudConnectionsStore called in AppSidebar; connections fetched on component mount (if not already fetched by SettingsView)
- activeCloudConnections computed: connections.filter(c => c.status === 'ACTIVE')
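
The three display states of the expanded section can be sketched as one pure function. This is a hedged illustration: `cloudSectionView` and its return shape are hypothetical, and giving the loading state priority over the empty state is an assumption rather than something the UI-SPEC is known to prescribe.

```javascript
// Decide what the expanded Cloud Storage section shows.
// `connections` and `loading` mirror the store fields named above.
function cloudSectionView(connections, loading) {
  if (loading) {
    return { kind: 'loading', text: 'Loading…' };
  }
  // Only ACTIVE connections become sidebar nodes.
  const active = connections.filter((c) => c.status === 'ACTIVE');
  if (active.length === 0) {
    return { kind: 'empty', text: 'No cloud storage connected' };
  }
  return { kind: 'list', items: active };
}
```

In the template this maps onto a `v-if` / `v-else-if` / `v-else` chain, with the `list` branch rendering one CloudProviderTreeItem per item.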
</behavior>
<action>
Read AppSidebar.vue fully to find insertion point (after Folders section closing div, before Topics section).
Import in script section:
import CloudProviderTreeItem from '../cloud/CloudProviderTreeItem.vue'
import { useCloudConnectionsStore } from '../../stores/cloudConnections.js'
Add to reactive data / setup:
cloudExpanded = ref(true) (or data() equivalent)
cloudConnectionsStore = useCloudConnectionsStore()
Add computed:
activeCloudConnections: return cloudConnectionsStore.connections.filter(c => c.status === 'ACTIVE')
loadingCloudConnections: return cloudConnectionsStore.loading
In onMounted (or mounted lifecycle):
cloudConnectionsStore.fetchConnections()
Insert cloud section template per UI-SPEC Surface 5 exact markup:
- Section header with cloud icon (SVG cloud path d="M3 15a4 4 0 004 4h9a5 5 0 10-.1-9.999 5.002 5.002 0 10-9.78 2.096A4.001 4.001 0 003 15z")
- class="w-4 h-4 mr-2 shrink-0 text-sky-500" on cloud SVG
- a href="/settings" with nav-link class for "Cloud Storage" label
- CloudProviderTreeItem v-for over activeCloudConnections
- Loading and empty state text per behavior spec
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/frontend && node -e "
const fs = require('fs');
const sidebar = fs.readFileSync('src/components/layout/AppSidebar.vue', 'utf8');
if (!sidebar.includes('CloudProviderTreeItem')) throw new Error('Missing CloudProviderTreeItem in AppSidebar');
if (!sidebar.includes('cloudExpanded')) throw new Error('Missing cloudExpanded ref');
if (!sidebar.includes('useCloudConnectionsStore')) throw new Error('Missing cloudConnectionsStore');
if (!sidebar.includes('Cloud Storage')) throw new Error('Missing Cloud Storage section label');
console.log('AppSidebar cloud section: OK');
" && npm --prefix /Users/nik/Documents/Progamming/document_scanner/frontend run build 2>&1 | tail -5</automated>
</verify>
<acceptance_criteria>
- AppSidebar.vue contains CloudProviderTreeItem import and usage
- AppSidebar.vue contains cloudExpanded ref and cloud section template
- AppSidebar.vue contains useCloudConnectionsStore import and fetchConnections call
- "Cloud Storage" label present in sidebar template
- `npm run build` exits 0, 0 errors
- Existing Folders and Topics sections in sidebar are unmodified
</acceptance_criteria>
<done>AppSidebar extended with Cloud Storage section; CloudProviderTreeItem renders active connections; Vite build passes</done>
</task>
<task type="checkpoint:human-verify" gate="blocking">
<what-built>
Phase 5 is now fully implemented:
- 4 cloud storage backends (Google Drive, OneDrive, Nextcloud, WebDAV) via StorageBackend ABC
- HKDF per-user credential encryption (CLOUD_CREDS_KEY master key)
- SSRF prevention on WebDAV/Nextcloud user-supplied URLs
- OAuth flow: initiate → provider consent → callback → encrypt+save → redirect to /settings?cloud_connected=
- Cloud Storage tab in SettingsView: all 4 providers with status badges, connect/disconnect actions
- WebDAV/Nextcloud credential modal with app-password recommendation
- Cloud Storage section in AppSidebar: lazy-load folder tree per connected provider
- Cloud upload routing through FastAPI; cloud content proxy via existing /api/documents/{id}/content
- All 15 Phase 5 tests passing
</what-built>
<how-to-verify>
1. Start the stack: `docker compose up` — verify no startup errors
2. Run backend tests: `cd backend && pytest -v` — verify zero failures
3. Start frontend: `cd frontend && npm run dev`
4. Open http://localhost:5173 and log in
5. Navigate to Settings → Cloud Storage tab
- Verify: all 4 providers (Google Drive, OneDrive, Nextcloud, WebDAV server) visible with "Not connected" badges
- Verify: "Connect Google Drive" button is indigo
6. Test WebDAV connect with a private-network URL (SSRF check):
- Click "Connect WebDAV server" → modal opens
- Enter server URL: http://192.168.1.1/dav, username: test, password: test
- Click "Connect WebDAV server" button
- Verify: connection fails with "Connection failed" error (SSRF blocked or connection refused)
7. Check sidebar:
- Verify: "Cloud Storage" collapsible section appears in sidebar below Folders
- When no connections: section shows "No cloud storage connected"
8. (Optional — requires real credentials): Connect Nextcloud with valid credentials
- Verify: connection saves with ACTIVE status badge in Settings
- Verify: provider appears as tree node in sidebar
- Verify: expanding provider node shows cloud folders (or "Empty")
9. Test REQUIRES_REAUTH via DB (optional):
- Run: `docker exec -it document_scanner-postgres-1 psql -U docuvault_app docuvault -c "UPDATE cloud_connections SET status='REQUIRES_REAUTH' WHERE true;"`
- Reload Settings → Cloud Storage tab
- Verify: yellow "Reconnect needed" badge and "Reconnect {provider}" button visible
10. Run security gates:
`cd backend && bandit -r . -x ./tests/ 2>&1 | grep -E "HIGH|CRITICAL"`
`cd backend && pip-audit`
`cd frontend && npm audit --audit-level=high`
</how-to-verify>
<resume-signal>Type "approved" after verifying the UI and test suite, or describe any issues found.</resume-signal>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| Sidebar → /api/cloud/folders | Cloud folder listings loaded via authenticated API; no direct provider calls from browser |
| window.location.href → /api/cloud/oauth/initiate | OAuth redirect is a browser navigation — no token in JavaScript |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-08-01 | Information Disclosure | CloudProviderTreeItem — folder names in DOM | accept | Folder names are user's own cloud content; displayed only to authenticated user; no PII or credentials |
| T-05-08-02 | Denial of Service | Sidebar fetch on mount | mitigate | fetchConnections called once on AppSidebar mount; TTLCache on server prevents repeated API calls for folder listings within 60s |
| T-05-08-03 | Spoofing | CloudFolderTreeItem folder navigation URL | accept | Route /cloud/{provider}/{folder_id} uses folder_id from API response; never from user-typed input |
| T-05-08-04 | Information Disclosure | AppSidebar shows ACTIVE connections | mitigate | Only ACTIVE connections shown; REQUIRES_REAUTH/ERROR hidden from sidebar (user directed to Settings to resolve) |
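
One way the "called once" mitigation in T-05-08-02 could be enforced on the client is a guard that shares a single in-flight request between AppSidebar and SettingsView. This is only a sketch under assumptions: `createFetchOnce` is a hypothetical helper, `fetchFn` stands in for the store's real network call, and the existing cloudConnections store may already handle this differently.

```javascript
// Wrap a fetch function so repeated callers share one request.
function createFetchOnce(fetchFn) {
  let inFlight = null;
  return function fetchOnce() {
    if (inFlight === null) {
      // The Promise constructor also turns a synchronous throw into a rejection.
      inFlight = new Promise((resolve) => resolve(fetchFn())).catch((err) => {
        inFlight = null; // allow a later retry after a failure
        throw err;
      });
    }
    return inFlight;
  };
}
```

After a successful fetch the cached promise keeps resolving to the same result, so a deliberate refresh (for example after connecting a new provider) would need a separate path that bypasses the guard.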
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner && cd backend && pytest -v && cd ../frontend && npm run build 2>&1 | tail -5
</verification>
<success_criteria>
- CloudProviderTreeItem.vue: provider icon colors, expand/collapse, lazy-load children, loading/error states
- CloudFolderTreeItem.vue: folder icon, is_dir expand, lazy-load nested, depth padding
- AppSidebar.vue: Cloud Storage section after Folders; cloudExpanded; CloudProviderTreeItem v-for over ACTIVE connections
- Vite build passes with 0 errors
- pytest -v (backend): 0 failures
- Human checkpoint: user confirms cloud section visible in sidebar; WebDAV SSRF rejection works; tests pass
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-08-SUMMARY.md` when done
</output>