Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# Phase 5: Cloud Storage Backends - Context

Gathered: 2026-05-28

Status: Ready for planning
## Phase Boundary

Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend through the DocuVault web UI. Connected cloud providers appear alongside local MinIO folders in the existing sidebar folder tree. Credentials are encrypted per-user via HKDF. Connection status is visible and manageable from a new "Cloud Storage" tab in SettingsView. Local MinIO storage and all connected cloud backends coexist; no documents are migrated. The `StorageBackend` ABC is extended with four new concrete implementations.

All 4 providers ship in this phase — no phased delivery.
## Implementation Decisions

### Backend Scope

- D-01: All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) are delivered in this single phase.
- D-02: Each provider is a concrete `StorageBackend` subclass in `backend/storage/` (e.g., `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`). The existing ABC's 7 abstract methods define the contract.

### OAuth Flow (Google Drive & OneDrive)

- D-03: FastAPI owns the OAuth callback. Flow: user clicks "Connect" in SettingsView → redirected to provider's OAuth consent page → provider redirects to `GET /api/cloud/oauth/callback/{provider}?code=…&state=…` → FastAPI exchanges code for tokens, encrypts credentials, saves to `cloud_connections`, then redirects browser to Vue settings page with `?cloud_connected=google_drive` (or `?cloud_error=…`). The auth code and tokens never land in the frontend.
- D-04: OAuth state parameter must encode the authenticated user's ID (signed or encrypted) to prevent CSRF on the callback. Use `secrets.token_urlsafe(32)` + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
- D-05: Access token refresh is on-demand and transparent. When a cloud API call fails with a token-expiry error (HTTP 401 / provider-specific error), the backend catches it, uses the stored refresh token to obtain a new access token, updates `credentials_enc` in the DB, and retries the original call, all within the same request. The user experiences no interruption.
- D-06: If the refresh token itself is rejected by the provider (`invalid_grant` or equivalent), the connection status transitions to `REQUIRES_REAUTH` and the request returns an error telling the user to reconnect. No silent failure.

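The refresh-and-retry behavior in D-05 and D-06 can be sketched as follows. This is a minimal illustration, not the project's implementation: the exception classes, the `Connection` dataclass, and `call_with_refresh()` are hypothetical names, and the real backend would also re-encrypt the new token into `credentials_enc`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical: the provider answered HTTP 401 / token expired."""


class RefreshRejectedError(Exception):
    """Hypothetical: the provider rejected the refresh token (invalid_grant)."""


class ReauthRequiredError(Exception):
    """Hypothetical: surfaced to the API layer so the user is told to reconnect."""


@dataclass
class Connection:
    # Stand-in for a cloud_connections row (assumed shape).
    access_token: str
    refresh_token: str
    status: str = "ACTIVE"


async def call_with_refresh(
    conn: Connection,
    api_call: Callable[[str], Awaitable[T]],
    refresh: Callable[[str], Awaitable[str]],
) -> T:
    """Run api_call; on token expiry, refresh once and retry in the same request."""
    try:
        return await api_call(conn.access_token)
    except TokenExpiredError:
        pass  # fall through to the refresh path (D-05)

    try:
        conn.access_token = await refresh(conn.refresh_token)
    except RefreshRejectedError as exc:
        # D-06: no silent failure; mark the connection and report it.
        conn.status = "REQUIRES_REAUTH"
        raise ReauthRequiredError("cloud connection must be re-authorized") from exc

    # The real backend would persist the re-encrypted credentials here.
    return await api_call(conn.access_token)
```

The retry happens at most once, so a second expiry error propagates to the caller instead of looping.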
### Nextcloud & WebDAV Credentials

- D-07: The UI presents both auth methods — real account password and app-specific password — with an explanation of trade-offs and a clear recommendation for app password. The backend stores whichever the user provides (both use HTTP Basic Auth). The recommendation text: app passwords can be revoked individually without changing the main account password.
- D-08: On save, the backend validates the WebDAV/Nextcloud connection (a lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
### Storage Selection & Coexistence

- D-09: The sidebar folder tree shows local MinIO folders first, then each connected cloud provider as a peer top-level node (e.g., "Google Drive", "My Nextcloud"). Lazy-load one level at a time: when the user expands a cloud node, the backend fetches the first level of that provider's folder tree via the cloud API.
- D-10: Upload destination follows the active folder context. If the user is viewing a local folder, uploads go to MinIO. If they are viewing a cloud provider folder, uploads go to that cloud provider via FastAPI intermediary (no direct browser-to-cloud upload). The `documents.storage_backend` column already exists to record which backend holds each document.
- D-11: Existing MinIO documents stay in MinIO; there is no migration. Local and cloud documents coexist. `document.storage_backend = "minio"` for existing docs; new cloud docs get `storage_backend = "google_drive"` etc.
- D-12: Cloud provider management lives in a new "Cloud Storage" tab in SettingsView. The tab shows: all supported providers; connection status badge (`ACTIVE` / `REQUIRES_REAUTH` / `ERROR` / not connected); "Connect" button for unconnected providers; per-connection "Disconnect" button; a "Disconnect all" action.
- D-13: Multiple cloud providers can be connected simultaneously (one row per provider in `cloud_connections`). Each provider's tree appears as its own top-level node in the sidebar.

### Cloud Document Upload

- D-14: For cloud backends, file bytes go through FastAPI first (`POST /api/documents/upload` detects the target backend from the active folder context), then FastAPI calls the cloud provider API to store them. The presigned-PUT-URL flow (used for MinIO) is not used for cloud backends. The `generate_presigned_put_url` method on cloud `StorageBackend` implementations can raise `NotImplementedError`; the upload endpoint detects cloud backends and uses the direct upload path.

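One way the upload endpoint could branch between the presigned flow and the direct path in D-14 is sketched below. The reduced `StorageBackend` stand-in, the `put_object` method name, the in-memory `objects` dict, the placeholder URL, and `handle_upload()` are all assumptions for illustration; the real ABC declares 7 abstract methods, and the real endpoint may detect cloud backends by the `storage_backend` value instead of catching the exception.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Reduced stand-in for the real ABC (only two methods shown)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # in-memory store for the demo

    @abstractmethod
    def generate_presigned_put_url(self, object_key: str) -> str: ...

    def put_object(self, object_key: str, data: bytes) -> None:
        # Hypothetical method name; the real ABC may name this differently.
        self.objects[object_key] = data


class MinIOBackend(StorageBackend):
    def generate_presigned_put_url(self, object_key: str) -> str:
        # Placeholder URL; a real backend would ask MinIO to sign it.
        return f"https://minio.example.invalid/docs/{object_key}"


class GoogleDriveBackend(StorageBackend):
    def generate_presigned_put_url(self, object_key: str) -> str:
        raise NotImplementedError("cloud backends use the direct upload path")


def handle_upload(backend: StorageBackend, object_key: str, data: bytes) -> dict:
    """Return a presigned URL when supported, else store the bytes directly."""
    try:
        url = backend.generate_presigned_put_url(object_key)
    except NotImplementedError:
        backend.put_object(object_key, data)  # FastAPI acts as intermediary
        return {"mode": "direct", "stored": True}
    return {"mode": "presigned", "url": url}
```

With this shape the frontend only needs to check `mode`: a presigned response means it uploads to the returned URL, while a direct response means the bytes were already stored.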
### Cloud Document Retrieval

- D-15: Document downloads/previews use the same `GET /api/documents/{id}/content` proxy endpoint regardless of storage backend. The endpoint calls `storage_backend.get_object(document.object_key)` and streams the bytes to the browser. The frontend does not know or care which backend holds the file.
- D-16: Cloud folder tree browsing is live API calls (no DB sync). A 60-second in-memory TTL cache (keyed by `user_id + provider + folder_path`) prevents redundant calls when the user collapses and re-expands the same node within one minute. The cache lives in FastAPI application state (or a `functools.lru_cache` equivalent with TTL). Redis is not used; in-memory is sufficient for a single-user session pattern.

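The TTL cache in D-16 can be sketched with a plain dict and timestamps, which avoids a new dependency. `FolderListingCache`, `list_folder_cached()`, and the injectable `clock` parameter are hypothetical names chosen for this example; the 60-second default comes from D-16.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Optional, Tuple


class FolderListingCache:
    """Dict + timestamp TTL cache; entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests need not sleep
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def set(self, key: Hashable, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)


def list_folder_cached(cache: FolderListingCache, user_id: str, provider: str,
                       folder_path: str,
                       fetch: Callable[[str], List[str]]) -> List[str]:
    """Return cached children for the key, calling fetch only on a miss."""
    key = (user_id, provider, folder_path)
    cached = cache.get(key)
    if cached is not None:
        return cached
    children = fetch(folder_path)
    cache.set(key, children)
    return children
```

Because `user_id` is part of the key, one user's listing is never served to another. Expired entries are removed only when they are read again, so a long-running process may want periodic cleanup.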
### SSRF Prevention

- D-17: All outbound HTTP calls to WebDAV/Nextcloud use a URL allowlist: the server URL provided by the user must pass hostname validation (not `localhost`, `127.x`, `169.254.x`, private RFC 1918 ranges, or `::1`). Validation runs at connect-time and before every request. Implemented in a shared `validate_cloud_url()` utility; all WebDAV/Nextcloud backends call it before constructing requests.

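A minimal sketch of what `validate_cloud_url()` from D-17 might look like, using only the standard library. Several details are assumptions beyond the text above: the `UnsafeURLError` type, the restriction to `http`/`https`, and resolving hostnames through an injectable `resolver` so that names pointing at private addresses are also rejected.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


class UnsafeURLError(ValueError):
    """Hypothetical error type for URLs that fail the SSRF check."""


def resolve_host(hostname: str) -> List[str]:
    """Default resolver: every address the hostname currently resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def _is_blocked(addr) -> bool:
    # Treat IPv4-mapped IPv6 addresses like the IPv4 address they wrap.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return (addr.is_loopback or addr.is_link_local
            or addr.is_private or addr.is_unspecified)


def validate_cloud_url(url: str,
                       resolver: Callable[[str], List[str]] = resolve_host) -> str:
    """Return url unchanged if its host is acceptable, else raise UnsafeURLError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise UnsafeURLError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise UnsafeURLError("URL has no hostname")
    if host == "localhost" or host.endswith(".localhost"):
        raise UnsafeURLError("localhost is not allowed")
    try:
        candidates = [ipaddress.ip_address(host)]  # literal IP in the URL
    except ValueError:
        candidates = [ipaddress.ip_address(a) for a in resolver(host)]
    if not candidates:
        raise UnsafeURLError("hostname did not resolve")
    for addr in candidates:
        if _is_blocked(addr):
            raise UnsafeURLError(f"address {addr} is not allowed")
    return url
```

Calling this both at connect-time and before every request, as D-17 requires, narrows the window in which a hostname could be re-pointed at an internal address after the initial check.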
### Security Invariants (carry-forward)

- D-18: `credentials_enc` is encrypted with HKDF per-user key derivation (`HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")`). The master key lives in the `CLOUD_CREDS_KEY` env var. Never stored unencrypted. Never returned in any API response.
- D-19: Admin API responses for cloud connections return only `provider`, `display_name`, `connected_at`, `status` (the existing `CloudConnectionOut` Pydantic whitelist pattern from Phase 4).

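To make the derivation in D-18 concrete, here is a standard-library sketch of HKDF (RFC 5869). The choice of SHA-256, the 32-byte output length, and encoding the user ID as UTF-8 for `user_id_bytes` are assumptions; the project may instead use a library HKDF and a different ID encoding. `hkdf_sha256()` and `derive_user_key()` are hypothetical names.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key is too long for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869 default when no salt is given
    # Extract step: PRK = HMAC(salt, input key material)
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand step: T(n) = HMAC(PRK, T(n-1) || info || n)
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    """Per-user key: HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b'cloud-credentials')."""
    user_id_bytes = user_id.encode("utf-8")  # assumed encoding of the user ID
    return hkdf_sha256(master_key, salt=user_id_bytes, info=b"cloud-credentials")
```

The derived key would then feed an authenticated cipher to produce `credentials_enc`; that step is omitted here. Because the user ID is the salt, the same master key yields a different key for each user.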
### Claude's Discretion

- Choice of Python OAuth client library for Google Drive and OneDrive (e.g., `google-auth-oauthlib`, `msal`): Claude selects based on PyPI availability and the Phase 5 open question in STATE.md ("Verify cloud SDK minor versions on PyPI before Phase 5 pinning").
- Choice of WebDAV Python library (e.g., `webdavclient3`, `aiohttp` with manual PROPFIND): Claude selects based on async compatibility.
- Exact TTL cache implementation (dict + timestamp vs. `cachetools.TTLCache`): Claude picks the simplest approach with no new dependency if possible.
- OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT): Claude selects based on what's already wired in the stack.

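Whichever state store is chosen, the contract from D-04 is the same: issue a random state bound to the user, then accept it at most once and only while it is fresh. The sketch below uses an in-memory dict as a stand-in for Redis or a DB row; `OAuthStateStore`, its method names, and the 10-minute TTL are assumptions for illustration.

```python
import secrets
import time
from typing import Callable, Dict, Optional, Tuple


class OAuthStateStore:
    """In-memory stand-in for a short-lived server-side state store."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds  # assumed lifetime; not specified in the source
        self._clock = clock
        # state -> (user_id, provider, expires_at)
        self._pending: Dict[str, Tuple[str, str, float]] = {}

    def issue(self, user_id: str, provider: str) -> str:
        """Create an unguessable state value bound to this user and provider."""
        state = secrets.token_urlsafe(32)
        self._pending[state] = (user_id, provider, self._clock() + self._ttl)
        return state

    def consume(self, state: str, provider: str) -> Optional[str]:
        """Return the bound user_id once; None if unknown, expired, or mismatched."""
        entry = self._pending.pop(state, None)  # pop makes the state single-use
        if entry is None:
            return None
        user_id, issued_provider, expires_at = entry
        if issued_provider != provider or self._clock() >= expires_at:
            return None
        return user_id
```

In the callback handler, the `user_id` returned by `consume()` would be compared with the user of the current session before any tokens are exchanged or stored.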
<canonical_refs>
## Canonical References

Downstream agents MUST read these before planning or implementing.

### Storage Backend Contract

- `backend/storage/base.py`: `StorageBackend` ABC, with 7 abstract methods that all new cloud backends must implement. Note: `generate_presigned_put_url` raises `NotImplementedError` for cloud backends (D-14).
- `backend/storage/__init__.py`: `get_storage_backend()` factory. Phase 5 must extend this to resolve the correct backend from the document's `storage_backend` field and the user's active context.
- `backend/storage/minio_backend.py`: reference implementation of `StorageBackend`, with patterns for `asyncio.to_thread()` wrapping and error handling.

### Data Model

- `backend/db/models.py`: `CloudConnection` model (fields: `id`, `user_id`, `provider`, `display_name`, `credentials_enc`, `status`, `connected_at`). The `cloud_connections` table already exists from the Phase 1 migration. Also see the `Document` model, whose `storage_backend` column records which backend holds each document.

### Requirements

- `.planning/REQUIREMENTS.md`: CLOUD-01 through CLOUD-07 (the 7 cloud storage requirements for this phase).
- `.planning/ROADMAP.md`: Phase 5 goal, success criteria, and phase gates (SSRF test, credential encryption round-trip, admin response never exposing `credentials_enc`, OAuth `invalid_grant` handling).

### Security Protocol

- `CLAUDE.md` §"Key Architectural Rules": HKDF per-user key derivation pattern, SSRF allowlist requirement, `credentials_enc` never in API responses.
- `CLAUDE.md` §"Security Protocol", SSRF section: "user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist".

### AI Provider Pattern (structural analog)

- `backend/ai/base.py`: `AIProvider` ABC. Phase 5 cloud backends mirror this pattern (ABC + factory + per-provider file).
- `backend/ai/__init__.py`: `get_provider()` factory pattern to mirror in the `get_storage_backend()` extension.

### Frontend Patterns

- `frontend/src/stores/`: Pinia store patterns established in Phases 2–4 (auth store, folders store). The cloud connections store follows the same pattern.
- `frontend/src/views/SettingsView.vue`: existing view to extend with the "Cloud Storage" tab.
- `frontend/src/components/FolderTreeItem.vue` (Phase 4): lazy-loading tree component to extend for cloud provider nodes.

</canonical_refs>
<code_context>
## Existing Code Insights

### Reusable Assets

- `backend/storage/base.py` (`StorageBackend` ABC): New cloud backends subclass this directly. All 6 abstract methods other than `generate_presigned_put_url` must be implemented.
- `backend/storage/minio_backend.py`: Template for the `asyncio.to_thread()` pattern, error handling shape, and constructor signature.
- `backend/db/models.py` (`CloudConnection`): Table already exists; no new migration is needed for the connection model itself. D-10 states that `documents.storage_backend` already exists; verify this, and add an Alembic migration only if the column turns out to be missing.
- `frontend/src/components/FolderTreeItem.vue`: Existing lazy-load tree item; extend to support cloud provider root nodes with a different icon and live-fetch behavior.
- `frontend/src/views/SettingsView.vue`: Tab-based layout; add "Cloud Storage" as a new tab following the same pattern as existing tabs.
- `GET /api/documents/{id}/content` (Phase 4, Plan 04-05): PDF proxy endpoint. Phase 5 makes this backend-agnostic by routing through `get_storage_backend()` per document.

### Established Patterns

- Factory pattern: `get_storage_backend()` in `backend/storage/__init__.py` mirrors `get_provider()` in `backend/ai/__init__.py`. Cloud backends extend the factory with a `storage_backend` parameter (from the document record or upload context).
- HKDF encryption: CLAUDE.md establishes the per-user HKDF pattern for cloud credentials. The same derivation pattern is already used in the codebase, so reuse the existing derivation utility.
- Pydantic whitelist response models: `CloudConnectionOut` pattern from Phase 4; never expose `credentials_enc`. Apply to all new cloud endpoints.
- `asyncio.to_thread()`: All sync SDK calls (cloud provider SDKs may be sync) are wrapped in `asyncio.to_thread()`, matching the MinIOBackend pattern.
- Audit log: `write_audit_log()` helper from Phase 4; call on cloud connect, disconnect, and re-auth events.
- `get_regular_user` dep: All cloud connection endpoints use `get_regular_user` (admins are blocked from this surface because cloud credentials are personal, not platform-managed).

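The factory extension described in the first pattern could take the form of a name-to-class registry, sketched below. This is one possible shape, not the existing code: the `register_backend` decorator, the `_REGISTRY` dict, the stub classes, and the `credentials` constructor argument are hypothetical, and the real `get_provider()` may be structured differently. Only the `"minio"` and `"google_drive"` values appear in this document.

```python
from typing import Any, Callable, Dict


class StorageBackend:
    """Stand-in for the real ABC in backend/storage/base.py."""

    name = "base"


# Maps documents.storage_backend values to backend classes.
_REGISTRY: Dict[str, Callable[..., StorageBackend]] = {}


def register_backend(name: str):
    """Class decorator that records a backend under its storage_backend value."""
    def decorator(cls):
        cls.name = name
        _REGISTRY[name] = cls
        return cls
    return decorator


@register_backend("minio")
class MinIOBackend(StorageBackend):
    def __init__(self, **_: Any) -> None:
        pass  # the real constructor would take MinIO connection settings


@register_backend("google_drive")
class GoogleDriveBackend(StorageBackend):
    def __init__(self, credentials: dict, **_: Any) -> None:
        self.credentials = credentials  # decrypted per-user credentials


def get_storage_backend(storage_backend: str = "minio", **kwargs: Any) -> StorageBackend:
    """Resolve a backend instance from a document's storage_backend value."""
    try:
        backend_cls = _REGISTRY[storage_backend]
    except KeyError:
        raise ValueError(f"unknown storage backend: {storage_backend!r}") from None
    return backend_cls(**kwargs)
```

Defaulting to `"minio"` keeps existing call sites working unchanged, while the content and upload endpoints pass `document.storage_backend` or the active folder's backend explicitly.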
### Integration Points

- `GET/POST /api/cloud/connections`: new endpoint group for connecting, listing, and disconnecting cloud backends.
- `GET /api/cloud/oauth/initiate/{provider}`: redirects user to OAuth consent URL.
- `GET /api/cloud/oauth/callback/{provider}`: FastAPI OAuth callback; exchanges code, saves credentials, redirects to Vue.
- `GET /api/cloud/folders/{provider}/{folder_id}`: lists children of a cloud folder (lazy-load tree).
- Upload endpoint (`POST /api/documents/upload`): must detect the active folder's backend and route accordingly.
- `GET /api/documents/{id}/content`: already proxies bytes; must resolve backend from `document.storage_backend`.
- Sidebar `FolderTreeItem.vue`: add cloud provider root nodes below the local folder tree.

</code_context>
## Specific Ideas

- Sidebar layout: Local folders shown first under a "My Documents" section header; cloud providers below under a "Cloud Storage" section (or just listed as peer top-level nodes with a cloud icon). The visual separation makes it clear which node is local vs. remote.
- Multiple providers: All connected providers appear simultaneously in the sidebar — one node per connection. Disconnecting a provider removes its node from the tree.
- Nextcloud/WebDAV UX copy: The connection modal explains: "App password — can be revoked without changing your main password (recommended). Your account password — simpler to set up, but revocation requires changing your entire account password."
- OAuth callback redirect: On success, Vue reads the `?cloud_connected=google_drive` query param in SettingsView's `onMounted` and shows a transient success toast. On error, it reads `?cloud_error=…` and shows an error banner.
- `REQUIRES_REAUTH` prompt: When a connection has status `REQUIRES_REAUTH`, the SettingsView Cloud Storage tab shows a yellow badge and a "Reconnect" button that re-initiates the OAuth flow.

## Deferred Ideas

- Document migration between backends: user-initiated move of existing MinIO docs to a cloud provider. Out of scope for Phase 5; no migration is performed.
- Cloud-native resumable upload URLs (provider-specific presigned upload sessions) — skipped in favor of FastAPI intermediary (simpler). Can be added as a performance optimization in a future phase.
- Shared cloud storage (team/organization): multiple users sharing one cloud backend. Out of scope; `cloud_connections` is per-user.
- Cloud folder sync / offline cache: syncing cloud folder trees to DB for offline browsing. Out of scope; live API + TTL cache is sufficient.
- Email notifications on REQUIRES_REAUTH — out of scope for Phase 5; status is visible in SettingsView.

Phase: 5-Cloud Storage Backends

Context gathered: 2026-05-28