2026-05-28 17:52:25 +02:00

Phase 5: Cloud Storage Backends - Context

Gathered: 2026-05-28
Status: Ready for planning

## Phase Boundary

Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend through the DocuVault web UI. Connected cloud providers appear alongside local MinIO folders in the existing sidebar folder tree. Credentials are encrypted with per-user keys derived via HKDF. Connection status is visible and manageable from a new "Cloud Storage" tab in SettingsView. Local MinIO storage and all connected cloud backends coexist; existing documents are not migrated. The StorageBackend ABC gains four new concrete implementations.

All 4 providers ship in this phase — no phased delivery.

## Implementation Decisions

Backend Scope

  • D-01: All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) are delivered in this single phase.
  • D-02: Each provider is a concrete StorageBackend subclass in backend/storage/ (e.g., google_drive_backend.py, onedrive_backend.py, nextcloud_backend.py, webdav_backend.py). The existing ABC's 7 abstract methods define the contract.

OAuth Flow (Google Drive & OneDrive)

  • D-03: FastAPI owns the OAuth callback. Flow: user clicks "Connect" in SettingsView → redirected to provider's OAuth consent page → provider redirects to GET /api/cloud/oauth/callback/{provider}?code=…&state=… → FastAPI exchanges code for tokens, encrypts credentials, saves to cloud_connections, then redirects browser to Vue settings page with ?cloud_connected=google_drive (or ?cloud_error=…). The auth code and tokens never land in the frontend.
  • D-04: OAuth state parameter must encode the authenticated user's ID (signed or encrypted) to prevent CSRF on the callback. Use secrets.token_urlsafe(32) + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
  • D-05: Access token refresh is on-demand and transparent. When a cloud API call fails with a token-expiry error (HTTP 401 / provider-specific error), the backend catches it, uses the stored refresh token to obtain a new access token, updates credentials_enc in the DB, and retries the original call — all within the same request. The user experiences no interruption.
  • D-06: If the refresh token itself is rejected by the provider (invalid_grant or equivalent), the connection status transitions to REQUIRES_REAUTH and the request returns an error telling the user to reconnect. No silent failure.
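The refresh-and-retry behaviour in D-05 and D-06 can be sketched as a wrapper around any single cloud API call. This is a minimal sketch under stated assumptions: `TokenExpiredError` and `InvalidGrantError` are hypothetical exceptions that each backend would map its provider-specific errors onto, `Connection` is a hypothetical decrypted view of a `cloud_connections` row, persisting to `credentials_enc` is reduced to a `save` callback, and the refresh call is assumed to return only a new access token (some providers also rotate the refresh token).

```python
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical: a backend raises this when the provider reports an expired access token."""


class InvalidGrantError(Exception):
    """Hypothetical: raised when the provider rejects the refresh token (invalid_grant)."""


class ReauthRequired(Exception):
    """Surfaced to the API layer so the user is told to reconnect (D-06)."""


@dataclass
class Connection:
    # Hypothetical decrypted view of one cloud_connections row.
    access_token: str
    refresh_token: str
    status: str = "ACTIVE"


async def call_with_refresh(
    conn: Connection,
    call: Callable[[str], Awaitable[T]],
    refresh: Callable[[str], Awaitable[str]],
    save: Callable[[Connection], Awaitable[None]],
) -> T:
    """Run `call` with the stored access token; on expiry, refresh once and retry."""
    try:
        return await call(conn.access_token)
    except TokenExpiredError:
        pass  # fall through to a single refresh attempt
    try:
        conn.access_token = await refresh(conn.refresh_token)
    except InvalidGrantError:
        conn.status = "REQUIRES_REAUTH"
        await save(conn)  # persist the status change so SettingsView can show it
        raise ReauthRequired("cloud connection must be reconnected") from None
    await save(conn)  # re-encrypt and store the new token in credentials_enc
    return await call(conn.access_token)  # retry the original call exactly once
```

Retrying only once keeps a misbehaving provider from looping; a second expiry error simply propagates.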

Nextcloud & WebDAV Credentials

  • D-07: The UI presents both auth methods — real account password and app-specific password — with an explanation of trade-offs and a clear recommendation for app password. The backend stores whichever the user provides (both use HTTP Basic Auth). The recommendation text: app passwords can be revoked individually without changing the main account password.
  • D-08: On save, the backend validates the WebDAV/Nextcloud connection (a lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
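The connect-time check in D-08 can be a single `PROPFIND` with `Depth: 0` against the user's WebDAV root; a WebDAV server answers a successful PROPFIND with `207 Multi-Status`. The sketch below uses only the standard library. `build_propfind`, `check_webdav_credentials`, and the injectable `opener` are hypothetical names; the URL is assumed to have already passed `validate_cloud_url()` (D-17), and a production version would run this blocking call in `asyncio.to_thread()` or use an async HTTP client.

```python
import base64
import urllib.error
import urllib.request
from typing import Callable

PROPFIND_BODY = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<d:propfind xmlns:d="DAV:"><d:prop><d:resourcetype/></d:prop></d:propfind>'
)


def build_propfind(url: str, username: str, password: str) -> urllib.request.Request:
    """Depth: 0 PROPFIND on the collection itself, authenticated with HTTP Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={
            "Authorization": f"Basic {token}",
            "Depth": "0",
            "Content-Type": "application/xml; charset=utf-8",
        },
    )


def check_webdav_credentials(
    url: str,
    username: str,
    password: str,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """True only if the server answers 207 Multi-Status; only then store credentials."""
    request = build_propfind(url, username, password)
    try:
        with opener(request, timeout=10) as response:
            return response.status == 207
    except (urllib.error.URLError, TimeoutError):
        # 401/403/404 arrive as HTTPError (a URLError subclass); unreachable hosts as URLError.
        return False
```

Treating a plain `200` as failure is deliberately conservative: a server that does not speak WebDAV may answer unknown methods with 200.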

Storage Selection & Coexistence

  • D-09: The sidebar folder tree shows local MinIO folders first, then each connected cloud provider as a peer top-level node (e.g., "Google Drive", "My Nextcloud"). Lazy-load one level at a time: when the user expands a cloud node, the backend fetches the first level of that provider's folder tree via the cloud API.
  • D-10: Upload destination follows the active folder context. If the user is viewing a local folder, uploads go to MinIO. If they are viewing a cloud provider folder, uploads go to that cloud provider via the FastAPI intermediary (no direct browser-to-cloud upload). The documents.storage_backend column already exists to record which backend holds each document.
  • D-11: Existing MinIO documents stay in MinIO — no migration. Local and cloud documents coexist. document.storage_backend = "minio" for existing docs; new cloud docs get storage_backend = "google_drive" etc.
  • D-12: Cloud provider management lives in a new "Cloud Storage" tab in SettingsView. The tab shows: all supported providers; connection status badge (ACTIVE / REQUIRES_REAUTH / ERROR / not connected); "Connect" button for unconnected providers; per-connection "Disconnect" button; a "Disconnect all" action.
  • D-13: Multiple cloud providers can be connected simultaneously (one row per provider in cloud_connections). Each provider's tree appears as its own top-level node in the sidebar.

Cloud Document Upload

  • D-14: For cloud backends, file bytes go through FastAPI first (POST /api/documents/upload detects the target backend from the active folder context), then FastAPI calls the cloud provider API to store them. The presigned-PUT-URL flow (used for MinIO) is not used for cloud backends. The generate_presigned_put_url method on cloud StorageBackend implementations can raise NotImplementedError — the upload endpoint detects cloud backends and uses the direct upload path.
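One way to express the branch in D-10 and D-14: derive the backend name from the active folder context, then either hand back a presigned URL (MinIO) or push the bytes server-side (cloud). This is a sketch only. `FolderContext`, `route_upload`, the return shapes, and the `put_object` method name are hypothetical; only `generate_presigned_put_url` comes from the existing ABC, and its real signature may take more arguments.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol

LOCAL_BACKEND = "minio"


class UploadTarget(Protocol):
    # Hypothetical subset of StorageBackend; put_object is an assumed method name.
    async def generate_presigned_put_url(self, object_key: str) -> str: ...
    async def put_object(self, object_key: str, data: bytes) -> None: ...


@dataclass
class FolderContext:
    folder_id: str
    provider: Optional[str] = None  # None means a local MinIO folder


def backend_for(ctx: FolderContext) -> str:
    """The value recorded in documents.storage_backend (D-10, D-11)."""
    return ctx.provider or LOCAL_BACKEND


async def route_upload(
    ctx: FolderContext,
    object_key: str,
    data: Optional[bytes],
    backends: Dict[str, UploadTarget],
) -> dict:
    name = backend_for(ctx)
    backend = backends[name]
    if name == LOCAL_BACKEND:
        # MinIO keeps the presigned-PUT flow: the browser sends the bytes itself.
        url = await backend.generate_presigned_put_url(object_key)
        return {"storage_backend": name, "upload_url": url}
    if data is None:
        raise ValueError("cloud uploads must pass the file bytes through FastAPI")
    # Cloud backends: FastAPI already holds the bytes and forwards them (D-14).
    await backend.put_object(object_key, data)
    return {"storage_backend": name, "object_key": object_key}
```

Branching on the backend name up front means cloud backends never have their `generate_presigned_put_url` called, so raising `NotImplementedError` there is safe.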

Cloud Document Retrieval

  • D-15: Document downloads/previews use the same GET /api/documents/{id}/content proxy endpoint regardless of storage backend. The endpoint calls storage_backend.get_object(document.object_key) and streams the bytes to the browser. The frontend does not know or care which backend holds the file.
  • D-16: Cloud folder tree browsing is live API calls (no DB sync). A 60-second in-memory TTL cache (keyed by user_id + provider + folder_path) prevents redundant calls when the user collapses and re-expands the same node within one minute. The cache lives in FastAPI application state (or functools.lru_cache-equivalent with TTL). Not Redis — in-memory is sufficient for a single-user session pattern.
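D-16's cache needs no new dependency: a dict mapping `(user_id, provider, folder_path)` to `(expiry, value)` is enough. A minimal sketch, assuming a single FastAPI worker process (each worker would otherwise hold its own copy). `FolderListingCache` and `invalidate_provider` are hypothetical names, and the injectable clock exists only to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

CacheKey = Tuple[Hashable, str, str]  # (user_id, provider, folder_path)


class FolderListingCache:
    """In-process TTL cache for cloud folder listings (one copy per worker process)."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[CacheKey, Tuple[float, Any]] = {}

    def get(self, user_id: Hashable, provider: str, folder_path: str) -> Optional[Any]:
        key = (user_id, provider, folder_path)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # evict lazily when a stale entry is read
            return None
        return value

    def set(self, user_id: Hashable, provider: str, folder_path: str, value: Any) -> None:
        self._entries[(user_id, provider, folder_path)] = (self._clock() + self._ttl, value)

    def invalidate_provider(self, user_id: Hashable, provider: str) -> None:
        """Drop every cached listing for one connection, e.g. after a disconnect."""
        stale = [k for k in self._entries if k[0] == user_id and k[1] == provider]
        for key in stale:
            del self._entries[key]
```

Entries that are never read again are not evicted. For per-user folder listings that is usually acceptable; if memory needs a hard bound, a periodic sweep or `cachetools.TTLCache` would provide one.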

SSRF Prevention

  • D-17: All outbound HTTP calls to WebDAV/Nextcloud use a URL allowlist: the server URL provided by the user must pass hostname validation (not localhost, 127.x, 169.254.x, private RFC 1918 ranges, or ::1). Validation runs at connect-time and before every request. Implemented in a shared validate_cloud_url() utility — all WebDAV/Nextcloud backends call it before constructing requests.
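A sketch of what the shared `validate_cloud_url()` utility from D-17 could look like, using only `urllib.parse`, `socket`, and `ipaddress`. It resolves the hostname and rejects the URL if any returned address is loopback, private (including the RFC 1918 ranges), link-local (169.254.0.0/16, fe80::/10), unspecified, reserved, or multicast. Checking resolved addresses rather than the hostname string matters because a public-looking name can point at an internal IP. The `UnsafeCloudURL` exception and the injectable `resolve` parameter are assumptions added for illustration and testing. Because DNS answers can change between this check and the real connection, the per-request re-check in D-17 narrows, but does not fully close, the DNS-rebinding window.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlsplit


class UnsafeCloudURL(ValueError):
    """Raised when a user-supplied WebDAV/Nextcloud URL targets a non-public host."""


def _resolve(host: str, port: int) -> List[str]:
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]  # sockaddr[0] is the address string


def validate_cloud_url(url: str, resolve: Callable[[str, int], List[str]] = _resolve) -> str:
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise UnsafeCloudURL(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname  # lower-cased, brackets stripped for IPv6 literals
    if not host:
        raise UnsafeCloudURL("URL has no hostname")
    if host == "localhost" or host.endswith(".localhost"):
        raise UnsafeCloudURL("localhost is not allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        addresses = resolve(host, port)
    except (socket.gaierror, UnicodeError) as exc:
        raise UnsafeCloudURL(f"cannot resolve {host!r}") from exc
    if not addresses:
        raise UnsafeCloudURL(f"{host!r} resolved to no addresses")
    for raw in addresses:
        ip = ipaddress.ip_address(raw.split("%", 1)[0])  # drop any IPv6 zone id
        if (ip.is_loopback or ip.is_private or ip.is_link_local
                or ip.is_unspecified or ip.is_reserved or ip.is_multicast):
            raise UnsafeCloudURL(f"{host!r} resolves to non-public address {ip}")
    return url
```

Rejecting the URL when *any* resolved address is internal (rather than when all are) avoids a mixed DNS answer slipping an internal address past the check.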

Security Invariants (carry-forward)

  • D-18: credentials_enc is encrypted with a per-user key derived via HKDF (HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")). The master key lives in the CLOUD_CREDS_KEY env var. Credentials are never stored unencrypted and never returned in any API response.
  • D-19: Admin API responses for cloud connections return only provider, display_name, connected_at, status (the existing CloudConnectionOut Pydantic whitelist pattern from Phase 4).
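The derivation in D-18 is plain RFC 5869 HKDF. The sketch below implements it with `hmac` and `hashlib` so the extract and expand steps are visible; in the real codebase the existing derivation utility (or the `cryptography` package's `HKDF` class) should be reused instead. Assumptions: SHA-256 as the hash, a 32-byte output key, and `user_id_bytes` taken as the UTF-8 encoding of the user ID's string form. `hkdf_sha256` and `derive_user_key` are hypothetical names, and the cipher that consumes the derived key is not specified in this document, so it is left out.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """RFC 5869 HKDF with SHA-256: extract a PRK, then expand it to `length` bytes."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("HKDF-SHA256 output is limited to 8160 bytes")
    # Extract: an empty salt is replaced by HashLen zero bytes, per RFC 5869.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i); concatenate and truncate.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: object) -> bytes:
    """Per-user key for credentials_enc (hypothetical helper name)."""
    user_id_bytes = str(user_id).encode("utf-8")  # assumption: how user_id becomes bytes
    return hkdf_sha256(master_key, salt=user_id_bytes, info=b"cloud-credentials")
```

Using the user ID as the salt means a key leaked for one user cannot decrypt another user's `credentials_enc`, while only `CLOUD_CREDS_KEY` has to be kept secret.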

Claude's Discretion

  • Choice of Python OAuth client library for Google Drive and OneDrive (e.g., google-auth-oauthlib, msal) — Claude selects based on PyPI availability and Phase 5 open question in STATE.md ("Verify cloud SDK minor versions on PyPI before Phase 5 pinning").
  • Choice of WebDAV Python library (e.g., webdavclient3, aiohttp with manual PROPFIND) — Claude selects based on async compatibility.
  • Exact TTL cache implementation (dict + timestamp vs. cachetools.TTLCache) — Claude picks the simplest approach with no new dependency if possible.
  • OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT) — Claude selects based on what's already wired in the stack.
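Whichever store is chosen, D-04's contract is the same: issue a random `state` bound to the initiating user, and accept it exactly once, before it expires, for that same user and provider. A minimal in-memory sketch of that contract; `OAuthStateStore`, the 10-minute TTL, and the `(user_id, provider)` payload are assumptions, and a Redis- or DB-backed version would keep the same `issue`/`consume` shape.

```python
import secrets
import time
from typing import Callable, Dict, Hashable, Tuple


class OAuthStateStore:
    """Single-use, short-lived OAuth `state` tokens bound to a user and provider."""

    def __init__(self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending: Dict[str, Tuple[Hashable, str, float]] = {}

    def issue(self, user_id: Hashable, provider: str) -> str:
        state = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe base64
        self._pending[state] = (user_id, provider, self._clock() + self._ttl)
        return state

    def consume(self, state: str, user_id: Hashable, provider: str) -> bool:
        # pop() makes every token single-use, even when validation fails.
        entry = self._pending.pop(state, None)
        if entry is None:
            return False
        owner, expected_provider, expires_at = entry
        if self._clock() >= expires_at:
            return False
        return owner == user_id and expected_provider == provider
```

Because `consume()` removes the token before checking it, a leaked `state` value cannot be replayed after the first callback.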

<canonical_refs>

Canonical References

Downstream agents MUST read these before planning or implementing.

Storage Backend Contract

  • backend/storage/base.py (StorageBackend ABC): 7 abstract methods that all new cloud backends must implement. Note: cloud backends may implement generate_presigned_put_url by raising NotImplementedError (D-14).
  • backend/storage/__init__.py (get_storage_backend() factory): Phase 5 must extend this to resolve the correct backend from the document's storage_backend field and the user's active context.
  • backend/storage/minio_backend.py — Reference implementation of StorageBackend — patterns for asyncio.to_thread() wrapping and error handling.

Data Model

  • backend/db/models.py: CloudConnection model (fields: id, user_id, provider, display_name, credentials_enc, status, connected_at). The cloud_connections table already exists from the Phase 1 migration. Also see the Document model, whose storage_backend column records which backend holds each document.

Requirements

  • .planning/REQUIREMENTS.md — CLOUD-01 through CLOUD-07 (the 7 cloud storage requirements for this phase).
  • .planning/ROADMAP.md — Phase 5 goal, success criteria, and phase gates (SSRF test, credential encryption round-trip, admin response never exposing credentials_enc, OAuth invalid_grant handling).

Security Protocol

  • CLAUDE.md §"Key Architectural Rules" — HKDF per-user key derivation pattern, SSRF allowlist requirement, credentials_enc never in API responses.
  • CLAUDE.md §"Security Protocol" — SSRF section: "user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist".

AI Provider Pattern (structural analog)

  • backend/ai/base.py (AIProvider ABC): Phase 5 cloud backends mirror this pattern (ABC + factory + per-provider file).
  • backend/ai/__init__.py (get_provider() factory): the pattern to mirror in the get_storage_backend() extension.
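A registry-based sketch of how the extended `get_storage_backend()` could mirror a `get_provider()`-style factory. Everything here is illustrative: the stand-in `StorageBackend` is trimmed to a constructor, the class names follow D-02's file names but are assumptions, `storage_backend` strings other than `"minio"` and `"google_drive"` are guessed, and passing decrypted credentials to each constructor is an assumed design, not the existing signature.

```python
from typing import Dict, Optional, Type


class StorageBackend:  # trimmed stand-in for backend/storage/base.py
    name = "base"

    def __init__(self, credentials: Optional[dict] = None):
        self.credentials = credentials


class MinIOBackend(StorageBackend):
    name = "minio"


class GoogleDriveBackend(StorageBackend):
    name = "google_drive"


class OneDriveBackend(StorageBackend):
    name = "onedrive"  # assumed identifier


class NextcloudBackend(StorageBackend):
    name = "nextcloud"  # assumed identifier


class WebDAVBackend(StorageBackend):
    name = "webdav"  # assumed identifier


_REGISTRY: Dict[str, Type[StorageBackend]] = {
    cls.name: cls
    for cls in (MinIOBackend, GoogleDriveBackend, OneDriveBackend, NextcloudBackend, WebDAVBackend)
}


def get_storage_backend(
    storage_backend: str = "minio", credentials: Optional[dict] = None
) -> StorageBackend:
    """Resolve a backend from documents.storage_backend or the upload context."""
    try:
        cls = _REGISTRY[storage_backend]
    except KeyError:
        raise ValueError(f"unknown storage backend: {storage_backend!r}") from None
    if cls is not MinIOBackend and credentials is None:
        raise ValueError(f"{storage_backend!r} needs the user's decrypted cloud credentials")
    return cls(credentials)
```

Keeping `"minio"` as the default preserves existing call sites that resolve local storage without passing a document.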

Frontend Patterns

  • frontend/src/stores/ — Pinia store patterns established in Phases 2–4 (auth store, folders store). The cloud connections store follows the same pattern.
  • frontend/src/views/SettingsView.vue — Existing view to extend with "Cloud Storage" tab.
  • frontend/src/components/FolderTreeItem.vue (Phase 4) — Lazy-loading tree component to extend for cloud provider nodes.

</canonical_refs>

<code_context>

Existing Code Insights

Reusable Assets

  • backend/storage/base.py (StorageBackend ABC) — New cloud backends subclass this directly and must implement every abstract method; only generate_presigned_put_url may raise NotImplementedError instead (D-14).
  • backend/storage/minio_backend.py — Template for asyncio.to_thread() pattern, error handling shape, and constructor signature.
  • backend/db/models.py (CloudConnection) — Table already exists; no new migration is needed for the connection model itself. D-10 states that documents.storage_backend already exists; confirm this against the current Alembic history before planning any migration for that column.
  • frontend/src/components/FolderTreeItem.vue — Existing lazy-load tree item; extend to support cloud provider root nodes with a different icon and live-fetch behavior.
  • frontend/src/views/SettingsView.vue — Tab-based layout; add "Cloud Storage" as a new tab following the same pattern as existing tabs.
  • GET /api/documents/{id}/content (Phase 4, Plan 04-05) — PDF proxy endpoint. Phase 5 makes this backend-agnostic by routing through get_storage_backend() per document.

Established Patterns

  • Factory pattern: get_storage_backend() in backend/storage/__init__.py mirrors get_provider() in backend/ai/__init__.py. Cloud backends extend the factory with a storage_backend parameter (from the document record or upload context).
  • HKDF encryption: The per-user key derivation for cloud credentials is specified in CLAUDE.md, and the same pattern is already used in the codebase; reuse the existing derivation utility instead of writing a new one.
  • Pydantic whitelist response models: CloudConnectionOut pattern from Phase 4 — never expose credentials_enc. Apply to all new cloud endpoints.
  • asyncio.to_thread(): All sync SDK calls (cloud provider SDKs may be sync) wrapped in asyncio.to_thread() — matches MinIOBackend pattern.
  • Audit log: write_audit_log() helper from Phase 4 — call on cloud connect, disconnect, and re-auth events.
  • get_regular_user dependency: All cloud connection endpoints use get_regular_user (admins are blocked from this surface because cloud credentials are personal, not platform-managed).
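How a cloud backend might combine the `asyncio.to_thread()` convention with D-14's `NotImplementedError`. This sketch assumes a synchronous SDK client exposing a `download(file_id) -> bytes` method (hypothetical; real Drive and Graph clients have different shapes), assumes `get_object` is async as in the MinIO pattern, and shows only two of the ABC's methods.

```python
import asyncio
from typing import Protocol


class SyncDriveClient(Protocol):
    # Hypothetical blocking SDK surface; real provider clients differ.
    def download(self, file_id: str) -> bytes: ...


class GoogleDriveBackend:  # would subclass StorageBackend in backend/storage/
    def __init__(self, client: SyncDriveClient):
        self._client = client

    async def get_object(self, object_key: str) -> bytes:
        # Run the blocking SDK call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._client.download, object_key)

    async def generate_presigned_put_url(self, object_key: str, *args, **kwargs) -> str:
        # D-14: cloud uploads go through FastAPI, never a browser-side presigned PUT.
        raise NotImplementedError("cloud backends use the direct upload path")
```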

Integration Points

  • GET/POST /api/cloud/connections — new endpoint group for connecting, listing, and disconnecting cloud backends.
  • GET /api/cloud/oauth/initiate/{provider} — redirects user to OAuth consent URL.
  • GET /api/cloud/oauth/callback/{provider} — FastAPI OAuth callback; exchanges code, saves credentials, redirects to Vue.
  • GET /api/cloud/folders/{provider}/{folder_id} — lists children of a cloud folder (lazy-load tree).
  • Upload endpoint (POST /api/documents/upload) — must detect active folder's backend and route accordingly.
  • GET /api/documents/{id}/content — already proxies bytes; must resolve backend from document.storage_backend.
  • Sidebar FolderTreeItem.vue — add cloud provider root nodes below local folder tree.

</code_context>

## Specific Ideas

  • Sidebar layout: Local folders shown first under a "My Documents" section header; cloud providers below under a "Cloud Storage" section (or just listed as peer top-level nodes with a cloud icon). The visual separation makes it clear which node is local vs. remote.
  • Multiple providers: All connected providers appear simultaneously in the sidebar — one node per connection. Disconnecting a provider removes its node from the tree.
  • Nextcloud/WebDAV UX copy: The connection modal explains: "App password — can be revoked without changing your main password (recommended). Your account password — simpler to set up, but revocation requires changing your entire account password."
  • OAuth callback redirect: On success, Vue reads ?cloud_connected=google_drive query param in SettingsView's onMounted and shows a transient success toast. On error, reads ?cloud_error=… and shows an error banner.
  • REQUIRES_REAUTH prompt: When a connection has status REQUIRES_REAUTH, the SettingsView Cloud Storage tab shows a yellow badge and a "Reconnect" button that re-initiates the OAuth flow.
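On the backend side, the OAuth callback's final redirect (D-03) only has to append one of the two query parameters SettingsView reads. A sketch using `urllib.parse`; `settings_redirect` is a hypothetical name, and the settings URL is assumed to come from configuration.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def settings_redirect(settings_url: str, provider: str, error: Optional[str] = None) -> str:
    """Build the Vue SettingsView URL that the OAuth callback redirects the browser to."""
    parts = urlsplit(settings_url)
    query = parse_qsl(parts.query)  # keep any query parameters already present
    if error is None:
        query.append(("cloud_connected", provider))
    else:
        # Pass a short, non-sensitive error code; never echo tokens or raw provider responses.
        query.append(("cloud_error", error))
    return urlunsplit(parts._replace(query=urlencode(query)))
```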

## Deferred Ideas

  • Document migration between backends — user-initiated move of existing MinIO docs to a cloud provider. Out of scope for Phase 5; no migration is performed.
  • Cloud-native resumable upload URLs (provider-specific presigned upload sessions) — skipped in favor of FastAPI intermediary (simpler). Can be added as a performance optimization in a future phase.
  • Shared cloud storage (team/organization) — multiple users sharing one cloud backend. Out of scope; cloud_connections is per-user.
  • Cloud folder sync / offline cache — syncing cloud folder trees to DB for offline browsing. Out of scope; live API + TTL cache is sufficient.
  • Email notifications on REQUIRES_REAUTH — out of scope for Phase 5; status is visible in SettingsView.

Phase: 5-Cloud Storage Backends
Context gathered: 2026-05-28