docs(05): capture phase 5 context — cloud storage backends
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,148 @@
|
||||
# Phase 5: Cloud Storage Backends - Context
|
||||
|
||||
**Gathered:** 2026-05-28
|
||||
**Status:** Ready for planning
|
||||
|
||||
<domain>
|
||||
## Phase Boundary
|
||||
|
||||
Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend through the DocuVault web UI. Connected cloud providers appear alongside local MinIO folders in the existing sidebar folder tree. Credentials are encrypted per-user via HKDF. Connection status is visible and manageable from a new "Cloud Storage" tab in SettingsView. Local MinIO storage and all connected cloud backends coexist — no document migration. The `StorageBackend` ABC is extended with four new concrete implementations.
|
||||
|
||||
**All 4 providers ship in this phase** — no phased delivery.
|
||||
|
||||
</domain>
|
||||
|
||||
<decisions>
|
||||
## Implementation Decisions
|
||||
|
||||
### Backend Scope
|
||||
- **D-01:** All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) are delivered in this single phase.
|
||||
- **D-02:** Each provider is a concrete `StorageBackend` subclass in `backend/storage/` (e.g., `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`). The existing ABC's 7 abstract methods define the contract.
|
||||
|
||||
### OAuth Flow (Google Drive & OneDrive)
|
||||
- **D-03:** FastAPI owns the OAuth callback. Flow: user clicks "Connect" in SettingsView → redirected to provider's OAuth consent page → provider redirects to `GET /api/cloud/oauth/callback/{provider}?code=…&state=…` → FastAPI exchanges code for tokens, encrypts credentials, saves to `cloud_connections`, then redirects browser to Vue settings page with `?cloud_connected=google_drive` (or `?cloud_error=…`). The auth code and tokens never land in the frontend.
|
||||
- **D-04:** OAuth state parameter must encode the authenticated user's ID (signed or encrypted) to prevent CSRF on the callback. Use `secrets.token_urlsafe(32)` + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
|
||||
- **D-05:** Access token refresh is **on-demand and transparent**. When a cloud API call fails with a token-expiry error (HTTP 401 / provider-specific error), the backend catches it, uses the stored refresh token to obtain a new access token, updates `credentials_enc` in the DB, and retries the original call — all within the same request. The user experiences no interruption.
|
||||
- **D-06:** If the refresh token itself is rejected by the provider (`invalid_grant` or equivalent), the connection status transitions to `REQUIRES_REAUTH` and the request returns an error telling the user to reconnect. No silent failure.
|
||||
|
||||
### Nextcloud & WebDAV Credentials
|
||||
- **D-07:** The UI presents both auth methods — real account password and app-specific password — with an explanation of trade-offs and a clear recommendation for app password. The backend stores whichever the user provides (both use HTTP Basic Auth). The recommendation text: app passwords can be revoked individually without changing the main account password.
|
||||
- **D-08:** On save, the backend validates the WebDAV/Nextcloud connection (a lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
|
||||
|
||||
### Storage Selection & Coexistence
|
||||
- **D-09:** The sidebar folder tree shows local MinIO folders first, then each connected cloud provider as a peer top-level node (e.g., "Google Drive", "My Nextcloud"). Lazy-load one level at a time: when the user expands a cloud node, the backend fetches the first level of that provider's folder tree via the cloud API.
|
||||
- **D-10:** Upload destination follows the **active folder context**. If the user is viewing a local folder, uploads go to MinIO. If they are viewing a cloud provider folder, uploads go to that cloud provider via FastAPI intermediary (no direct browser-to-cloud upload). The `documents.storage_backend` column already exists to record which backend holds each document.
|
||||
- **D-11:** Existing MinIO documents stay in MinIO — no migration. Local and cloud documents coexist. `document.storage_backend = "minio"` for existing docs; new cloud docs get `storage_backend = "google_drive"` etc.
|
||||
- **D-12:** Cloud provider management lives in a new **"Cloud Storage" tab in SettingsView**. The tab shows: all supported providers; connection status badge (`ACTIVE` / `REQUIRES_REAUTH` / `ERROR` / not connected); "Connect" button for unconnected providers; per-connection "Disconnect" button; a "Disconnect all" action.
|
||||
- **D-13:** Multiple cloud providers can be connected simultaneously (one row per provider in `cloud_connections`). Each provider's tree appears as its own top-level node in the sidebar.
|
||||
|
||||
### Cloud Document Upload
|
||||
- **D-14:** For cloud backends, file bytes go through FastAPI first (`POST /api/documents/upload` detects the target backend from the active folder context), then FastAPI calls the cloud provider API to store them. The presigned-PUT-URL flow (used for MinIO) is **not used** for cloud backends. The `generate_presigned_put_url` method on cloud `StorageBackend` implementations can raise `NotImplementedError` — the upload endpoint detects cloud backends and uses the direct upload path.
|
||||
|
||||
### Cloud Document Retrieval
|
||||
- **D-15:** Document downloads/previews use the **same `GET /api/documents/{id}/content` proxy endpoint** regardless of storage backend. The endpoint calls `storage_backend.get_object(document.object_key)` and streams the bytes to the browser. The frontend does not know or care which backend holds the file.
|
||||
- **D-16:** Cloud folder tree browsing is **live API calls** (no DB sync). A **60-second in-memory TTL cache** (keyed by `user_id + provider + folder_path`) prevents redundant calls when the user collapses and re-expands the same node within one minute. The cache lives in FastAPI application state (or `functools.lru_cache`-equivalent with TTL). Not Redis — in-memory is sufficient for a single-user session pattern.
|
||||
|
||||
### SSRF Prevention
|
||||
- **D-17:** All outbound HTTP calls to WebDAV/Nextcloud use a URL allowlist: the server URL provided by the user must pass hostname validation (not `localhost`, `127.x`, `169.254.x`, private RFC 1918 ranges, or `::1`). Validation runs at connect-time and before every request. Implemented in a shared `validate_cloud_url()` utility — all WebDAV/Nextcloud backends call it before constructing requests.
|
||||
|
||||
### Security Invariants (carry-forward)
|
||||
- **D-18:** `credentials_enc` is encrypted with HKDF per-user key derivation (`HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")`). The master key lives in the `CLOUD_CREDS_KEY` env var. Never stored unencrypted. Never returned in any API response.
|
||||
- **D-19:** Admin API responses for cloud connections return only `provider, display_name, connected_at, status` (the existing `CloudConnectionOut` Pydantic whitelist pattern from Phase 4).
|
||||
|
||||
### Claude's Discretion
|
||||
- Choice of Python OAuth client library for Google Drive and OneDrive (e.g., `google-auth-oauthlib`, `msal`) — Claude selects based on PyPI availability and Phase 5 open question in STATE.md ("Verify cloud SDK minor versions on PyPI before Phase 5 pinning").
|
||||
- Choice of WebDAV Python library (e.g., `webdavclient3`, `aiohttp` with manual PROPFIND) — Claude selects based on async compatibility.
|
||||
- Exact TTL cache implementation (dict + timestamp vs. `cachetools.TTLCache`) — Claude picks the simplest approach with no new dependency if possible.
|
||||
- OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT) — Claude selects based on what's already wired in the stack.
|
||||
|
||||
</decisions>
|
||||
|
||||
<canonical_refs>
|
||||
## Canonical References
|
||||
|
||||
**Downstream agents MUST read these before planning or implementing.**
|
||||
|
||||
### Storage Backend Contract
|
||||
- `backend/storage/base.py` — `StorageBackend` ABC: 7 abstract methods that all new cloud backends must implement. Note: `generate_presigned_put_url` raises `NotImplementedError` for cloud backends (D-14).
|
||||
- `backend/storage/__init__.py` — `get_storage_backend()` factory: Phase 5 must extend this to resolve the correct backend from the document's `storage_backend` field and the user's active context.
|
||||
- `backend/storage/minio_backend.py` — Reference implementation of `StorageBackend` — patterns for `asyncio.to_thread()` wrapping and error handling.
|
||||
|
||||
### Data Model
|
||||
- `backend/db/models.py` — `CloudConnection` model (fields: `id`, `user_id`, `provider`, `display_name`, `credentials_enc`, `status`, `connected_at`). The `cloud_connections` table already exists from the Phase 1 migration. Also see `Document` model — `storage_backend` column records which backend holds each document.
|
||||
|
||||
### Requirements
|
||||
- `.planning/REQUIREMENTS.md` — CLOUD-01 through CLOUD-07 (the 7 cloud storage requirements for this phase).
|
||||
- `.planning/ROADMAP.md` — Phase 5 goal, success criteria, and phase gates (SSRF test, credential encryption round-trip, admin response never exposing `credentials_enc`, OAuth `invalid_grant` handling).
|
||||
|
||||
### Security Protocol
|
||||
- `CLAUDE.md` §"Key Architectural Rules" — HKDF per-user key derivation pattern, SSRF allowlist requirement, `credentials_enc` never in API responses.
|
||||
- `CLAUDE.md` §"Security Protocol" — SSRF section: "user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist".
|
||||
|
||||
### AI Provider Pattern (structural analog)
|
||||
- `backend/ai/base.py` — `AIProvider` ABC: Phase 5 cloud backends mirror this pattern (ABC + factory + per-provider file).
|
||||
- `backend/ai/__init__.py` — `get_provider()` factory pattern to mirror in `get_storage_backend()` extension.
|
||||
|
||||
### Frontend Patterns
|
||||
- `frontend/src/stores/` — Pinia store patterns established in Phases 2–4 (auth store, folders store). Cloud connections store follows same pattern.
|
||||
- `frontend/src/views/SettingsView.vue` — Existing view to extend with "Cloud Storage" tab.
|
||||
- `frontend/src/components/FolderTreeItem.vue` (Phase 4) — Lazy-loading tree component to extend for cloud provider nodes.
|
||||
|
||||
</canonical_refs>
|
||||
|
||||
<code_context>
|
||||
## Existing Code Insights
|
||||
|
||||
### Reusable Assets
|
||||
- `backend/storage/base.py` (`StorageBackend` ABC) — New cloud backends subclass this directly. All 4 abstract methods beyond `generate_presigned_put_url` must be implemented.
|
||||
- `backend/storage/minio_backend.py` — Template for `asyncio.to_thread()` pattern, error handling shape, and constructor signature.
|
||||
- `backend/db/models.py` (`CloudConnection`) — Table already exists; no new migration needed for the connection model itself. A new Alembic migration may be needed to add `storage_backend` column to `documents` if not already present (verify).
|
||||
- `frontend/src/components/FolderTreeItem.vue` — Existing lazy-load tree item; extend to support cloud provider root nodes with a different icon and live-fetch behavior.
|
||||
- `frontend/src/views/SettingsView.vue` — Tab-based layout; add "Cloud Storage" as a new tab following the same pattern as existing tabs.
|
||||
- `GET /api/documents/{id}/content` (Phase 4, Plan 04-05) — PDF proxy endpoint. Phase 5 makes this backend-agnostic by routing through `get_storage_backend()` per document.
|
||||
|
||||
### Established Patterns
|
||||
- **Factory pattern:** `get_storage_backend()` in `backend/storage/__init__.py` mirrors `get_provider()` in `backend/ai/__init__.py`. Cloud backends extend the factory with a `storage_backend` parameter (from the document record or upload context).
|
||||
- **HKDF encryption:** Established for cloud credentials in CLAUDE.md. Same pattern as cloud credentials is already used in the codebase — reuse the derivation utility.
|
||||
- **Pydantic whitelist response models:** `CloudConnectionOut` pattern from Phase 4 — never expose `credentials_enc`. Apply to all new cloud endpoints.
|
||||
- **`asyncio.to_thread()`:** All sync SDK calls (cloud provider SDKs may be sync) wrapped in `asyncio.to_thread()` — matches MinIOBackend pattern.
|
||||
- **Audit log:** `write_audit_log()` helper from Phase 4 — call on cloud connect, disconnect, and re-auth events.
|
||||
- **`get_regular_user` dep:** All cloud connection endpoints use `get_regular_user` (admin blocked from this surface — CLOUD credentials are personal, not platform-managed).
|
||||
|
||||
### Integration Points
|
||||
- `GET/POST /api/cloud/connections` — new endpoint group for connecting, listing, and disconnecting cloud backends.
|
||||
- `GET /api/cloud/oauth/initiate/{provider}` — redirects user to OAuth consent URL.
|
||||
- `GET /api/cloud/oauth/callback/{provider}` — FastAPI OAuth callback; exchanges code, saves credentials, redirects to Vue.
|
||||
- `GET /api/cloud/folders/{provider}/{folder_id}` — lists children of a cloud folder (lazy-load tree).
|
||||
- Upload endpoint (`POST /api/documents/upload`) — must detect active folder's backend and route accordingly.
|
||||
- `GET /api/documents/{id}/content` — already proxies bytes; must resolve backend from `document.storage_backend`.
|
||||
- Sidebar `FolderTreeItem.vue` — add cloud provider root nodes below local folder tree.
|
||||
|
||||
</code_context>
|
||||
|
||||
<specifics>
|
||||
## Specific Ideas
|
||||
|
||||
- **Sidebar layout:** Local folders shown first under a "My Documents" section header; cloud providers below under a "Cloud Storage" section (or just listed as peer top-level nodes with a cloud icon). The visual separation makes it clear which node is local vs. remote.
|
||||
- **Multiple providers:** All connected providers appear simultaneously in the sidebar — one node per connection. Disconnecting a provider removes its node from the tree.
|
||||
- **Nextcloud/WebDAV UX copy:** The connection modal explains: "App password — can be revoked without changing your main password (recommended). Your account password — simpler to set up, but revocation requires changing your entire account password."
|
||||
- **OAuth callback redirect:** On success, Vue reads `?cloud_connected=google_drive` query param in SettingsView's `onMounted` and shows a transient success toast. On error, reads `?cloud_error=…` and shows an error banner.
|
||||
- **`REQUIRES_REAUTH` prompt:** When a connection has status `REQUIRES_REAUTH`, the SettingsView Cloud Storage tab shows a yellow badge and a "Reconnect" button that re-initiates the OAuth flow.
|
||||
|
||||
</specifics>
|
||||
|
||||
<deferred>
|
||||
## Deferred Ideas
|
||||
|
||||
- **Document migration between backends** — user-initiated move of existing MinIO docs to a cloud provider. Out of scope for Phase 5; no migration is performed.
|
||||
- **Cloud-native resumable upload URLs** (provider-specific presigned upload sessions) — skipped in favor of FastAPI intermediary (simpler). Can be added as a performance optimization in a future phase.
|
||||
- **Shared cloud storage (team/organization)** — multiple users sharing one cloud backend. Out of scope; `cloud_connections` is per-user.
|
||||
- **Cloud folder sync / offline cache** — syncing cloud folder trees to DB for offline browsing. Out of scope; live API + TTL cache is sufficient.
|
||||
- **Email notifications on REQUIRES_REAUTH** — out of scope for Phase 5; status is visible in SettingsView.
|
||||
|
||||
</deferred>
|
||||
|
||||
---
|
||||
|
||||
*Phase: 5-Cloud Storage Backends*
|
||||
*Context gathered: 2026-05-28*
|
||||
Reference in New Issue
Block a user