# Phase 5: Cloud Storage Backends - Context
**Gathered:** 2026-05-28
**Status:** Ready for planning
<domain>
## Phase Boundary
Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend through the DocuVault web UI. Connected cloud providers appear alongside local MinIO folders in the existing sidebar folder tree. Credentials are encrypted with per-user keys derived via HKDF. Connection status is visible and manageable from a new "Cloud Storage" tab in SettingsView. Local MinIO storage and all connected cloud backends coexist; no documents are migrated. The `StorageBackend` ABC gains four new concrete implementations.
**All 4 providers ship in this phase** — no phased delivery.
</domain>
<decisions>
## Implementation Decisions
### Backend Scope
- **D-01:** All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) are delivered in this single phase.
- **D-02:** Each provider is a concrete `StorageBackend` subclass in `backend/storage/` (e.g., `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`). The existing ABC's 7 abstract methods define the contract.
### OAuth Flow (Google Drive & OneDrive)
- **D-03:** FastAPI owns the OAuth callback. Flow: user clicks "Connect" in SettingsView → redirected to provider's OAuth consent page → provider redirects to `GET /api/cloud/oauth/callback/{provider}?code=…&state=…` → FastAPI exchanges code for tokens, encrypts credentials, saves to `cloud_connections`, then redirects browser to Vue settings page with `?cloud_connected=google_drive` (or `?cloud_error=…`). The auth code and tokens never land in the frontend.
- **D-04:** OAuth state parameter must encode the authenticated user's ID (signed or encrypted) to prevent CSRF on the callback. Use `secrets.token_urlsafe(32)` + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
- **D-05:** Access token refresh is **on-demand and transparent**. When a cloud API call fails with a token-expiry error (HTTP 401 / provider-specific error), the backend catches it, uses the stored refresh token to obtain a new access token, updates `credentials_enc` in the DB, and retries the original call — all within the same request. The user experiences no interruption.
- **D-06:** If the refresh token itself is rejected by the provider (`invalid_grant` or equivalent), the connection status transitions to `REQUIRES_REAUTH` and the request returns an error telling the user to reconnect. No silent failure.
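The refresh behaviour in D-05 and D-06 can be sketched as a small retry wrapper. This is a minimal illustration, not the project's implementation: the exception classes, the `Connection` dataclass, and the `api_call` / `refresh` callables are all hypothetical stand-ins, and persisting the re-encrypted `credentials_enc` is left as a comment.

```python
import asyncio
from dataclasses import dataclass


class TokenExpired(Exception):
    """Hypothetical: the provider answered HTTP 401 / token expired."""


class RefreshRejected(Exception):
    """Hypothetical: the provider rejected the refresh token (invalid_grant)."""


class ReauthRequired(Exception):
    """Hypothetical: surfaced to the API layer so the user is told to reconnect."""


@dataclass
class Connection:
    # Stand-in for a cloud_connections row. Tokens are plaintext here only to
    # keep the sketch short; the real column is credentials_enc.
    access_token: str
    refresh_token: str
    status: str = "ACTIVE"


async def call_with_refresh(conn, api_call, refresh):
    """Run api_call(access_token); on expiry, refresh once and retry (D-05)."""
    try:
        return await api_call(conn.access_token)
    except TokenExpired:
        pass  # fall through to the refresh path
    try:
        conn.access_token = await refresh(conn.refresh_token)
        # A real implementation would re-encrypt and persist credentials_enc here.
    except RefreshRejected as exc:
        # D-06: no silent failure; mark the connection and raise.
        conn.status = "REQUIRES_REAUTH"
        raise ReauthRequired("cloud connection must be reconnected") from exc
    # Retry exactly once; a second failure propagates to the caller.
    return await api_call(conn.access_token)
```

Limiting the retry to a single attempt keeps a misbehaving provider from causing a refresh loop inside one request.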
### Nextcloud & WebDAV Credentials
- **D-07:** The UI presents both auth methods — real account password and app-specific password — with an explanation of trade-offs and a clear recommendation for app password. The backend stores whichever the user provides (both use HTTP Basic Auth). The recommendation text: app passwords can be revoked individually without changing the main account password.
- **D-08:** On save, the backend validates the WebDAV/Nextcloud connection (a lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
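As a rough sketch of D-08, the validation probe can be a `Depth: 0` PROPFIND carrying HTTP Basic Auth, with credentials stored only after it succeeds. The function names and the injected `send` / `store` callables are assumptions made for illustration; the real backend would likely use an async HTTP client and run the SSRF check first.

```python
import base64
import urllib.request


def build_propfind_probe(server_url, username, password):
    """Build (but do not send) a Depth: 0 PROPFIND with HTTP Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        server_url,
        method="PROPFIND",
        headers={"Authorization": f"Basic {token}", "Depth": "0"},
    )


def save_connection(server_url, username, password, send, store):
    """Store credentials only if the probe succeeds (D-08).

    `send` takes a Request and returns an HTTP status code; `store` persists
    the (to-be-encrypted) credentials. Both are hypothetical hooks.
    """
    status = send(build_propfind_probe(server_url, username, password))
    # WebDAV answers a successful PROPFIND with 207 Multi-Status.
    if status not in (200, 207):
        raise ValueError(f"connection check failed with HTTP {status}")
    store(server_url, username, password)
```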
### Storage Selection & Coexistence
- **D-09:** The sidebar folder tree shows local MinIO folders first, then each connected cloud provider as a peer top-level node (e.g., "Google Drive", "My Nextcloud"). Lazy-load one level at a time: when the user expands a cloud node, the backend fetches the first level of that provider's folder tree via the cloud API.
- **D-10:** Upload destination follows the **active folder context**. If the user is viewing a local folder, uploads go to MinIO. If they are viewing a cloud provider folder, uploads go to that cloud provider via FastAPI intermediary (no direct browser-to-cloud upload). The `documents.storage_backend` column already exists to record which backend holds each document.
- **D-11:** Existing MinIO documents stay in MinIO — no migration. Local and cloud documents coexist. `document.storage_backend = "minio"` for existing docs; new cloud docs get `storage_backend = "google_drive"` etc.
- **D-12:** Cloud provider management lives in a new **"Cloud Storage" tab in SettingsView**. The tab shows: all supported providers; connection status badge (`ACTIVE` / `REQUIRES_REAUTH` / `ERROR` / not connected); "Connect" button for unconnected providers; per-connection "Disconnect" button; a "Disconnect all" action.
- **D-13:** Multiple cloud providers can be connected simultaneously (one row per provider in `cloud_connections`). Each provider's tree appears as its own top-level node in the sidebar.
### Cloud Document Upload
- **D-14:** For cloud backends, file bytes go through FastAPI first (`POST /api/documents/upload` detects the target backend from the active folder context), then FastAPI calls the cloud provider API to store them. The presigned-PUT-URL flow (used for MinIO) is **not used** for cloud backends. The `generate_presigned_put_url` method on cloud `StorageBackend` implementations can raise `NotImplementedError` — the upload endpoint detects cloud backends and uses the direct upload path.
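The routing described in D-14 might look roughly like the following. Everything here is illustrative: `put_object` is an assumed method name, `InMemoryCloudBackend` is a stand-in, and the provider identifiers other than `"minio"` and `"google_drive"` are guesses based on the backend file names.

```python
import asyncio

# Identifiers are assumptions, except "google_drive", which appears in D-11.
CLOUD_BACKENDS = {"google_drive", "onedrive", "nextcloud", "webdav"}


class InMemoryCloudBackend:
    """Hypothetical stand-in for a cloud StorageBackend implementation."""

    def __init__(self):
        self.objects = {}

    async def put_object(self, object_key, data):  # method name is assumed
        self.objects[object_key] = data

    async def generate_presigned_put_url(self, object_key):
        # D-14: cloud backends do not support the presigned-PUT flow.
        raise NotImplementedError("cloud backends use the direct upload path")


async def route_upload(backend, backend_name, object_key, data):
    """Pick the upload path from the backend that owns the active folder."""
    if backend_name in CLOUD_BACKENDS:
        # Bytes pass through FastAPI and are pushed to the provider.
        await backend.put_object(object_key, data)
        return {"mode": "direct", "storage_backend": backend_name}
    # Local MinIO keeps the existing presigned-PUT flow.
    url = await backend.generate_presigned_put_url(object_key)
    return {"mode": "presigned", "storage_backend": backend_name, "url": url}
```

Checking the backend name before calling `generate_presigned_put_url` means the `NotImplementedError` is never used for control flow.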
### Cloud Document Retrieval
- **D-15:** Document downloads/previews use the **same `GET /api/documents/{id}/content` proxy endpoint** regardless of storage backend. The endpoint calls `storage_backend.get_object(document.object_key)` and streams the bytes to the browser. The frontend does not know or care which backend holds the file.
- **D-16:** Cloud folder tree browsing is **live API calls** (no DB sync). A **60-second in-memory TTL cache** (keyed by `user_id + provider + folder_path`) prevents redundant calls when the user collapses and re-expands the same node within one minute. The cache lives in FastAPI application state (or `functools.lru_cache`-equivalent with TTL). Not Redis — in-memory is sufficient for a single-user session pattern.
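One dependency-free way to get the 60-second cache from D-16 is a dict of `(expiry, value)` pairs. The class and method names below are hypothetical; the injectable `clock` exists only so the expiry logic can be exercised without sleeping.

```python
import time


class FolderListingCache:
    """In-memory TTL cache keyed by (user_id, provider, folder_path)."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def get(self, user_id, provider, folder_path):
        key = (user_id, provider, folder_path)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller refetches from the cloud API.
            del self._entries[key]
            return None
        return value

    def put(self, user_id, provider, folder_path, value):
        key = (user_id, provider, folder_path)
        self._entries[key] = (self._clock() + self._ttl, value)
```

Including `user_id` in the key keeps one user's folder listing from ever being served to another.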
### SSRF Prevention
- **D-17:** All outbound HTTP calls to WebDAV/Nextcloud use a URL allowlist: the server URL provided by the user must pass hostname validation (not `localhost`, `127.x`, `169.254.x`, private RFC 1918 ranges, or `::1`). Validation runs at connect-time and before every request. Implemented in a shared `validate_cloud_url()` utility — all WebDAV/Nextcloud backends call it before constructing requests.
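A minimal sketch of what `validate_cloud_url()` could check, using only the standard library. The signature and error type are assumptions. Note that this version inspects only the literal host in the URL; a DNS name that resolves to a private address would pass, so a fuller implementation would probably also resolve the hostname and re-check each resulting IP.

```python
import ipaddress
from urllib.parse import urlsplit


def validate_cloud_url(url):
    """Reject URLs that point at loopback, link-local, or private hosts."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http and https URLs are allowed")
    host = parts.hostname  # lowercased, IPv6 brackets stripped
    if not host:
        raise ValueError("URL has no hostname")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("localhost is not allowed")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: treated as a DNS name (see the caveat above).
        return url
    if ip.is_loopback or ip.is_link_local or ip.is_private or ip.is_unspecified:
        raise ValueError(f"address {ip} is not allowed")
    return url
```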
### Security Invariants (carry-forward)
- **D-18:** `credentials_enc` is encrypted with HKDF per-user key derivation (`HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")`). The master key lives in the `CLOUD_CREDS_KEY` env var. Never stored unencrypted. Never returned in any API response.
- **D-19:** Admin API responses for cloud connections return only `provider, display_name, connected_at, status` (the existing `CloudConnectionOut` Pydantic whitelist pattern from Phase 4).
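To make the derivation in D-18 concrete, here is HKDF with SHA-256 (RFC 5869) written against the standard library. It is a sketch only: production code would more likely use a vetted library implementation, the UTF-8 encoding of `user_id` and the 32-byte output length are assumptions, and the cipher that consumes the derived key is not specified in this document.

```python
import hashlib
import hmac

_HASH_LEN = 32  # SHA-256 digest size in bytes


def hkdf_sha256(key_material, salt, info, length=32):
    """HKDF (RFC 5869) with SHA-256: extract, then expand."""
    if length > 255 * _HASH_LEN:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: PRK = HMAC(salt, input key material); empty salt becomes zeros.
    prk = hmac.new(salt or b"\x00" * _HASH_LEN, key_material, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key, user_id):
    """Per-user key: HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=...)."""
    return hkdf_sha256(
        master_key,
        salt=user_id.encode("utf-8"),  # encoding of user_id is an assumption
        info=b"cloud-credentials",
    )
```

Because the user ID feeds the salt, two users with the same master key end up with unrelated credential keys.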
### Claude's Discretion
- Choice of Python OAuth client library for Google Drive and OneDrive (e.g., `google-auth-oauthlib`, `msal`) — Claude selects based on PyPI availability and Phase 5 open question in STATE.md ("Verify cloud SDK minor versions on PyPI before Phase 5 pinning").
- Choice of WebDAV Python library (e.g., `webdavclient3`, `aiohttp` with manual PROPFIND) — Claude selects based on async compatibility.
- Exact TTL cache implementation (dict + timestamp vs. `cachetools.TTLCache`) — Claude picks the simplest approach with no new dependency if possible.
- OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT) — Claude selects based on what's already wired in the stack.
</decisions>
<canonical_refs>
## Canonical References
**Downstream agents MUST read these before planning or implementing.**
### Storage Backend Contract
- `backend/storage/base.py`: the `StorageBackend` ABC, whose 7 abstract methods all new cloud backends must implement. Note: `generate_presigned_put_url` raises `NotImplementedError` for cloud backends (D-14).
- `backend/storage/__init__.py`: the `get_storage_backend()` factory. Phase 5 must extend this to resolve the correct backend from the document's `storage_backend` field and the user's active context.
- `backend/storage/minio_backend.py` — Reference implementation of `StorageBackend` — patterns for `asyncio.to_thread()` wrapping and error handling.
### Data Model
- `backend/db/models.py`: the `CloudConnection` model (fields: `id`, `user_id`, `provider`, `display_name`, `credentials_enc`, `status`, `connected_at`). The `cloud_connections` table already exists from the Phase 1 migration. Also see the `Document` model, whose `storage_backend` column records which backend holds each document.
### Requirements
- `.planning/REQUIREMENTS.md` — CLOUD-01 through CLOUD-07 (the 7 cloud storage requirements for this phase).
- `.planning/ROADMAP.md` — Phase 5 goal, success criteria, and phase gates (SSRF test, credential encryption round-trip, admin response never exposing `credentials_enc`, OAuth `invalid_grant` handling).
### Security Protocol
- `CLAUDE.md` §"Key Architectural Rules" — HKDF per-user key derivation pattern, SSRF allowlist requirement, `credentials_enc` never in API responses.
- `CLAUDE.md` §"Security Protocol" — SSRF section: "user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist".
### AI Provider Pattern (structural analog)
- `backend/ai/base.py`: the `AIProvider` ABC. Phase 5 cloud backends mirror this pattern (ABC + factory + per-provider file).
- `backend/ai/__init__.py`: the `get_provider()` factory, the pattern to mirror in the `get_storage_backend()` extension.
### Frontend Patterns
- `frontend/src/stores/`: Pinia store patterns established in Phases 2–4 (auth store, folders store). The cloud connections store follows the same pattern.
- `frontend/src/views/SettingsView.vue` — Existing view to extend with "Cloud Storage" tab.
- `frontend/src/components/FolderTreeItem.vue` (Phase 4) — Lazy-loading tree component to extend for cloud provider nodes.
</canonical_refs>
<code_context>
## Existing Code Insights
### Reusable Assets
- `backend/storage/base.py` (`StorageBackend` ABC): new cloud backends subclass this directly. Of the ABC's 7 abstract methods, the 6 other than `generate_presigned_put_url` must be fully implemented.
- `backend/storage/minio_backend.py` — Template for `asyncio.to_thread()` pattern, error handling shape, and constructor signature.
- `backend/db/models.py` (`CloudConnection`): the table already exists, so no new migration is needed for the connection model itself. D-10 states that the `documents.storage_backend` column also exists; verify this, and add a new Alembic migration only if the column is missing.
- `frontend/src/components/FolderTreeItem.vue` — Existing lazy-load tree item; extend to support cloud provider root nodes with a different icon and live-fetch behavior.
- `frontend/src/views/SettingsView.vue` — Tab-based layout; add "Cloud Storage" as a new tab following the same pattern as existing tabs.
- `GET /api/documents/{id}/content` (Phase 4, Plan 04-05) — PDF proxy endpoint. Phase 5 makes this backend-agnostic by routing through `get_storage_backend()` per document.
### Established Patterns
- **Factory pattern:** `get_storage_backend()` in `backend/storage/__init__.py` mirrors `get_provider()` in `backend/ai/__init__.py`. Cloud backends extend the factory with a `storage_backend` parameter (from the document record or upload context).
- **HKDF encryption:** The per-user HKDF key derivation pattern for cloud credentials is defined in CLAUDE.md and is already used in the codebase. Reuse the existing derivation utility.
- **Pydantic whitelist response models:** `CloudConnectionOut` pattern from Phase 4 — never expose `credentials_enc`. Apply to all new cloud endpoints.
- **`asyncio.to_thread()`:** All sync SDK calls (cloud provider SDKs may be sync) wrapped in `asyncio.to_thread()` — matches MinIOBackend pattern.
- **Audit log:** `write_audit_log()` helper from Phase 4 — call on cloud connect, disconnect, and re-auth events.
- **`get_regular_user` dep:** All cloud connection endpoints use `get_regular_user` (admins are blocked from this surface; cloud credentials are personal, not platform-managed).
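Two of the patterns above, the factory and the `asyncio.to_thread()` wrapper, can be sketched together. The `FakeSyncSdk` class, the `sdk` constructor argument, and the registry dict are hypothetical; only `get_storage_backend()`, `get_object`, and the `storage_backend` value come from this document, and the real factory signature may differ.

```python
import asyncio


class FakeSyncSdk:
    """Stand-in for a blocking provider SDK client (hypothetical)."""

    def download(self, object_key):
        return f"bytes-of-{object_key}".encode("utf-8")


class WebDAVBackend:
    """Cut-down backend showing only get_object."""

    def __init__(self, sdk):
        self._sdk = sdk

    async def get_object(self, object_key):
        # Run the blocking SDK call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self._sdk.download, object_key)


# Registry mapping documents.storage_backend values to backend classes.
_BACKENDS = {"webdav": WebDAVBackend}


def get_storage_backend(storage_backend, sdk):
    """Resolve a backend instance from the stored backend name."""
    try:
        backend_cls = _BACKENDS[storage_backend]
    except KeyError:
        raise ValueError(f"unknown storage backend: {storage_backend!r}") from None
    return backend_cls(sdk)
```

A content endpoint would then call something like `await get_storage_backend(document.storage_backend, sdk).get_object(document.object_key)` without branching on the provider.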
### Integration Points
- `GET/POST /api/cloud/connections` — new endpoint group for connecting, listing, and disconnecting cloud backends.
- `GET /api/cloud/oauth/initiate/{provider}` — redirects user to OAuth consent URL.
- `GET /api/cloud/oauth/callback/{provider}` — FastAPI OAuth callback; exchanges code, saves credentials, redirects to Vue.
- `GET /api/cloud/folders/{provider}/{folder_id}` — lists children of a cloud folder (lazy-load tree).
- Upload endpoint (`POST /api/documents/upload`) — must detect active folder's backend and route accordingly.
- `GET /api/documents/{id}/content` — already proxies bytes; must resolve backend from `document.storage_backend`.
- Sidebar `FolderTreeItem.vue` — add cloud provider root nodes below local folder tree.
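For the initiate and callback endpoints, the state check from D-04 could be sketched as a single-use, short-lived store. This version keeps state in a process-local dict purely for illustration; the choice between Redis, a DB row, or a signed JWT is left open above, and the class name, method names, and 10-minute TTL are assumptions.

```python
import secrets
import time


class OAuthStateStore:
    """Single-use OAuth state values bound to a user and provider."""

    def __init__(self, ttl_seconds=600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._pending = {}

    def issue(self, user_id, provider):
        """Called by the initiate endpoint before redirecting to the provider."""
        state = secrets.token_urlsafe(32)
        self._pending[state] = (user_id, provider, self._clock() + self._ttl)
        return state

    def consume(self, state, user_id, provider):
        """Called by the callback endpoint; True only for a fresh, matching state."""
        entry = self._pending.pop(state, None)  # pop makes the state single-use
        if entry is None:
            return False
        stored_user, stored_provider, expires_at = entry
        return (
            stored_user == user_id
            and stored_provider == provider
            and self._clock() < expires_at
        )
```

Removing the entry on first use means a replayed callback URL fails even within the TTL window.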
</code_context>
<specifics>
## Specific Ideas
- **Sidebar layout:** Local folders shown first under a "My Documents" section header; cloud providers below under a "Cloud Storage" section (or just listed as peer top-level nodes with a cloud icon). The visual separation makes it clear which node is local vs. remote.
- **Multiple providers:** All connected providers appear simultaneously in the sidebar — one node per connection. Disconnecting a provider removes its node from the tree.
- **Nextcloud/WebDAV UX copy:** The connection modal explains: "App password — can be revoked without changing your main password (recommended). Your account password — simpler to set up, but revocation requires changing your entire account password."
- **OAuth callback redirect:** On success, Vue reads `?cloud_connected=google_drive` query param in SettingsView's `onMounted` and shows a transient success toast. On error, reads `?cloud_error=…` and shows an error banner.
- **`REQUIRES_REAUTH` prompt:** When a connection has status `REQUIRES_REAUTH`, the SettingsView Cloud Storage tab shows a yellow badge and a "Reconnect" button that re-initiates the OAuth flow.
</specifics>
<deferred>
## Deferred Ideas
- **Document migration between backends** — user-initiated move of existing MinIO docs to a cloud provider. Out of scope for Phase 5; no migration is performed.
- **Cloud-native resumable upload URLs** (provider-specific presigned upload sessions) — skipped in favor of FastAPI intermediary (simpler). Can be added as a performance optimization in a future phase.
- **Shared cloud storage (team/organization)** — multiple users sharing one cloud backend. Out of scope; `cloud_connections` is per-user.
- **Cloud folder sync / offline cache** — syncing cloud folder trees to DB for offline browsing. Out of scope; live API + TTL cache is sufficient.
- **Email notifications on REQUIRES_REAUTH** — out of scope for Phase 5; status is visible in SettingsView.
</deferred>
---
*Phase: 5-Cloud Storage Backends*
*Context gathered: 2026-05-28*