diff --git a/.planning/phases/05-cloud-storage-backends/05-CONTEXT.md b/.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
new file mode 100644
index 0000000..e45e741
--- /dev/null
+++ b/.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@@ -0,0 +1,148 @@
+# Phase 5: Cloud Storage Backends - Context
+
+**Gathered:** 2026-05-28
+**Status:** Ready for planning
+
+
+## Phase Boundary
+
+Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend through the DocuVault web UI. Connected cloud providers appear alongside local MinIO folders in the existing sidebar folder tree. Credentials are encrypted with a per-user key derived via HKDF. Connection status is visible and manageable from a new "Cloud Storage" tab in SettingsView. Local MinIO storage and all connected cloud backends coexist — no document migration. The `StorageBackend` ABC is extended with four new concrete implementations.
+
+**All 4 providers ship in this phase** — no phased delivery.
+
+
+
+
+## Implementation Decisions
+
+### Backend Scope
+- **D-01:** All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) are delivered in this single phase.
+- **D-02:** Each provider is a concrete `StorageBackend` subclass in `backend/storage/` (e.g., `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`). The existing ABC's 7 abstract methods define the contract.
+
+### OAuth Flow (Google Drive & OneDrive)
+- **D-03:** FastAPI owns the OAuth callback. Flow: user clicks "Connect" in SettingsView → redirected to provider's OAuth consent page → provider redirects to `GET /api/cloud/oauth/callback/{provider}?code=…&state=…` → FastAPI exchanges code for tokens, encrypts credentials, saves to `cloud_connections`, then redirects browser to Vue settings page with `?cloud_connected=google_drive` (or `?cloud_error=…`). The auth code and tokens never land in the frontend.
+- **D-04:** OAuth state parameter must encode the authenticated user's ID (signed or encrypted) to prevent CSRF on the callback. Use `secrets.token_urlsafe(32)` + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
+- **D-05:** Access token refresh is **on-demand and transparent**. When a cloud API call fails with a token-expiry error (HTTP 401 / provider-specific error), the backend catches it, uses the stored refresh token to obtain a new access token, updates `credentials_enc` in the DB, and retries the original call — all within the same request. The user experiences no interruption.
+- **D-06:** If the refresh token itself is rejected by the provider (`invalid_grant` or equivalent), the connection status transitions to `REQUIRES_REAUTH` and the request returns an error telling the user to reconnect. No silent failure.
+
+### Nextcloud & WebDAV Credentials
+- **D-07:** The UI presents both auth methods — real account password and app-specific password — with an explanation of trade-offs and a clear recommendation for app password. The backend stores whichever the user provides (both use HTTP Basic Auth). The recommendation text: app passwords can be revoked individually without changing the main account password.
+- **D-08:** On save, the backend validates the WebDAV/Nextcloud connection (a lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
+
+### Storage Selection & Coexistence
+- **D-09:** The sidebar folder tree shows local MinIO folders first, then each connected cloud provider as a peer top-level node (e.g., "Google Drive", "My Nextcloud"). Lazy-load one level at a time: when the user expands a cloud node, the backend fetches the first level of that provider's folder tree via the cloud API.
+- **D-10:** Upload destination follows the **active folder context**. If the user is viewing a local folder, uploads go to MinIO. If they are viewing a cloud provider folder, uploads go to that cloud provider via FastAPI intermediary (no direct browser-to-cloud upload). The `documents.storage_backend` column already exists to record which backend holds each document.
+- **D-11:** Existing MinIO documents stay in MinIO — no migration. Local and cloud documents coexist. `document.storage_backend = "minio"` for existing docs; new cloud docs get `storage_backend = "google_drive"` etc.
+- **D-12:** Cloud provider management lives in a new **"Cloud Storage" tab in SettingsView**. The tab shows: all supported providers; connection status badge (`ACTIVE` / `REQUIRES_REAUTH` / `ERROR` / not connected); "Connect" button for unconnected providers; per-connection "Disconnect" button; a "Disconnect all" action.
+- **D-13:** Multiple cloud providers can be connected simultaneously (one row per provider in `cloud_connections`). Each provider's tree appears as its own top-level node in the sidebar.
+
+### Cloud Document Upload
+- **D-14:** For cloud backends, file bytes go through FastAPI first (`POST /api/documents/upload` detects the target backend from the active folder context), then FastAPI calls the cloud provider API to store them. The presigned-PUT-URL flow (used for MinIO) is **not used** for cloud backends. The `generate_presigned_put_url` method on cloud `StorageBackend` implementations can raise `NotImplementedError` — the upload endpoint detects cloud backends and uses the direct upload path.
+
+### Cloud Document Retrieval
+- **D-15:** Document downloads/previews use the **same `GET /api/documents/{id}/content` proxy endpoint** regardless of storage backend. The endpoint calls `storage_backend.get_object(document.object_key)` and streams the bytes to the browser. The frontend does not know or care which backend holds the file.
+- **D-16:** Cloud folder tree browsing is **live API calls** (no DB sync). A **60-second in-memory TTL cache** (keyed by `user_id + provider + folder_path`) prevents redundant calls when the user collapses and re-expands the same node within one minute. The cache lives in FastAPI application state (or `functools.lru_cache`-equivalent with TTL). Not Redis — in-memory is sufficient for a single-user session pattern.
+
+### SSRF Prevention
+- **D-17:** All outbound HTTP calls to WebDAV/Nextcloud use a URL allowlist: the server URL provided by the user must pass hostname validation (not `localhost`, `127.x`, `169.254.x`, private RFC 1918 ranges, or `::1`). Validation runs at connect-time and before every request. Implemented in a shared `validate_cloud_url()` utility — all WebDAV/Nextcloud backends call it before constructing requests.
+
+### Security Invariants (carry-forward)
+- **D-18:** `credentials_enc` is encrypted with a per-user key derived via HKDF (`HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")`). The master key lives in the `CLOUD_CREDS_KEY` env var. Never stored unencrypted. Never returned in any API response.
+- **D-19:** Admin API responses for cloud connections return only `provider, display_name, connected_at, status` (the existing `CloudConnectionOut` Pydantic whitelist pattern from Phase 4).
+
+### Claude's Discretion
+- Choice of Python OAuth client library for Google Drive and OneDrive (e.g., `google-auth-oauthlib`, `msal`) — Claude selects based on PyPI availability and Phase 5 open question in STATE.md ("Verify cloud SDK minor versions on PyPI before Phase 5 pinning").
+- Choice of WebDAV Python library (e.g., `webdavclient3`, `aiohttp` with manual PROPFIND) — Claude selects based on async compatibility.
+- Exact TTL cache implementation (dict + timestamp vs. `cachetools.TTLCache`) — Claude picks the simplest approach with no new dependency if possible.
+- OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT) — Claude selects based on what's already wired in the stack.
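The SSRF rule in D-17 can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: only the name `validate_cloud_url()` comes from the decisions above, while `UnsafeCloudURLError`, the `resolve` flag, and the choice to also reject multicast, reserved, and unspecified addresses are assumptions made for the example.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class UnsafeCloudURLError(ValueError):
    """Hypothetical error type: the user-supplied URL failed SSRF validation."""


def _is_blocked(ip) -> bool:
    # Unwrap IPv4-mapped IPv6 addresses (e.g. ::ffff:127.0.0.1) before checking.
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped
    return any((
        ip.is_loopback,      # 127.0.0.0/8, ::1
        ip.is_private,       # RFC 1918 ranges and other non-public space
        ip.is_link_local,    # 169.254.0.0/16, fe80::/10
        ip.is_unspecified,   # 0.0.0.0, ::
        ip.is_multicast,
        ip.is_reserved,
    ))


def validate_cloud_url(url: str, resolve: bool = True) -> str:
    """Return the URL unchanged if it looks safe, otherwise raise.

    `resolve=False` skips DNS resolution; it exists only so the sketch can be
    exercised offline and should not be used for real requests.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise UnsafeCloudURLError("only http(s) URLs are allowed")
    host = (parts.hostname or "").lower()
    if not host:
        raise UnsafeCloudURLError("URL has no hostname")
    if host == "localhost" or host.endswith(".localhost"):
        raise UnsafeCloudURLError("localhost is not allowed")
    try:
        # The host is an IP literal: check it directly.
        candidates = [ipaddress.ip_address(host)]
    except ValueError:
        # The host is a name: check every address it resolves to.
        candidates = []
        if resolve:
            try:
                infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
            except socket.gaierror as exc:
                raise UnsafeCloudURLError(f"cannot resolve host {host!r}") from exc
            # Drop any IPv6 scope id ("fe80::1%eth0") before parsing.
            candidates = [ipaddress.ip_address(i[4][0].split("%")[0]) for i in infos]
    for ip in candidates:
        if _is_blocked(ip):
            raise UnsafeCloudURLError(f"address {ip} is not allowed")
    return url
```

Checking the resolved addresses rather than only the hostname string is a design choice of this sketch: a public-looking name can point at an internal address. Because the HTTP client resolves the name again when it connects, re-validating before every request (as D-17 requires) narrows that window but may not close it completely.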
+
+
+
+
+## Canonical References
+
+**Downstream agents MUST read these before planning or implementing.**
+
+### Storage Backend Contract
+- `backend/storage/base.py` — `StorageBackend` ABC: 7 abstract methods that all new cloud backends must implement. Note: `generate_presigned_put_url` raises `NotImplementedError` for cloud backends (D-14).
+- `backend/storage/__init__.py` — `get_storage_backend()` factory: Phase 5 must extend this to resolve the correct backend from the document's `storage_backend` field and the user's active context.
+- `backend/storage/minio_backend.py` — Reference implementation of `StorageBackend` — patterns for `asyncio.to_thread()` wrapping and error handling.
+
+### Data Model
+- `backend/db/models.py` — `CloudConnection` model (fields: `id`, `user_id`, `provider`, `display_name`, `credentials_enc`, `status`, `connected_at`). The `cloud_connections` table already exists from the Phase 1 migration. Also see `Document` model — `storage_backend` column records which backend holds each document.
+
+### Requirements
+- `.planning/REQUIREMENTS.md` — CLOUD-01 through CLOUD-07 (the 7 cloud storage requirements for this phase).
+- `.planning/ROADMAP.md` — Phase 5 goal, success criteria, and phase gates (SSRF test, credential encryption round-trip, admin response never exposing `credentials_enc`, OAuth `invalid_grant` handling).
+
+### Security Protocol
+- `CLAUDE.md` §"Key Architectural Rules" — HKDF per-user key derivation pattern, SSRF allowlist requirement, `credentials_enc` never in API responses.
+- `CLAUDE.md` §"Security Protocol" — SSRF section: "user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist".
+
+### AI Provider Pattern (structural analog)
+- `backend/ai/base.py` — `AIProvider` ABC: Phase 5 cloud backends mirror this pattern (ABC + factory + per-provider file).
+- `backend/ai/__init__.py` — `get_provider()` factory pattern to mirror in `get_storage_backend()` extension.
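The "ABC + factory + per-provider file" pattern referenced above might look roughly like the sketch below. Everything here is illustrative: the real `StorageBackend` ABC has 7 abstract methods, of which only the two named in this document are shown, and `InMemoryCloudBackend`, `register_backend`, and the `credentials` argument are hypothetical names that do not come from the codebase.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Callable, Dict


class StorageBackend(ABC):
    """Stand-in for the real ABC; only two of its methods are sketched."""

    @abstractmethod
    async def get_object(self, object_key: str) -> bytes: ...

    @abstractmethod
    async def generate_presigned_put_url(self, object_key: str) -> str: ...


class InMemoryCloudBackend(StorageBackend):
    """Hypothetical cloud backend used only to exercise the factory."""

    def __init__(self, credentials: dict) -> None:
        self._credentials = credentials
        self._objects: Dict[str, bytes] = {}

    async def put_object(self, object_key: str, data: bytes) -> None:
        self._objects[object_key] = data

    async def get_object(self, object_key: str) -> bytes:
        return self._objects[object_key]

    async def generate_presigned_put_url(self, object_key: str) -> str:
        # Per D-14, cloud uploads go through FastAPI, never a presigned URL.
        raise NotImplementedError("cloud backends use the direct upload path")


# Maps the value stored in documents.storage_backend to a constructor.
_REGISTRY: Dict[str, Callable[[dict], StorageBackend]] = {}


def register_backend(name: str, factory: Callable[[dict], StorageBackend]) -> None:
    _REGISTRY[name] = factory


def get_storage_backend(storage_backend: str, credentials: dict) -> StorageBackend:
    """Resolve a backend instance from a document's storage_backend value."""
    try:
        factory = _REGISTRY[storage_backend]
    except KeyError:
        raise ValueError(f"unknown storage backend: {storage_backend!r}") from None
    return factory(credentials)


register_backend("google_drive", InMemoryCloudBackend)
```

A registry keyed by the stored `storage_backend` string keeps the content proxy endpoint free of per-provider branching, so adding a provider means one new file plus one registration call.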
+
+### Frontend Patterns
+- `frontend/src/stores/` — Pinia store patterns established in Phases 2–4 (auth store, folders store). Cloud connections store follows same pattern.
+- `frontend/src/views/SettingsView.vue` — Existing view to extend with "Cloud Storage" tab.
+- `frontend/src/components/FolderTreeItem.vue` (Phase 4) — Lazy-loading tree component to extend for cloud provider nodes.
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- `backend/storage/base.py` (`StorageBackend` ABC) — New cloud backends subclass this directly. Every abstract method other than `generate_presigned_put_url` must be implemented; that one may raise `NotImplementedError` (D-14).
+- `backend/storage/minio_backend.py` — Template for `asyncio.to_thread()` pattern, error handling shape, and constructor signature.
+- `backend/db/models.py` (`CloudConnection`) — Table already exists; no new migration needed for the connection model itself. D-10 states that the `documents.storage_backend` column already exists; verify this, and add an Alembic migration only if it turns out to be missing.
+- `frontend/src/components/FolderTreeItem.vue` — Existing lazy-load tree item; extend to support cloud provider root nodes with a different icon and live-fetch behavior.
+- `frontend/src/views/SettingsView.vue` — Tab-based layout; add "Cloud Storage" as a new tab following the same pattern as existing tabs.
+- `GET /api/documents/{id}/content` (Phase 4, Plan 04-05) — PDF proxy endpoint. Phase 5 makes this backend-agnostic by routing through `get_storage_backend()` per document.
+
+### Established Patterns
+- **Factory pattern:** `get_storage_backend()` in `backend/storage/__init__.py` mirrors `get_provider()` in `backend/ai/__init__.py`. Cloud backends extend the factory with a `storage_backend` parameter (from the document record or upload context).
+- **HKDF encryption:** Per-user HKDF key derivation for cloud credentials is established in CLAUDE.md. The pattern is already used in the codebase, so reuse the existing derivation utility.
+- **Pydantic whitelist response models:** `CloudConnectionOut` pattern from Phase 4 — never expose `credentials_enc`. Apply to all new cloud endpoints.
+- **`asyncio.to_thread()`:** All sync SDK calls (cloud provider SDKs may be sync) wrapped in `asyncio.to_thread()` — matches MinIOBackend pattern.
+- **Audit log:** `write_audit_log()` helper from Phase 4 — call on cloud connect, disconnect, and re-auth events.
+- **`get_regular_user` dep:** All cloud connection endpoints use `get_regular_user` (admin blocked from this surface — cloud credentials are personal, not platform-managed).
+
+### Integration Points
+- `GET/POST /api/cloud/connections` — new endpoint group for connecting, listing, and disconnecting cloud backends.
+- `GET /api/cloud/oauth/initiate/{provider}` — redirects user to OAuth consent URL.
+- `GET /api/cloud/oauth/callback/{provider}` — FastAPI OAuth callback; exchanges code, saves credentials, redirects to Vue.
+- `GET /api/cloud/folders/{provider}/{folder_id}` — lists children of a cloud folder (lazy-load tree).
+- Upload endpoint (`POST /api/documents/upload`) — must detect active folder's backend and route accordingly.
+- `GET /api/documents/{id}/content` — already proxies bytes; must resolve backend from `document.storage_backend`.
+- Sidebar `FolderTreeItem.vue` — add cloud provider root nodes below local folder tree.
+
+
+
+
+## Specific Ideas
+
+- **Sidebar layout:** Local folders shown first under a "My Documents" section header; cloud providers below under a "Cloud Storage" section (or just listed as peer top-level nodes with a cloud icon). The visual separation makes it clear which node is local vs. remote.
+- **Multiple providers:** All connected providers appear simultaneously in the sidebar — one node per connection. Disconnecting a provider removes its node from the tree.
+- **Nextcloud/WebDAV UX copy:** The connection modal explains: "App password — can be revoked without changing your main password (recommended). Your account password — simpler to set up, but revocation requires changing your entire account password."
+- **OAuth callback redirect:** On success, Vue reads `?cloud_connected=google_drive` query param in SettingsView's `onMounted` and shows a transient success toast. On error, reads `?cloud_error=…` and shows an error banner.
+- **`REQUIRES_REAUTH` prompt:** When a connection has status `REQUIRES_REAUTH`, the SettingsView Cloud Storage tab shows a yellow badge and a "Reconnect" button that re-initiates the OAuth flow.
+
+
+
+
+## Deferred Ideas
+
+- **Document migration between backends** — user-initiated move of existing MinIO docs to a cloud provider. Out of scope for Phase 5; no migration is performed.
+- **Cloud-native resumable upload URLs** (provider-specific presigned upload sessions) — skipped in favor of FastAPI intermediary (simpler). Can be added as a performance optimization in a future phase.
+- **Shared cloud storage (team/organization)** — multiple users sharing one cloud backend. Out of scope; `cloud_connections` is per-user.
+- **Cloud folder sync / offline cache** — syncing cloud folder trees to DB for offline browsing. Out of scope; live API + TTL cache is sufficient.
+- **Email notifications on REQUIRES_REAUTH** — out of scope for Phase 5; status is visible in SettingsView.
+
+
+
+---
+
+*Phase: 5-Cloud Storage Backends*
+*Context gathered: 2026-05-28*
diff --git a/.planning/phases/05-cloud-storage-backends/05-DISCUSSION-LOG.md b/.planning/phases/05-cloud-storage-backends/05-DISCUSSION-LOG.md
new file mode 100644
index 0000000..d86359e
--- /dev/null
+++ b/.planning/phases/05-cloud-storage-backends/05-DISCUSSION-LOG.md
@@ -0,0 +1,159 @@
+# Phase 5: Cloud Storage Backends - Discussion Log
+
+> **Audit trail only.** Do not use as input to planning, research, or execution agents.
+> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
+
+**Date:** 2026-05-28
+**Phase:** 5-cloud-storage-backends
+**Areas discussed:** Backend scope, OAuth flow & token refresh, Storage selection UX, Cloud document retrieval
+
+---
+
+## Backend Scope
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| All 4 in one phase | OneDrive, Google Drive, Nextcloud, WebDAV all in Phase 5 | ✓ |
+| WebDAV + Nextcloud first | Ship simpler (credential-based) backends first; OAuth providers in Phase 6 | |
+| Just one provider as MVP | One end-to-end provider to prove the pattern, others follow | |
+
+**User's choice:** All 4 in one phase
+**Notes:** User wants the full feature set shipped together.
+
+---
+
+## OAuth Flow & Token Refresh
+
+### OAuth callback architecture
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| FastAPI handles it, then redirects to Vue | Backend exchanges code for tokens, saves encrypted creds, redirects browser to Vue with success/error query param | ✓ |
+| Vue intercepts the callback | Frontend catches redirect, POSTs code to FastAPI — auth code briefly in frontend | |
+| You decide | Claude chooses | |
+
+**User's choice:** FastAPI handles it, then redirects to Vue
+**Notes:** Keeps tokens entirely server-side; consistent with existing auth architecture.
+
+### Token refresh strategy
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| On-demand refresh | Catch 401, refresh silently, retry — transparent to user | ✓ (via Other) |
+| Proactive Celery beat refresh | Background task refreshes before expiry | |
+| Fail and prompt re-auth | Mark REQUIRES_REAUTH on expiry, no silent refresh | |
+
+**User's choice:** Automatic refresh (on-demand, transparent). Also explicitly requested disconnect per-connection + "Disconnect all" option.
+**Notes:** Falls back to REQUIRES_REAUTH only on `invalid_grant` (refresh token itself revoked).
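The selected refresh strategy (catch the expiry error, refresh, retry once, and fall back to `REQUIRES_REAUTH` only when the refresh token is rejected) can be sketched as follows. All names here are hypothetical stand-ins: the exception classes, the `Connection` dataclass, and the `api_call` / `refresh` callables are not taken from any provider SDK, and persisting the re-encrypted `credentials_enc` is left out.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TokenExpiredError(Exception):
    """Hypothetical: the provider answered 401 / token expired."""


class RefreshRejectedError(Exception):
    """Hypothetical: the provider rejected the refresh token (invalid_grant)."""


class ReauthRequiredError(Exception):
    """Surfaced to the user: the connection must be re-authorized."""


@dataclass
class Connection:
    access_token: str
    refresh_token: str
    status: str = "ACTIVE"


async def call_with_refresh(
    conn: Connection,
    api_call: Callable[[str], Awaitable[T]],
    refresh: Callable[[str], Awaitable[str]],
) -> T:
    """Run api_call, refreshing the access token at most once on expiry."""
    try:
        return await api_call(conn.access_token)
    except TokenExpiredError:
        pass  # fall through to the refresh path

    try:
        conn.access_token = await refresh(conn.refresh_token)
    except RefreshRejectedError as exc:
        # No silent failure: mark the connection and tell the caller.
        conn.status = "REQUIRES_REAUTH"
        raise ReauthRequiredError("reconnect this provider in Settings") from exc

    # A real implementation would re-encrypt and persist the new token here.
    # Retry exactly once; a second expiry propagates to the caller.
    return await api_call(conn.access_token)
```

Limiting the retry to a single attempt keeps a misbehaving provider from causing a refresh loop inside one request.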
+
+### Nextcloud/WebDAV credential method
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| URL + username + app password | App passwords revocable individually — recommended | ✓ (via Other) |
+| URL + username + real password | Simpler; revocation requires changing entire account password | |
+| You decide | Claude picks | |
+
+**User's choice:** Show both options in the UI with explanations and trade-offs; recommend app passwords. Backend stores whichever the user picks.
+**Notes:** Both use HTTP Basic Auth at the protocol level. UI copy explains the difference.
+
+---
+
+## Storage Selection UX
+
+### Sidebar cloud folder tree depth
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Lazy-load one level at a time | Expand a node → fetch its children from cloud API | ✓ |
+| Show only root of each provider | Single node per provider, click opens full-screen cloud browser | |
+| Pre-fetch 2 levels deep on connect | Eager fetch on connect; faster browsing, stale quickly | |
+
+**User's choice:** Lazy-load one level at a time
+**Notes:** Cloud providers appear as top-level sidebar nodes alongside local MinIO folders, matching a Windows Explorer / Nextcloud-style file manager layout.
+
+### Upload destination
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Follows the active folder | Upload goes to the backend of the folder the user is viewing | ✓ |
+| Default backend in settings | Global setting overridden per-upload | |
+| Per-upload choice at upload time | Dropdown on every upload dialog | |
+
+**User's choice:** Follows the active folder (context-driven)
+**Notes:** No explicit setting needed — the active folder's backend determines the destination.
+
+### Existing document migration
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Stay in MinIO — no migration | Existing docs unaffected; local and cloud coexist | ✓ |
+| Optional migration | Post-connect prompt to migrate existing docs | |
+| You decide | | |
+
+**User's choice:** Stay in MinIO — no migration
+**Notes:** CLOUD-03 satisfied by coexistence without migration.
+
+### Cloud provider management location
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Existing SettingsView, new "Cloud Storage" tab | Add tab to SettingsView alongside existing tabs | ✓ |
+| Dedicated /cloud-storage route | New full-page view | |
+| Sidebar action on cloud provider node | Gear icon → management popover | |
+
+**User's choice:** New "Cloud Storage" tab in SettingsView
+
+---
+
+## Cloud Document Retrieval
+
+### Upload path for cloud backends
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| FastAPI intermediary | File bytes go through FastAPI → cloud provider API | ✓ |
+| Cloud-native resumable upload URLs | Provider-specific upload session URL generated and sent to browser | |
+| You decide | | |
+
+**User's choice:** FastAPI intermediary for cloud uploads
+**Notes:** Presigned-PUT-URL flow stays MinIO-only. Cloud backends' `generate_presigned_put_url` raises `NotImplementedError`.
+
+### Download/preview path
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Same /api/documents/{id}/content proxy | Backend resolves StorageBackend from document.storage_backend | ✓ |
+| Separate /api/documents/{id}/cloud-content | Parallel endpoint for cloud docs | |
+| Temporary cloud provider URL (redirect) | Return provider's signed download URL to browser — exposes cloud URLs | |
+
+**User's choice:** Same proxy endpoint
+**Notes:** Frontend remains storage-backend-agnostic.
+
+### Cloud folder tree freshness
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Live calls + 60s in-memory TTL cache | Per-folder cache keyed by user+provider+path; 60s TTL | ✓ |
+| Live calls only, no cache | Always fresh; no protection against rapid UI interactions | |
+| You decide | | |
+
+**User's choice:** Live calls + 60s in-memory TTL cache
+**Notes:** User raised valid concern about cloud API rate limits and potential throttling. Claude explained: human-paced browsing is well within all provider limits (Google Drive: 12k req/100s per user); TTL cache protects against collapse/re-expand patterns. No DB sync needed.
+
+---
+
+## Claude's Discretion
+
+- Python OAuth library choice (Google: `google-auth-oauthlib`; Microsoft: `msal`)
+- WebDAV Python library choice (`webdavclient3` vs. `aiohttp` with manual PROPFIND)
+- TTL cache implementation (`cachetools.TTLCache` vs. dict + timestamp)
+- OAuth state store implementation (Redis / short-lived DB row / signed JWT)
+
+## Deferred Ideas
+
+- Document migration between backends (local → cloud)
+- Cloud-native resumable upload URLs (performance optimization)
+- Shared/team cloud storage
+- Cloud folder tree DB sync / offline cache
+- Email notifications on REQUIRES_REAUTH