docs(05): capture phase 5 context — cloud storage backends
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,148 @@
|
||||
# Phase 5: Cloud Storage Backends - Context
|
||||
|
||||
**Gathered:** 2026-05-28
|
||||
**Status:** Ready for planning
|
||||
|
||||
<domain>
|
||||
## Phase Boundary
|
||||
|
||||
Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend through the DocuVault web UI. Connected cloud providers appear alongside local MinIO folders in the existing sidebar folder tree. Credentials are encrypted per-user via HKDF. Connection status is visible and manageable from a new "Cloud Storage" tab in SettingsView. Local MinIO storage and all connected cloud backends coexist — no document migration. The `StorageBackend` ABC is extended with four new concrete implementations.
|
||||
|
||||
**All 4 providers ship in this phase** — no phased delivery.
|
||||
|
||||
</domain>
|
||||
|
||||
<decisions>
|
||||
## Implementation Decisions
|
||||
|
||||
### Backend Scope
|
||||
- **D-01:** All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) are delivered in this single phase.
|
||||
- **D-02:** Each provider is a concrete `StorageBackend` subclass in `backend/storage/` (e.g., `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`). The existing ABC's 7 abstract methods define the contract.
|
||||
|
||||
### OAuth Flow (Google Drive & OneDrive)
|
||||
- **D-03:** FastAPI owns the OAuth callback. Flow: user clicks "Connect" in SettingsView → redirected to provider's OAuth consent page → provider redirects to `GET /api/cloud/oauth/callback/{provider}?code=…&state=…` → FastAPI exchanges code for tokens, encrypts credentials, saves to `cloud_connections`, then redirects browser to Vue settings page with `?cloud_connected=google_drive` (or `?cloud_error=…`). The auth code and tokens never land in the frontend.
|
||||
- **D-04:** OAuth state parameter must encode the authenticated user's ID (signed or encrypted) to prevent CSRF on the callback. Use `secrets.token_urlsafe(32)` + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
|
||||
- **D-05:** Access token refresh is **on-demand and transparent**. When a cloud API call fails with a token-expiry error (HTTP 401 / provider-specific error), the backend catches it, uses the stored refresh token to obtain a new access token, updates `credentials_enc` in the DB, and retries the original call — all within the same request. The user experiences no interruption.
|
||||
- **D-06:** If the refresh token itself is rejected by the provider (`invalid_grant` or equivalent), the connection status transitions to `REQUIRES_REAUTH` and the request returns an error telling the user to reconnect. No silent failure.
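
The refresh behaviour in D-05 and D-06 could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `Connection`, `TokenExpiredError`, `InvalidGrantError`, `ReauthRequiredError`, and `call_with_refresh` are hypothetical names, and persisting the re-encrypted credentials to `credentials_enc` is only indicated by a comment.

```python
import asyncio
from dataclasses import dataclass


class TokenExpiredError(Exception):
    """Hypothetical: raised by a provider call on HTTP 401 / token expiry."""


class InvalidGrantError(Exception):
    """Hypothetical: raised when the provider rejects the refresh token."""


class ReauthRequiredError(Exception):
    """Hypothetical: surfaced to the API layer so the user is told to reconnect."""


@dataclass
class Connection:
    # Stand-in for a cloud_connections row (assumption: the real tokens are
    # stored encrypted inside credentials_enc, not as plain attributes).
    access_token: str
    refresh_token: str
    status: str = "ACTIVE"


async def call_with_refresh(conn, api_call, refresh):
    """Run api_call; on token expiry, refresh once and retry (D-05)."""
    try:
        return await api_call(conn.access_token)
    except TokenExpiredError:
        pass  # fall through to the refresh path

    try:
        conn.access_token = await refresh(conn.refresh_token)
        # A real implementation would re-encrypt and persist credentials_enc here.
    except InvalidGrantError as exc:
        # D-06: the refresh token itself was rejected; no silent failure.
        conn.status = "REQUIRES_REAUTH"
        raise ReauthRequiredError("Reconnect this cloud provider") from exc

    # Retry exactly once within the same request.
    return await api_call(conn.access_token)
```

A second `TokenExpiredError` on the retry propagates to the caller, which keeps the sketch from looping if the provider keeps rejecting fresh tokens.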
|
||||
|
||||
### Nextcloud & WebDAV Credentials
|
||||
- **D-07:** The UI presents both auth methods — real account password and app-specific password — with an explanation of trade-offs and a clear recommendation for app password. The backend stores whichever the user provides (both use HTTP Basic Auth). The recommendation text: app passwords can be revoked individually without changing the main account password.
|
||||
- **D-08:** On save, the backend validates the WebDAV/Nextcloud connection (a lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
|
||||
|
||||
### Storage Selection & Coexistence
|
||||
- **D-09:** The sidebar folder tree shows local MinIO folders first, then each connected cloud provider as a peer top-level node (e.g., "Google Drive", "My Nextcloud"). Lazy-load one level at a time: when the user expands a cloud node, the backend fetches the first level of that provider's folder tree via the cloud API.
|
||||
- **D-10:** Upload destination follows the **active folder context**. If the user is viewing a local folder, uploads go to MinIO. If they are viewing a cloud provider folder, uploads go to that cloud provider via FastAPI intermediary (no direct browser-to-cloud upload). The `documents.storage_backend` column already exists to record which backend holds each document.
|
||||
- **D-11:** Existing MinIO documents stay in MinIO — no migration. Local and cloud documents coexist. `document.storage_backend = "minio"` for existing docs; new cloud docs get `storage_backend = "google_drive"` etc.
|
||||
- **D-12:** Cloud provider management lives in a new **"Cloud Storage" tab in SettingsView**. The tab shows: all supported providers; connection status badge (`ACTIVE` / `REQUIRES_REAUTH` / `ERROR` / not connected); "Connect" button for unconnected providers; per-connection "Disconnect" button; a "Disconnect all" action.
|
||||
- **D-13:** Multiple cloud providers can be connected simultaneously (one row per provider in `cloud_connections`). Each provider's tree appears as its own top-level node in the sidebar.
|
||||
|
||||
### Cloud Document Upload
|
||||
- **D-14:** For cloud backends, file bytes go through FastAPI first (`POST /api/documents/upload` detects the target backend from the active folder context), then FastAPI calls the cloud provider API to store them. The presigned-PUT-URL flow (used for MinIO) is **not used** for cloud backends. The `generate_presigned_put_url` method on cloud `StorageBackend` implementations can raise `NotImplementedError` — the upload endpoint detects cloud backends and uses the direct upload path.
|
||||
|
||||
### Cloud Document Retrieval
|
||||
- **D-15:** Document downloads/previews use the **same `GET /api/documents/{id}/content` proxy endpoint** regardless of storage backend. The endpoint calls `storage_backend.get_object(document.object_key)` and streams the bytes to the browser. The frontend does not know or care which backend holds the file.
|
||||
- **D-16:** Cloud folder tree browsing is **live API calls** (no DB sync). A **60-second in-memory TTL cache** (keyed by `user_id + provider + folder_path`) prevents redundant calls when the user collapses and re-expands the same node within one minute. The cache lives in FastAPI application state (or `functools.lru_cache`-equivalent with TTL). Not Redis — in-memory is sufficient for a single-user session pattern.
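
One dependency-free way to realise the 60-second cache from D-16 is a dict of `(expiry, value)` pairs. The sketch below is an assumption about shape only: `TTLCache`, `folder_cache_key`, and the injectable `clock` parameter are hypothetical names, and the final choice between this and `cachetools.TTLCache` is left open under Claude's Discretion.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire ttl_seconds after being set."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-fetches from the cloud API.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def folder_cache_key(user_id, provider, folder_path):
    """Build the cache key described in D-16: user_id + provider + folder_path."""
    return (user_id, provider, folder_path)
```

Including `user_id` in the key keeps one user's folder listing from being served to another. The sketch has no size bound and no locking, which may or may not matter for the expected load.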
|
||||
|
||||
### SSRF Prevention
|
||||
- **D-17:** All outbound HTTP calls to WebDAV/Nextcloud use a URL allowlist: the server URL provided by the user must pass hostname validation (not `localhost`, `127.x`, `169.254.x`, private RFC 1918 ranges, or `::1`). Validation runs at connect-time and before every request. Implemented in a shared `validate_cloud_url()` utility — all WebDAV/Nextcloud backends call it before constructing requests.
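
A possible shape for `validate_cloud_url()` using only the standard library is sketched below. The function name comes from D-17; everything else (the `UnsafeCloudURLError` exception, the `resolve` parameter, the exact set of blocked ranges) is an assumption for illustration. It rejects the listed hosts both when given literally and when a hostname resolves to them.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


class UnsafeCloudURLError(ValueError):
    """Hypothetical error type for URLs that fail SSRF validation."""


def _is_blocked_ip(ip):
    # Treat IPv4-mapped IPv6 addresses (::ffff:a.b.c.d) as their IPv4 form.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (
        ip.is_loopback        # 127.0.0.0/8, ::1
        or ip.is_private      # RFC 1918 ranges and similar
        or ip.is_link_local   # 169.254.0.0/16, fe80::/10
        or ip.is_unspecified  # 0.0.0.0, ::
        or ip.is_multicast
        or ip.is_reserved
    )


def validate_cloud_url(url, resolve=socket.getaddrinfo):
    """Return url unchanged if it looks safe, else raise UnsafeCloudURLError."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise UnsafeCloudURLError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise UnsafeCloudURLError("URL has no hostname")
    if host.lower() == "localhost" or host.lower().endswith(".localhost"):
        raise UnsafeCloudURLError("localhost is not allowed")

    try:
        candidates = [ipaddress.ip_address(host)]  # host is an IP literal
    except ValueError:
        # Host is a DNS name: check every address it resolves to.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        try:
            infos = resolve(host, port, type=socket.SOCK_STREAM)
        except socket.gaierror as exc:
            raise UnsafeCloudURLError(f"cannot resolve {host!r}") from exc
        # Strip any IPv6 scope id ("fe80::1%eth0") before parsing.
        candidates = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]

    for ip in candidates:
        if _is_blocked_ip(ip):
            raise UnsafeCloudURLError(f"{host!r} maps to a blocked address: {ip}")
    return url
```

Checking at validation time does not by itself stop DNS rebinding between the check and the actual request, which is one reason D-17 requires validation before every request. Pinning the resolved address for the outbound connection would be a further hardening step and is not shown here.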
|
||||
|
||||
### Security Invariants (carry-forward)
|
||||
- **D-18:** `credentials_enc` is encrypted with HKDF per-user key derivation (`HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")`). The master key lives in the `CLOUD_CREDS_KEY` env var. Never stored unencrypted. Never returned in any API response.
|
||||
- **D-19:** Admin API responses for cloud connections return only `provider, display_name, connected_at, status` (the existing `CloudConnectionOut` Pydantic whitelist pattern from Phase 4).
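
The per-user derivation in D-18 can be illustrated with a standard-library HKDF-SHA256 written from RFC 5869. This is a sketch under assumptions: the hash function (SHA-256), the 32-byte output length, the UTF-8 string form of `user_id`, and the helper names `hkdf_sha256` and `derive_user_key` are not stated in this document. A real implementation would more likely use a maintained library such as the `cryptography` package's HKDF, or the derivation utility already in the codebase.

```python
import hashlib
import hmac


def hkdf_sha256(ikm, salt, info, length=32):
    """HKDF (RFC 5869) with HMAC-SHA256: extract, then expand to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key length is too large for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869 default when no salt is given

    # Extract: concentrate the input keying material into a pseudorandom key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()

    # Expand: chain HMAC blocks T(1), T(2), ... until enough output is produced.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key, user_id):
    """Derive a per-user key as in D-18 (assumes user_id is passed as a string)."""
    return hkdf_sha256(
        master_key,
        salt=user_id.encode("utf-8"),
        info=b"cloud-credentials",
    )
```

Because the user ID is the salt, two users never share a derived key even though both keys come from the same `CLOUD_CREDS_KEY`. The derived key would then feed an authenticated cipher to produce `credentials_enc`; that step is outside this sketch.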
|
||||
|
||||
### Claude's Discretion
|
||||
- Choice of Python OAuth client library for Google Drive and OneDrive (e.g., `google-auth-oauthlib`, `msal`) — Claude selects based on PyPI availability and Phase 5 open question in STATE.md ("Verify cloud SDK minor versions on PyPI before Phase 5 pinning").
|
||||
- Choice of WebDAV Python library (e.g., `webdavclient3`, `aiohttp` with manual PROPFIND) — Claude selects based on async compatibility.
|
||||
- Exact TTL cache implementation (dict + timestamp vs. `cachetools.TTLCache`) — Claude picks the simplest approach with no new dependency if possible.
|
||||
- OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT) — Claude selects based on what's already wired in the stack.
|
||||
|
||||
</decisions>
|
||||
|
||||
<canonical_refs>
|
||||
## Canonical References
|
||||
|
||||
**Downstream agents MUST read these before planning or implementing.**
|
||||
|
||||
### Storage Backend Contract
|
||||
- `backend/storage/base.py` — `StorageBackend` ABC: 7 abstract methods that all new cloud backends must implement. Note: `generate_presigned_put_url` raises `NotImplementedError` for cloud backends (D-14).
|
||||
- `backend/storage/__init__.py` — `get_storage_backend()` factory: Phase 5 must extend this to resolve the correct backend from the document's `storage_backend` field and the user's active context.
|
||||
- `backend/storage/minio_backend.py` — Reference implementation of `StorageBackend` — patterns for `asyncio.to_thread()` wrapping and error handling.
|
||||
|
||||
### Data Model
|
||||
- `backend/db/models.py` — `CloudConnection` model (fields: `id`, `user_id`, `provider`, `display_name`, `credentials_enc`, `status`, `connected_at`). The `cloud_connections` table already exists from the Phase 1 migration. Also see `Document` model — `storage_backend` column records which backend holds each document.
|
||||
|
||||
### Requirements
|
||||
- `.planning/REQUIREMENTS.md` — CLOUD-01 through CLOUD-07 (the 7 cloud storage requirements for this phase).
|
||||
- `.planning/ROADMAP.md` — Phase 5 goal, success criteria, and phase gates (SSRF test, credential encryption round-trip, admin response never exposing `credentials_enc`, OAuth `invalid_grant` handling).
|
||||
|
||||
### Security Protocol
|
||||
- `CLAUDE.md` §"Key Architectural Rules" — HKDF per-user key derivation pattern, SSRF allowlist requirement, `credentials_enc` never in API responses.
|
||||
- `CLAUDE.md` §"Security Protocol" — SSRF section: "user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist".
|
||||
|
||||
### AI Provider Pattern (structural analog)
|
||||
- `backend/ai/base.py` — `AIProvider` ABC: Phase 5 cloud backends mirror this pattern (ABC + factory + per-provider file).
|
||||
- `backend/ai/__init__.py` — `get_provider()` factory pattern to mirror in `get_storage_backend()` extension.
|
||||
|
||||
### Frontend Patterns
|
||||
- `frontend/src/stores/` — Pinia store patterns established in Phases 2–4 (auth store, folders store). Cloud connections store follows same pattern.
|
||||
- `frontend/src/views/SettingsView.vue` — Existing view to extend with "Cloud Storage" tab.
|
||||
- `frontend/src/components/FolderTreeItem.vue` (Phase 4) — Lazy-loading tree component to extend for cloud provider nodes.
|
||||
|
||||
</canonical_refs>
|
||||
|
||||
<code_context>
|
||||
## Existing Code Insights
|
||||
|
||||
### Reusable Assets
|
||||
- `backend/storage/base.py` (`StorageBackend` ABC) — New cloud backends subclass this directly. Of the ABC's 7 abstract methods, the 6 other than `generate_presigned_put_url` must be implemented.
|
||||
- `backend/storage/minio_backend.py` — Template for `asyncio.to_thread()` pattern, error handling shape, and constructor signature.
|
||||
- `backend/db/models.py` (`CloudConnection`) — Table already exists; no new migration needed for the connection model itself. D-10 states that the `documents.storage_backend` column already exists; verify this, and add an Alembic migration only if the column is missing.
|
||||
- `frontend/src/components/FolderTreeItem.vue` — Existing lazy-load tree item; extend to support cloud provider root nodes with a different icon and live-fetch behavior.
|
||||
- `frontend/src/views/SettingsView.vue` — Tab-based layout; add "Cloud Storage" as a new tab following the same pattern as existing tabs.
|
||||
- `GET /api/documents/{id}/content` (Phase 4, Plan 04-05) — PDF proxy endpoint. Phase 5 makes this backend-agnostic by routing through `get_storage_backend()` per document.
|
||||
|
||||
### Established Patterns
|
||||
- **Factory pattern:** `get_storage_backend()` in `backend/storage/__init__.py` mirrors `get_provider()` in `backend/ai/__init__.py`. Cloud backends extend the factory with a `storage_backend` parameter (from the document record or upload context).
|
||||
- **HKDF encryption:** The per-user key derivation pattern for cloud credentials is defined in CLAUDE.md (see D-18). If a derivation utility already exists in the codebase, reuse it rather than writing a new one.
|
||||
- **Pydantic whitelist response models:** `CloudConnectionOut` pattern from Phase 4 — never expose `credentials_enc`. Apply to all new cloud endpoints.
|
||||
- **`asyncio.to_thread()`:** All sync SDK calls (cloud provider SDKs may be sync) wrapped in `asyncio.to_thread()` — matches MinIOBackend pattern.
|
||||
- **Audit log:** `write_audit_log()` helper from Phase 4 — call on cloud connect, disconnect, and re-auth events.
|
||||
- **`get_regular_user` dep:** All cloud connection endpoints use `get_regular_user` (admin blocked from this surface — CLOUD credentials are personal, not platform-managed).
|
||||
|
||||
### Integration Points
|
||||
- `GET/POST /api/cloud/connections` — new endpoint group for connecting, listing, and disconnecting cloud backends.
|
||||
- `GET /api/cloud/oauth/initiate/{provider}` — redirects user to OAuth consent URL.
|
||||
- `GET /api/cloud/oauth/callback/{provider}` — FastAPI OAuth callback; exchanges code, saves credentials, redirects to Vue.
|
||||
- `GET /api/cloud/folders/{provider}/{folder_id}` — lists children of a cloud folder (lazy-load tree).
|
||||
- Upload endpoint (`POST /api/documents/upload`) — must detect active folder's backend and route accordingly.
|
||||
- `GET /api/documents/{id}/content` — already proxies bytes; must resolve backend from `document.storage_backend`.
|
||||
- Sidebar `FolderTreeItem.vue` — add cloud provider root nodes below local folder tree.
|
||||
|
||||
</code_context>
|
||||
|
||||
<specifics>
|
||||
## Specific Ideas
|
||||
|
||||
- **Sidebar layout:** Local folders shown first under a "My Documents" section header; cloud providers below under a "Cloud Storage" section (or just listed as peer top-level nodes with a cloud icon). The visual separation makes it clear which node is local vs. remote.
|
||||
- **Multiple providers:** All connected providers appear simultaneously in the sidebar — one node per connection. Disconnecting a provider removes its node from the tree.
|
||||
- **Nextcloud/WebDAV UX copy:** The connection modal explains: "App password — can be revoked without changing your main password (recommended). Your account password — simpler to set up, but revocation requires changing your entire account password."
|
||||
- **OAuth callback redirect:** On success, Vue reads `?cloud_connected=google_drive` query param in SettingsView's `onMounted` and shows a transient success toast. On error, reads `?cloud_error=…` and shows an error banner.
|
||||
- **`REQUIRES_REAUTH` prompt:** When a connection has status `REQUIRES_REAUTH`, the SettingsView Cloud Storage tab shows a yellow badge and a "Reconnect" button that re-initiates the OAuth flow.
|
||||
|
||||
</specifics>
|
||||
|
||||
<deferred>
|
||||
## Deferred Ideas
|
||||
|
||||
- **Document migration between backends** — user-initiated move of existing MinIO docs to a cloud provider. Out of scope for Phase 5; no migration is performed.
|
||||
- **Cloud-native resumable upload URLs** (provider-specific presigned upload sessions) — skipped in favor of FastAPI intermediary (simpler). Can be added as a performance optimization in a future phase.
|
||||
- **Shared cloud storage (team/organization)** — multiple users sharing one cloud backend. Out of scope; `cloud_connections` is per-user.
|
||||
- **Cloud folder sync / offline cache** — syncing cloud folder trees to DB for offline browsing. Out of scope; live API + TTL cache is sufficient.
|
||||
- **Email notifications on REQUIRES_REAUTH** — out of scope for Phase 5; status is visible in SettingsView.
|
||||
|
||||
</deferred>
|
||||
|
||||
---
|
||||
|
||||
*Phase: 5-Cloud Storage Backends*
|
||||
*Context gathered: 2026-05-28*
|
||||
@@ -0,0 +1,159 @@
|
||||
# Phase 5: Cloud Storage Backends - Discussion Log
|
||||
|
||||
> **Audit trail only.** Do not use as input to planning, research, or execution agents.
|
||||
> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
|
||||
|
||||
**Date:** 2026-05-28
|
||||
**Phase:** 5-cloud-storage-backends
|
||||
**Areas discussed:** Backend scope, OAuth flow & token refresh, Storage selection UX, Cloud document retrieval
|
||||
|
||||
---
|
||||
|
||||
## Backend Scope
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| All 4 in one phase | OneDrive, Google Drive, Nextcloud, WebDAV all in Phase 5 | ✓ |
| WebDAV + Nextcloud first | Ship simpler (credential-based) backends first; OAuth providers in Phase 6 | |
| Just one provider as MVP | One end-to-end provider to prove the pattern, others follow | |
|
||||
|
||||
**User's choice:** All 4 in one phase
|
||||
**Notes:** User wants the full feature set shipped together.
|
||||
|
||||
---
|
||||
|
||||
## OAuth Flow & Token Refresh
|
||||
|
||||
### OAuth callback architecture
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| FastAPI handles it, then redirects to Vue | Backend exchanges code for tokens, saves encrypted creds, redirects browser to Vue with success/error query param | ✓ |
| Vue intercepts the callback | Frontend catches redirect, POSTs code to FastAPI — auth code briefly in frontend | |
| You decide | Claude chooses | |
|
||||
|
||||
**User's choice:** FastAPI handles it, then redirects to Vue
|
||||
**Notes:** Keeps tokens entirely server-side; consistent with existing auth architecture.
|
||||
|
||||
### Token refresh strategy
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| On-demand refresh | Catch 401, refresh silently, retry — transparent to user | ✓ (via Other) |
| Proactive Celery beat refresh | Background task refreshes before expiry | |
| Fail and prompt re-auth | Mark REQUIRES_REAUTH on expiry, no silent refresh | |
|
||||
|
||||
**User's choice:** Automatic refresh (on-demand, transparent). Also explicitly requested disconnect per-connection + "Disconnect all" option.
|
||||
**Notes:** Falls back to REQUIRES_REAUTH only on `invalid_grant` (refresh token itself revoked).
|
||||
|
||||
### Nextcloud/WebDAV credential method
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| URL + username + app password | App passwords revocable individually — recommended | ✓ (via Other) |
| URL + username + real password | Simpler; revocation requires changing entire account password | |
| You decide | Claude picks | |
|
||||
|
||||
**User's choice:** Show both options in the UI with explanations and trade-offs; recommend app passwords. Backend stores whichever the user picks.
|
||||
**Notes:** Both use HTTP Basic Auth at the protocol level. UI copy explains the difference.
|
||||
|
||||
---
|
||||
|
||||
## Storage Selection UX
|
||||
|
||||
### Sidebar cloud folder tree depth
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| Lazy-load one level at a time | Expand a node → fetch its children from cloud API | ✓ |
| Show only root of each provider | Single node per provider, click opens full-screen cloud browser | |
| Pre-fetch 2 levels deep on connect | Eager fetch on connect; faster browsing, stale quickly | |
|
||||
|
||||
**User's choice:** Lazy-load one level at a time
|
||||
**Notes:** Cloud providers appear as top-level sidebar nodes alongside local MinIO folders, matching a Windows Explorer / Nextcloud-style file manager layout.
|
||||
|
||||
### Upload destination
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| Follows the active folder | Upload goes to the backend of the folder the user is viewing | ✓ |
| Default backend in settings | Global setting overridden per-upload | |
| Per-upload choice at upload time | Dropdown on every upload dialog | |
|
||||
|
||||
**User's choice:** Follows the active folder (context-driven)
|
||||
**Notes:** No explicit setting needed — the active folder's backend determines the destination.
|
||||
|
||||
### Existing document migration
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| Stay in MinIO — no migration | Existing docs unaffected; local and cloud coexist | ✓ |
| Optional migration | Post-connect prompt to migrate existing docs | |
| You decide | | |
|
||||
|
||||
**User's choice:** Stay in MinIO — no migration
|
||||
**Notes:** CLOUD-03 satisfied by coexistence without migration.
|
||||
|
||||
### Cloud provider management location
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| Existing SettingsView, new "Cloud Storage" tab | Add tab to SettingsView alongside existing tabs | ✓ |
| Dedicated /cloud-storage route | New full-page view | |
| Sidebar action on cloud provider node | Gear icon → management popover | |
|
||||
|
||||
**User's choice:** New "Cloud Storage" tab in SettingsView
|
||||
|
||||
---
|
||||
|
||||
## Cloud Document Retrieval
|
||||
|
||||
### Upload path for cloud backends
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| FastAPI intermediary | File bytes go through FastAPI → cloud provider API | ✓ |
| Cloud-native resumable upload URLs | Provider-specific upload session URL generated and sent to browser | |
| You decide | | |
|
||||
|
||||
**User's choice:** FastAPI intermediary for cloud uploads
|
||||
**Notes:** Presigned-PUT-URL flow stays MinIO-only. Cloud backends' `generate_presigned_put_url` raises `NotImplementedError`.
|
||||
|
||||
### Download/preview path
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| Same /api/documents/{id}/content proxy | Backend resolves StorageBackend from document.storage_backend | ✓ |
| Separate /api/documents/{id}/cloud-content | Parallel endpoint for cloud docs | |
| Temporary cloud provider URL (redirect) | Return provider's signed download URL to browser — exposes cloud URLs | |
|
||||
|
||||
**User's choice:** Same proxy endpoint
|
||||
**Notes:** Frontend remains storage-backend-agnostic.
|
||||
|
||||
### Cloud folder tree freshness
|
||||
|
||||
| Option | Description | Selected |
|--------|-------------|----------|
| Live calls + 60s in-memory TTL cache | Per-folder cache keyed by user+provider+path; 60s TTL | ✓ |
| Live calls only, no cache | Always fresh; no protection against rapid UI interactions | |
| You decide | | |
|
||||
|
||||
**User's choice:** Live calls + 60s in-memory TTL cache
|
||||
**Notes:** User raised valid concern about cloud API rate limits and potential throttling. Claude explained: human-paced browsing is well within all provider limits (Google Drive: 12k req/100s per user); TTL cache protects against collapse/re-expand patterns. No DB sync needed.
|
||||
|
||||
---
|
||||
|
||||
## Claude's Discretion
|
||||
|
||||
- Python OAuth library choice (Google: `google-auth-oauthlib`; Microsoft: `msal`)
|
||||
- WebDAV Python library choice (`webdavclient3` vs. `aiohttp` with manual PROPFIND)
|
||||
- TTL cache implementation (`cachetools.TTLCache` vs. dict + timestamp)
|
||||
- OAuth state store implementation (Redis / short-lived DB row / signed JWT)
|
||||
|
||||
## Deferred Ideas
|
||||
|
||||
- Document migration between backends (local → cloud)
|
||||
- Cloud-native resumable upload URLs (performance optimization)
|
||||
- Shared/team cloud storage
|
||||
- Cloud folder tree DB sync / offline cache
|
||||
- Email notifications on REQUIRES_REAUTH
|
||||