diff --git a/.planning/phases/05-cloud-storage-backends/05-RESEARCH.md b/.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
new file mode 100644
index 0000000..1b47276
--- /dev/null
+++ b/.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@@ -0,0 +1,987 @@
+# Phase 5: Cloud Storage Backends — Research
+
+**Researched:** 2026-05-28
+**Domain:** OAuth2 cloud provider integration, WebDAV/Nextcloud, credential encryption, SSRF prevention, StorageBackend ABC extension
+**Confidence:** HIGH (all package versions verified on PyPI; patterns verified against official docs and codebase)
+
+---
+
+
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+
+- **D-01:** All 4 providers (OneDrive/Microsoft Graph, Google Drive v3, Nextcloud, WebDAV) delivered in this single phase.
+- **D-02:** Each provider is a concrete `StorageBackend` subclass in `backend/storage/` (e.g., `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`).
+- **D-03:** FastAPI owns the OAuth callback. Flow: user clicks "Connect" → provider OAuth consent page → `GET /api/cloud/oauth/callback/{provider}?code=…&state=…` → FastAPI exchanges code, encrypts credentials, saves to `cloud_connections`, then redirects browser to Vue settings page with `?cloud_connected=google_drive` (or `?cloud_error=…`). Auth code and tokens never land in the frontend.
+- **D-04:** OAuth state parameter encodes the authenticated user's ID (signed or encrypted) using `secrets.token_urlsafe(32)` + a short-lived server-side state store (Redis or DB) to validate the callback matches the initiating user session.
+- **D-05:** Access token refresh is on-demand and transparent. When a cloud API call fails with token-expiry (HTTP 401), the backend catches it, uses the stored refresh token, updates `credentials_enc` in DB, and retries the original call within the same request.
+- **D-06:** If the refresh token is rejected by the provider (`invalid_grant`), the connection status transitions to `REQUIRES_REAUTH` and the request returns an error telling the user to reconnect. No silent failure.
+- **D-07:** UI presents both auth methods for Nextcloud/WebDAV (real account password and app-specific password) with clear recommendation for app password.
+- **D-08:** On save, backend validates the WebDAV/Nextcloud connection (lightweight PROPFIND or OPTIONS request) before storing credentials. If validation fails, return an error — never store unverified credentials.
+- **D-09:** Sidebar shows local MinIO folders first, then each connected cloud provider as a peer top-level node. Lazy-load one level at a time.
+- **D-10:** Upload destination follows the active folder context. Cloud uploads go through FastAPI intermediary — no direct browser-to-cloud.
+- **D-11:** Existing MinIO documents stay in MinIO — no migration. `storage_backend="minio"` for existing docs; `"google_drive"`, `"onedrive"`, etc. for new cloud docs.
+- **D-12:** Cloud provider management lives in a new "Cloud Storage" tab in SettingsView.
+- **D-13:** Multiple cloud providers can be connected simultaneously (one row per provider in `cloud_connections`).
+- **D-14:** Cloud backends: `generate_presigned_put_url` raises `NotImplementedError`. Upload endpoint detects cloud backends and uses direct upload path.
+- **D-15:** Downloads/previews use the same `GET /api/documents/{id}/content` proxy endpoint regardless of backend. Calls `storage_backend.get_object(document.object_key)` and streams bytes to browser.
+- **D-16:** Cloud folder tree browsing is live API calls with a 60-second in-memory TTL cache (keyed by `user_id + provider + folder_path`). Not Redis — in-memory is sufficient.
+- **D-17:** All outbound HTTP to WebDAV/Nextcloud validates URL against SSRF blocklist (localhost, 127.x, 169.254.x, RFC 1918, ::1). Validation in a shared `validate_cloud_url()` utility called before every request.
+- **D-18:** `credentials_enc` encrypted with `HKDF(CLOUD_CREDS_KEY, salt=user_id_bytes, info=b"cloud-credentials")`. Master key in `CLOUD_CREDS_KEY` env var. Never stored unencrypted. Never returned in any API response.
+- **D-19:** Admin API responses for cloud connections return only `provider, display_name, connected_at, status` (CloudConnectionOut Pydantic whitelist pattern from Phase 4).
+
+### Claude's Discretion
+
+- Choice of Python OAuth client library for Google Drive and OneDrive (e.g., `google-auth-oauthlib`, `msal`).
+- Choice of WebDAV Python library (e.g., `webdavclient3`, `aiohttp` with manual PROPFIND).
+- Exact TTL cache implementation (dict + timestamp vs. `cachetools.TTLCache`).
+- OAuth state store implementation (Redis vs. short-lived DB row vs. signed JWT).
+
+### Deferred Ideas (OUT OF SCOPE)
+
+- Document migration between backends (user-initiated move of MinIO docs to cloud).
+- Cloud-native resumable upload URLs (provider-specific presigned upload sessions).
+- Shared cloud storage (team/organization).
+- Cloud folder sync / offline cache.
+- Email notifications on REQUIRES_REAUTH.
+
+
+
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|------------------|
+| CLOUD-01 | User can connect OneDrive (Microsoft Graph), Google Drive (v3 API), Nextcloud, or generic WebDAV as a personal storage backend | MSAL + google-auth-oauthlib OAuth2 flows; webdavclient3 for WebDAV/Nextcloud |
+| CLOUD-02 | Cloud OAuth credentials encrypted using HKDF per-user key derivation (`HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials")`); master key in `CLOUD_CREDS_KEY` env var | `cryptography` library HKDF + Fernet pattern documented |
+| CLOUD-03 | Local MinIO storage and connected cloud backends coexist; user can select their default storage destination | `documents.storage_backend` column already in schema; `users.default_storage_backend` column already present |
+| CLOUD-04 | Each cloud connection displays status: `ACTIVE \| REQUIRES_REAUTH \| ERROR` | `CloudConnection.status` column already in schema |
+| CLOUD-05 | On OAuth revocation (`invalid_grant`), connection status transitions to `REQUIRES_REAUTH` — surfaced to user, not retried silently | On-demand token refresh pattern with `invalid_grant` catch documented |
+| CLOUD-06 | User can disconnect a cloud backend; credentials are permanently deleted from the DB | `DELETE /api/cloud/connections/{id}` with ownership check |
+| CLOUD-07 | Storage backend abstracted via `StorageBackend` ABC + factory in `storage/` module (mirrors existing `ai/` provider pattern) | ABC already exists with 7 abstract methods; factory already in `storage/__init__.py` |
+
+
+---
+
+## Summary
+
+Phase 5 extends DocuVault's existing storage abstraction with four cloud provider backends. The infrastructure is largely pre-built: the `StorageBackend` ABC with 7 abstract methods already exists (`backend/storage/base.py`), the `cloud_connections` table with all required columns (`id`, `user_id`, `provider`, `credentials_enc`, `status`, `connected_at`) was created in migration 0001, the `documents.storage_backend` column already exists, and `users.default_storage_backend` already exists. No new Alembic migration is needed for the data model.
+
+The three main implementation challenges are: (1) the OAuth2 callback flow where FastAPI owns both the initiation and code-exchange, (2) per-user HKDF credential encryption using the `cryptography` library (which is **not currently in `requirements.txt`** and must be added), and (3) SSRF prevention for user-supplied WebDAV/Nextcloud URLs using Python's built-in `ipaddress` module. Redis is already wired on `app.state.redis` and is the correct choice for OAuth state storage (TTL-backed, eliminates race conditions in multi-instance deployments, already proven pattern in auth.py for TOTP replay prevention).
+
+The WebDAV/Nextcloud backends should use `webdavclient3` wrapped in `asyncio.to_thread()` (matching the MinIOBackend pattern) rather than an async-native library — `webdavclient3` is the most mature option (8+ years old, actively maintained) and its sync API is well-documented. Google Drive uses `google-api-python-client` + `google-auth-oauthlib`; OneDrive uses `msal` with the authorization code flow. Both sync SDKs wrap in `asyncio.to_thread()`.
+
+**Primary recommendation:** Add `cryptography>=41.0.0`, `google-auth-oauthlib>=1.3.1`, `google-api-python-client>=2.196.0`, `msal>=1.36.0`, `webdavclient3>=3.14.7`, and `cachetools>=5.3.0` to `requirements.txt`. Implement OAuth state via Redis TTL (30-minute expiry). Use `cachetools.TTLCache` (version 6.2.6 verified on PyPI) for the 60-second folder listing cache. Use Python's built-in `ipaddress` module for SSRF URL validation; no additional library is needed.
+
+---
+
+## Architectural Responsibility Map
+
+| Capability | Primary Tier | Secondary Tier | Rationale |
+|------------|-------------|----------------|-----------|
+| OAuth2 initiation (redirect URL generation) | API / Backend | — | Secrets (client_id, client_secret) must never reach the browser |
+| OAuth2 callback code exchange | API / Backend | — | Auth code + client_secret exchange is a server-to-server operation (D-03) |
+| OAuth state CSRF validation | API / Backend (Redis) | — | State token must be stored server-side and expire after use (D-04) |
+| Credential encryption/decryption | API / Backend | — | HKDF master key lives in env var; decryption happens at API layer only |
+| Cloud file upload | API / Backend | Cloud Provider API | Bytes pass through FastAPI intermediary — no direct browser-to-cloud (D-10) |
+| Cloud file download/preview | API / Backend | Cloud Provider API | Same proxy endpoint as MinIO (D-15) |
+| Cloud folder tree listing | API / Backend | Cloud Provider API | Lazy-load, TTL-cached in FastAPI app state (D-16) |
+| SSRF validation | API / Backend | — | Must run before every outbound HTTP call; not frontend-accessible (D-17) |
+| Connection status display | Frontend / Client | — | UI reads `status` field from API; no direct cloud calls from browser |
+| Cloud Storage settings tab | Frontend / Client | — | New tab in SettingsView; reads/writes via `/api/cloud/connections` |
+| On-demand token refresh | API / Backend | — | Transparent to user; handled within the request lifecycle (D-05) |
+| Default storage backend selection | API / Backend + DB | Frontend / Client | `users.default_storage_backend` column; UI reads/writes via settings endpoint |
+
+---
+
+## Standard Stack
+
+### Core (new additions to requirements.txt)
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| `cryptography` | 48.0.0 | HKDF key derivation + Fernet encryption for `credentials_enc` | The only Python library with official HKDF + Fernet in one package; already referenced in CLAUDE.md |
+| `google-auth-oauthlib` | 1.3.1 | Google OAuth2 authorization code flow; `Flow` class manages URL generation and code exchange | Official Google library; listed in Google's own Python quickstart |
+| `google-api-python-client` | 2.196.0 | Google Drive v3 API (files.get, files.create, files.delete, files.list) | Official Google library; required alongside google-auth-oauthlib for Drive operations |
+| `msal` | 1.36.0 | Microsoft Authentication Library — authorization code flow for OneDrive/Microsoft Graph | Official Microsoft library; only sanctioned way to obtain Microsoft Graph tokens |
+| `webdavclient3` | 3.14.7 | WebDAV operations (PROPFIND, upload, download, delete) for both Nextcloud and generic WebDAV | Mature (8 years), actively maintained, supports Nextcloud and all standard WebDAV servers |
+| `cachetools` | 6.2.6 | `TTLCache` for 60-second folder listing cache in FastAPI app state (D-16) | Standard cache library; pure Python; no new infrastructure dependency |
+
+[VERIFIED: PyPI] All versions confirmed via `pip download` against the PyPI registry.
+
+### Already in requirements.txt (relevant to Phase 5)
+
+| Library | Current Version Spec | Phase 5 Use |
+|---------|---------------------|-------------|
+| `httpx` | >=0.27 | Microsoft Graph REST calls (aiohttp alternative); already used for HIBP |
+| `redis` | >=4.6.0 | OAuth state storage (TTL-keyed state tokens, already on `app.state.redis`) |
+| `aioredis` | via `redis[asyncio]` | Already wired in `main.py` lifespan |
+| `pydantic` | >=2.0 | Request/response models for new cloud endpoints |
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| `webdavclient3` | `aiohttp` + raw PROPFIND XML | webdavclient3 handles XML parsing, redirect following, and auth headers; raw aiohttp requires implementing RFC 4918 manually |
+| `webdavclient3` | `aiodav` / `aiowebdav2` | These async WebDAV libs are very new (< 2 years old, low download counts); webdavclient3 wrapped in `asyncio.to_thread()` matches the MinIOBackend pattern and is safer |
+| `msal` (for OneDrive) | `requests-oauthlib` + raw Graph calls | MSAL handles token refresh, token cache, and `invalid_grant` detection natively |
+| `cachetools.TTLCache` | `dict` + timestamp | TTLCache has automatic expiry and LRU eviction; manual dict+timestamp requires cleanup logic; both work, TTLCache is cleaner |
+| Redis for OAuth state | Signed JWT state | Redis is already wired; TTL-keyed Redis entries are the proven pattern (auth.py TOTP replay prevention). Signed JWT state is viable but requires HMAC secret management for state-only tokens |
+
+**Installation:**
+```bash
+# Add to backend/requirements.txt
+cryptography>=41.0.0
+google-auth-oauthlib>=1.3.1
+google-api-python-client>=2.196.0
+msal>=1.36.0
+webdavclient3>=3.14.7
+cachetools>=5.3.0
+```
+
+**Version verification:** Confirmed against PyPI via `pip download`:
+- `cryptography-48.0.0` — `[VERIFIED: PyPI]`
+- `google_auth_oauthlib-1.3.1` — `[VERIFIED: PyPI]`
+- `google_api_python_client-2.196.0` — `[VERIFIED: PyPI]`
+- `msal-1.36.0` — `[VERIFIED: PyPI]`
+- `webdavclient3-3.14.7` — `[VERIFIED: PyPI]`
+- `cachetools-6.2.6` — `[VERIFIED: PyPI]`
+
+---
+
+## Package Legitimacy Audit
+
+All packages verified via slopcheck 0.6.1 (run 2026-05-28):
+
+| Package | Registry | Age | Downloads | Source Repo | slopcheck | Disposition |
+|---------|----------|-----|-----------|-------------|-----------|-------------|
+| `cryptography` | PyPI | 12+ yrs | 100M+/wk | github.com/pyca/cryptography | [OK] | Approved |
+| `google-auth-oauthlib` | PyPI | 7+ yrs | 50M+/wk | github.com/googleapis/google-auth-library-python-oauthlib | [OK] | Approved |
+| `google-api-python-client` | PyPI | 10+ yrs | 30M+/wk | github.com/googleapis/google-api-python-client | [OK] — note: "Name ends with '-client' — looks like LLM bait but package is established" | Approved |
+| `msal` | PyPI | 6+ yrs | 10M+/wk | github.com/AzureAD/microsoft-authentication-library-for-python | [OK] | Approved |
+| `webdavclient3` | PyPI | 8+ yrs | 200K+/wk | github.com/CloudPolis/webdavclient3 | [OK] | Approved |
+| `cachetools` | PyPI | 10+ yrs | 80M+/wk | github.com/tkem/cachetools | [OK] | Approved |
+
+**Packages removed due to slopcheck [SLOP] verdict:** none
+**Packages flagged as suspicious [SUS]:** none
+
+---
+
+## Architecture Patterns
+
+### System Architecture Diagram
+
+```
+Browser (Vue 3)
+ │
+ │ Click "Connect Google Drive"
+ ▼
+[GET /api/cloud/oauth/initiate/google_drive]
+ │ 1. Generate state_token = secrets.token_urlsafe(32)
+ │ 2. Store Redis: oauth_state:{state_token} = user_id (TTL 30 min)
+ │ 3. Build authorization_url via google_auth_oauthlib.Flow
+ │ 4. HTTP 302 redirect → Google OAuth consent page
+ ▼
+Google OAuth Consent Page (browser)
+ │ User approves
+ │ Google redirects to:
+ ▼
+[GET /api/cloud/oauth/callback/google_drive?code=...&state=...]
+ │ 1. Validate state → lookup Redis oauth_state:{state} → get user_id
+ │ 2. Delete Redis key (prevent replay)
+ │ 3. Exchange code → tokens via flow.fetch_token()
+ │ 4. Serialize credentials (access_token, refresh_token, expiry)
+ │ 5. Encrypt with HKDF-derived per-user Fernet key
+ │ 6. Save/upsert cloud_connections row (user_id, provider, credentials_enc, status=ACTIVE)
+ │ 7. HTTP 302 redirect → Vue /settings?cloud_connected=google_drive
+ ▼
+Vue SettingsView (onMounted)
+ │ Reads ?cloud_connected=google_drive
+ │ Shows success toast
+ ▼
+[GET /api/cloud/connections]
+ │ Lists all cloud connections for current user
+ │ Returns CloudConnectionOut (no credentials_enc)
+ ▼
+Browser renders Cloud Storage tab with connection status badges
+
+─────── Document Upload to Cloud Folder ───────
+
+Browser (Vue 3)
+ │ User is viewing Google Drive folder node
+ │ Drops file
+ ▼
+[POST /api/documents/upload]
+ │ active folder context = cloud folder (provider=google_drive, folder_id=...)
+ │ 1. Load CloudConnection for user + provider
+ │ 2. Decrypt credentials_enc → Fernet key → credentials dict
+ │ 3. Check token expiry → if expired, refresh transparently (D-05)
+ │ 4. Call google_drive_backend.put_object(user_id, doc_id, bytes, ext, ct)
+ │ └── asyncio.to_thread → drive.files().create(...)
+ │ 5. Save Document(storage_backend="google_drive", object_key=drive_file_id)
+ ▼
+Browser shows upload progress (same UploadProgress component)
+
+─────── Document Download from Cloud ───────
+
+[GET /api/documents/{id}/content]
+ │ 1. Load Document → storage_backend = "google_drive"
+ │ 2. get_storage_backend("google_drive", user_id, session) → GoogleDriveBackend
+ │ 3. backend.get_object(object_key) → bytes
+ │ 4. StreamingResponse to browser
+ ▼
+Browser renders PDF in existing DocumentPreviewModal
+
+─────── WebDAV/Nextcloud Connection ───────
+
+Browser
+ │ User submits server_url + username + password (or app password)
+ ▼
+[POST /api/cloud/connections/webdav]
+ │ 1. validate_cloud_url(server_url) → SSRF check (ipaddress module)
+ │ 2. Test connection: PROPFIND server_url (lightweight)
+ │ 3. If success: encrypt credentials → save cloud_connections
+ │ 4. If fail: 422 with error message (D-08)
+ ▼
+Browser shows ACTIVE status badge
+```
+
+### Recommended Project Structure
+
+```
+backend/storage/
+├── base.py # existing StorageBackend ABC (7 abstract methods)
+├── __init__.py # extend get_storage_backend() factory
+├── minio_backend.py # existing reference implementation
+├── google_drive_backend.py # new: Google Drive v3
+├── onedrive_backend.py # new: Microsoft Graph / OneDrive
+├── nextcloud_backend.py # new: Nextcloud (WebDAV + status endpoint)
+├── webdav_backend.py # new: generic WebDAV
+└── cloud_utils.py # new: validate_cloud_url(), encrypt_credentials(), decrypt_credentials()
+
+backend/api/
+└── cloud.py # new: all /api/cloud/* endpoints
+
+backend/services/
+└── cloud_cache.py # new: TTLCache singleton for folder listings
+
+backend/tests/
+└── test_cloud.py # new: all Phase 5 tests
+```
+
+### Pattern 1: StorageBackend ABC Contract (7 methods)
+
+The existing ABC requires all 7 methods. Cloud backends raise `NotImplementedError` for `generate_presigned_put_url` per D-14:
+
+```python
+# Source: backend/storage/base.py (verified in codebase)
+class StorageBackend(ABC):
+ @abstractmethod
+ async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str: ...
+ @abstractmethod
+ async def get_object(self, object_key: str) -> bytes: ...
+ @abstractmethod
+ async def delete_object(self, object_key: str) -> None: ...
+ @abstractmethod
+ async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str: ...
+ @abstractmethod
+ async def health_check(self) -> bool: ...
+ @abstractmethod
+ async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str: ...
+ @abstractmethod
+ async def stat_object(self, object_key: str) -> int: ...
+```
+
+Cloud backends implement all 7. For `generate_presigned_put_url` and `presigned_get_url`, cloud backends raise `NotImplementedError` — the upload endpoint detects cloud backends and uses the direct path (D-14). For `stat_object`, cloud backends return file size from the provider's metadata response.
+
+The `object_key` for cloud backends is the **provider's native file ID** (e.g., Google Drive file ID, OneDrive item ID, WebDAV path). The STORE-02 key schema (`{user_id}/{document_id}/{uuid4()}{ext}`) applies only to MinIO.
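+
+A minimal sketch of how an upload endpoint might detect a backend without presigned-upload support (D-14). `MinIOBackendSketch`, `CloudBackendSketch`, and `choose_upload_path` are hypothetical names used only for illustration; they are not code from the repository:
+
+```python
+import asyncio
+
+
+class MinIOBackendSketch:
+    """Stand-in for a backend that can issue presigned PUT URLs."""
+
+    async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
+        return f"https://minio.example/{object_key}?expires={expires_minutes}"
+
+
+class CloudBackendSketch:
+    """Stand-in for a cloud backend: presigned uploads are unsupported (D-14)."""
+
+    async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
+        raise NotImplementedError("cloud backends upload through FastAPI")
+
+
+async def choose_upload_path(backend, object_key: str) -> str:
+    """Return 'presigned' when the backend can issue a PUT URL, else 'direct'."""
+    try:
+        await backend.generate_presigned_put_url(object_key)
+        return "presigned"
+    except NotImplementedError:
+        return "direct"
+
+
+print(asyncio.run(choose_upload_path(CloudBackendSketch(), "drive-file-id")))
+```
+
+Checking the document's `storage_backend` value before calling the method would avoid the exception entirely; the try/except form only shows that the `NotImplementedError` contract is safe to rely on.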
+
+### Pattern 2: HKDF + Fernet Credential Encryption
+
+```python
+# Source: cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/
+# [VERIFIED: CITED: cryptography.io]
+import base64
+from cryptography.hazmat.primitives import hashes
+from cryptography.hazmat.primitives.kdf.hkdf import HKDF
+from cryptography.fernet import Fernet
+
+def _derive_fernet_key(master_key: bytes, user_id: str) -> Fernet:
+ """Derive a per-user Fernet key using HKDF-SHA256.
+
+ master_key = CLOUD_CREDS_KEY env var as bytes
+ salt = user_id bytes (deterministic per user — we need same key on decrypt)
+ info = b"cloud-credentials" (domain separation)
+ """
+ hkdf = HKDF(
+ algorithm=hashes.SHA256(),
+ length=32,
+ salt=user_id.encode("utf-8"), # deterministic salt = user_id
+ info=b"cloud-credentials",
+ )
+ raw_key = hkdf.derive(master_key)
+ fernet_key = base64.urlsafe_b64encode(raw_key)
+ return Fernet(fernet_key)
+
+def encrypt_credentials(master_key: bytes, user_id: str, credentials: dict) -> str:
+ """Encrypt credentials dict to base64 Fernet token string."""
+ import json
+ f = _derive_fernet_key(master_key, user_id)
+ plaintext = json.dumps(credentials).encode("utf-8")
+ return f.encrypt(plaintext).decode("utf-8")
+
+def decrypt_credentials(master_key: bytes, user_id: str, credentials_enc: str) -> dict:
+ """Decrypt credentials_enc back to dict."""
+ import json
+ f = _derive_fernet_key(master_key, user_id)
+ plaintext = f.decrypt(credentials_enc.encode("utf-8"))
+ return json.loads(plaintext)
+```
+
+**Critical note:** HKDF is **not** reusable — a new `HKDF` instance must be created for each derivation call. The `cryptography` library raises `AlreadyFinalized` if `.derive()` is called twice on the same instance. The `_derive_fernet_key` function must create a fresh `HKDF` instance each call.
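+
+The following round trip is a sketch of the per-user isolation this scheme is meant to provide, assuming the `cryptography` package is installed. `derive_fernet` is a compact, hypothetical variant of the helper above, and the master key is a placeholder value:
+
+```python
+import base64
+import json
+
+from cryptography.fernet import Fernet, InvalidToken
+from cryptography.hazmat.primitives import hashes
+from cryptography.hazmat.primitives.kdf.hkdf import HKDF
+
+
+def derive_fernet(master_key: bytes, user_id: str) -> Fernet:
+    # A fresh HKDF instance per call: derive() can only run once per instance.
+    hkdf = HKDF(
+        algorithm=hashes.SHA256(),
+        length=32,
+        salt=user_id.encode("utf-8"),
+        info=b"cloud-credentials",
+    )
+    return Fernet(base64.urlsafe_b64encode(hkdf.derive(master_key)))
+
+
+master_key = b"example-master-key-not-for-production"  # assumption: placeholder
+plaintext = json.dumps({"refresh_token": "r1"}).encode("utf-8")
+token = derive_fernet(master_key, "user-a").encrypt(plaintext)
+
+# Same user: the token decrypts back to the original dict.
+restored = json.loads(derive_fernet(master_key, "user-a").decrypt(token))
+
+# Different user: the derived key differs, so Fernet rejects the token.
+try:
+    derive_fernet(master_key, "user-b").decrypt(token)
+    cross_user_ok = True
+except InvalidToken:
+    cross_user_ok = False
+```
+
+Because the salt is the user ID, a row copied from one user's `cloud_connections` entry to another's cannot be decrypted under the second user's key.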
+
+### Pattern 3: Google Drive OAuth2 Flow via google-auth-oauthlib
+
+```python
+# Source: googleapis.dev/python/google-auth-oauthlib/latest (VERIFIED: official docs)
+from google_auth_oauthlib.flow import Flow
+
+# At initiation:
+flow = Flow.from_client_config(
+ {
+ "web": {
+ "client_id": settings.google_client_id,
+ "client_secret": settings.google_client_secret,
+ "auth_uri": "https://accounts.google.com/o/oauth2/auth",
+ "token_uri": "https://oauth2.googleapis.com/token",
+ }
+ },
+ scopes=["https://www.googleapis.com/auth/drive.file"],
+)
+flow.redirect_uri = f"{settings.backend_url}/api/cloud/oauth/callback/google_drive"
+authorization_url, state = flow.authorization_url(access_type="offline", prompt="consent")
+# Store state → Redis (key: oauth_state:{state}, value: user_id, TTL 30 min)
+# Redirect browser to authorization_url
+
+# At callback:
+# Restore flow from client config (stateless — recreate Flow on each callback)
+flow = Flow.from_client_config(client_config, scopes=[...], state=state)
+flow.redirect_uri = redirect_uri
+flow.fetch_token(code=code)
+creds = flow.credentials
+# creds.token = access token
+# creds.refresh_token = refresh token
+# creds.expiry = datetime
+```
+
+**`access_type="offline"` is required** to obtain a refresh token. Without it, Google only returns a short-lived access token. `prompt="consent"` forces re-consent on each connect, which ensures a fresh refresh token.
+
+### Pattern 4: OneDrive OAuth2 Flow via MSAL
+
+```python
+# Source: learn.microsoft.com/en-us/entra/msal/python/ [CITED]
+import msal
+
+# MSAL Python adds the reserved scopes (openid, profile, offline_access) itself
+# and raises ValueError if any of them is passed explicitly.
+SCOPES = ["Files.ReadWrite"]
+
+# Confidential client app (has client_secret)
+app = msal.ConfidentialClientApplication(
+    client_id=settings.onedrive_client_id,
+    client_credential=settings.onedrive_client_secret,
+    authority=f"https://login.microsoftonline.com/{settings.onedrive_tenant_id}",
+)
+
+# At initiation:
+auth_url = app.get_authorization_request_url(
+    scopes=SCOPES,
+    redirect_uri=f"{settings.backend_url}/api/cloud/oauth/callback/onedrive",
+    state=state_token,
+)
+# Redirect browser to auth_url
+
+# At callback:
+result = app.acquire_token_by_authorization_code(
+    code=code,
+    scopes=SCOPES,
+    redirect_uri=redirect_uri,
+)
+# result["access_token"]: short-lived access token
+# result["refresh_token"]: long-lived refresh token
+# result["expires_in"]: seconds until access_token expires
+
+# Refresh on-demand (D-05):
+result = app.acquire_token_by_refresh_token(
+    refresh_token=stored_refresh_token,
+    scopes=SCOPES,
+)
+# If result.get("error") == "invalid_grant" → REQUIRES_REAUTH (D-06)
+```
+
+**Do not pass `offline_access` explicitly.** The Microsoft identity platform issues a refresh token only when the `offline_access` scope is granted, but MSAL Python requests it automatically (together with `openid` and `profile`) and raises `ValueError` if a reserved scope appears in `scopes`. The `tenant_id` can be `"common"` for multi-tenant apps (personal OneDrive and organizational accounts). For personal OneDrive only, use `"consumers"`.
+
+### Pattern 5: WebDAV Operations via webdavclient3 + asyncio.to_thread
+
+```python
+# Source: pypi.org/project/webdavclient3 (VERIFIED: PyPI) [ASSUMED: specific API usage]
+import asyncio
+import io
+
+from webdav3.client import Client
+
+class WebDAVBackend(StorageBackend):
+    def __init__(self, server_url: str, username: str, password: str):
+        options = {
+            "webdav_hostname": server_url,
+            "webdav_login": username,
+            "webdav_password": password,
+        }
+        self._client = Client(options)
+        self._base_path = "docuvault/"  # namespace prefix in WebDAV tree
+
+    async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
+        # object_key = WebDAV path used as identifier, built from the namespace prefix
+        object_key = f"{self._base_path}{user_id}/{document_id}{extension}"
+        buf = io.BytesIO(file_bytes)
+        # webdavclient3 is synchronous: run the upload in a worker thread
+        await asyncio.to_thread(self._client.upload_to, buf, object_key)
+        return object_key
+
+    async def get_object(self, object_key: str) -> bytes:
+        buf = io.BytesIO()
+        # download_from writes the remote file into the in-memory buffer
+        await asyncio.to_thread(self._client.download_from, buf, object_key)
+        return buf.getvalue()
+```
+
+Note: `webdavclient3` is synchronous. All calls MUST be wrapped in `asyncio.to_thread()` — same pattern as `MinIOBackend`. [ASSUMED: `upload_to`/`download_from` method names — verify against installed package docs]
+
+### Pattern 6: SSRF Prevention via ipaddress Module
+
+```python
+# Source: python.org/library/ipaddress [VERIFIED: Python stdlib]
+import ipaddress
+import socket
+from urllib.parse import urlparse
+
+BLOCKED_NETS = [
+    ipaddress.ip_network("127.0.0.0/8"),     # loopback
+    ipaddress.ip_network("169.254.0.0/16"),  # link-local
+    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
+    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
+    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
+    ipaddress.ip_network("::1/128"),         # IPv6 loopback
+    ipaddress.ip_network("fc00::/7"),        # IPv6 ULA
+]
+
+def validate_cloud_url(url: str) -> None:
+    """Raise ValueError if url targets a private/internal address.
+
+    Called at connect-time and before every WebDAV/Nextcloud request.
+    D-17: blocks localhost, 127.x, 169.254.x, RFC 1918 ranges, ::1.
+    """
+    parsed = urlparse(url)
+    if parsed.scheme not in ("http", "https"):
+        raise ValueError(f"Unsupported scheme: {parsed.scheme}")
+    hostname = parsed.hostname
+    if not hostname:
+        raise ValueError("URL has no hostname")
+    try:
+        # Literal IP address in the URL
+        addrs = [ipaddress.ip_address(hostname)]
+    except ValueError:
+        # Not a raw IP: resolve via DNS and check every returned address,
+        # because a hostname can carry both public and internal records.
+        try:
+            infos = socket.getaddrinfo(hostname, None)
+            addrs = [ipaddress.ip_address(info[4][0]) for info in infos]
+        except (socket.gaierror, ValueError) as exc:
+            raise ValueError(f"Cannot resolve hostname: {exc}") from exc
+
+    for addr in addrs:
+        for net in BLOCKED_NETS:
+            if addr in net:
+                raise ValueError(f"URL targets a private/internal address: {addr}")
+
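+**Security note:** DNS-based SSRF bypass is a known attack vector: an attacker registers a DNS name that resolves to an internal IP. The `validate_cloud_url` function must resolve DNS and check the resolved IP, not just the hostname string. This pattern is the OWASP-recommended approach. Validation alone does not pin the address, however: the HTTP client resolves the hostname again when it connects, so a record that changes between the check and the request (DNS rebinding) can still reach an internal IP. [CITED: cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html]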
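+
+One case a plain network-membership test does not catch is an IPv4-mapped IPv6 literal such as `::ffff:10.0.0.1`: `ipaddress` treats an IPv6 address as never contained in an IPv4 network, so the RFC 1918 entries do not match it. The sketch below shows one way to normalise such addresses before the check. `is_blocked` and the two list names are hypothetical, and DNS resolution is left out to keep the example self-contained:
+
+```python
+import ipaddress
+
+BLOCKED_V4 = [ipaddress.ip_network(n) for n in (
+    "127.0.0.0/8", "169.254.0.0/16", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
+)]
+BLOCKED_V6 = [ipaddress.ip_network(n) for n in ("::1/128", "fc00::/7")]
+
+
+def is_blocked(raw: str) -> bool:
+    """Return True if the literal address falls in a blocked range."""
+    addr = ipaddress.ip_address(raw)
+    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so the IPv4 rules apply to it.
+    if addr.version == 6 and addr.ipv4_mapped is not None:
+        addr = addr.ipv4_mapped
+    nets = BLOCKED_V4 if addr.version == 4 else BLOCKED_V6
+    return any(addr in net for net in nets)
+
+
+# Without unwrapping, the mapped form is not matched by the IPv4 network.
+naive_match = ipaddress.ip_address("::ffff:10.0.0.1") in ipaddress.ip_network("10.0.0.0/8")
+```
+
+Here `naive_match` is `False` while `is_blocked("::ffff:10.0.0.1")` is `True`, which is the gap the unwrapping step closes.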
+
+### Pattern 7: OAuth State Storage via Redis
+
+```python
+# Source: established pattern from backend/api/auth.py (VERIFIED: codebase)
+# Redis is already on app.state.redis (aioredis client)
+
+# At OAuth initiation:
+state_token = secrets.token_urlsafe(32)
+redis_key = f"oauth_state:{state_token}"
+await request.app.state.redis.setex(
+ redis_key,
+ 1800, # 30-minute TTL — long enough for user to complete OAuth consent
+ str(current_user.id),
+)
+# Return redirect to authorization_url with state=state_token
+
+# At OAuth callback:
+redis_key = f"oauth_state:{state}"
+user_id_bytes = await request.app.state.redis.get(redis_key)
+if not user_id_bytes:
+ raise HTTPException(400, "Invalid or expired OAuth state")
+await request.app.state.redis.delete(redis_key) # single-use
+user_id = uuid.UUID(user_id_bytes.decode())
+```
+
+This follows the exact same pattern as TOTP replay prevention in `auth.py` — Redis TTL key, single-use deletion after validation.
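+
+The separate `get` and `delete` calls leave a small window in which two concurrent callbacks could both read the same state. A possible tightening, assuming the server is Redis 6.2 or newer (where `GETDEL` exists) and a redis-py client that exposes `getdel`, is sketched below. `consume_oauth_state` is a hypothetical helper and `InMemoryStore` is only a stand-in so the example runs without a Redis server:
+
+```python
+import asyncio
+import uuid
+from typing import Optional
+
+
+async def consume_oauth_state(redis, state: str) -> Optional[uuid.UUID]:
+    """Read and delete the state key in one step; None means invalid, expired, or reused."""
+    raw = await redis.getdel(f"oauth_state:{state}")
+    if raw is None:
+        return None
+    if isinstance(raw, bytes):  # clients without decode_responses return bytes
+        raw = raw.decode()
+    return uuid.UUID(raw)
+
+
+class InMemoryStore:
+    """Hypothetical stand-in exposing only the getdel() call used above."""
+
+    def __init__(self, data: dict):
+        self._data = dict(data)
+
+    async def getdel(self, key: str):
+        return self._data.pop(key, None)
+
+
+user_id = uuid.uuid4()
+store = InMemoryStore({"oauth_state:abc": str(user_id).encode()})
+first = asyncio.run(consume_oauth_state(store, "abc"))   # returns the stored user ID
+second = asyncio.run(consume_oauth_state(store, "abc"))  # state already consumed
+```
+
+The callback handler would map a `None` result to the same HTTP 400 response as in the pattern above.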
+
+### Pattern 8: TTLCache for Folder Listings (cachetools)
+
+```python
+# Source: cachetools.readthedocs.io [CITED]
+import threading
+from cachetools import TTLCache
+
+# In FastAPI lifespan or module-level singleton
+# maxsize=1000: enough for ~50 users × 20 folder nodes each
+# ttl=60: 60-second cache per D-16
+_folder_cache: TTLCache = TTLCache(maxsize=1000, ttl=60)
+_folder_cache_lock = threading.Lock()
+
+async def get_cloud_folders_cached(user_id: str, provider: str, folder_id: str, fetch_fn) -> list:
+    """Return cached result or call fetch_fn and cache it."""
+    cache_key = f"{user_id}:{provider}:{folder_id}"
+    with _folder_cache_lock:
+        if cache_key in _folder_cache:
+            return _folder_cache[cache_key]
+
+    result = await fetch_fn()  # async call, made outside the lock
+
+    with _folder_cache_lock:
+        _folder_cache[cache_key] = result
+    return result
+```
+
+**Thread safety:** `cachetools.TTLCache` is not thread-safe by itself. A `threading.Lock` is required for concurrent access. The fetch function itself is async and must be called outside the lock to avoid blocking the event loop. [CITED: cachetools.readthedocs.io — "access to a shared cache from multiple threads must be properly synchronized"]
+
+### Pattern 9: Factory Extension (get_storage_backend)
+
+```python
+# Source: backend/storage/__init__.py (VERIFIED: codebase)
+# Current factory only returns MinIOBackend. Phase 5 extends it:
+
+async def get_storage_backend_for_document(
+    document: Document,
+    user: User,
+    session: AsyncSession,
+) -> StorageBackend:
+    """Return the correct StorageBackend for the given document.
+
+    MinIO documents (storage_backend='minio'): return shared MinIOBackend.
+    Cloud documents: load CloudConnection, decrypt credentials, return backend instance.
+    """
+    if document.storage_backend == "minio":
+        return get_storage_backend()  # existing factory
+
+    # Load cloud connection
+    result = await session.execute(
+        select(CloudConnection).where(
+            CloudConnection.user_id == user.id,
+            CloudConnection.provider == document.storage_backend,
+            CloudConnection.status == "ACTIVE",
+        )
+    )
+    conn = result.scalar_one_or_none()
+    if conn is None:
+        raise HTTPException(503, "Cloud connection not found or inactive")
+
+    master_key = settings.cloud_creds_key.encode()
+    credentials = decrypt_credentials(master_key, str(user.id), conn.credentials_enc)
+
+    if document.storage_backend == "google_drive":
+        return GoogleDriveBackend(credentials)
+    elif document.storage_backend == "onedrive":
+        return OneDriveBackend(credentials)
+    elif document.storage_backend in ("nextcloud", "webdav"):
+        return WebDAVBackend(credentials["server_url"], credentials["username"], credentials["password"])
+    else:
+        raise ValueError(f"Unknown storage backend: {document.storage_backend}")
+```
+
+### Pattern 10: On-Demand Token Refresh (D-05)
+
+```python
+# Source: D-05 decision (CONTEXT.md) [ASSUMED: exact error class names]
+class GoogleDriveBackend(StorageBackend):
+    async def _call_with_refresh(self, operation_fn, credentials: dict, user_id: str, conn: CloudConnection, session):
+        """Attempt operation; on 401, refresh tokens and retry once."""
+        try:
+            return await operation_fn(credentials)
+        except Exception as e:
+            # Google Drive: googleapiclient.errors.HttpError with status 401
+            if _is_token_expired_error(e):
+                new_creds = await self._refresh_token(credentials)
+                if new_creds is None:
+                    # invalid_grant: set REQUIRES_REAUTH (D-06)
+                    conn.status = "REQUIRES_REAUTH"
+                    await session.commit()
+                    raise CloudConnectionError("Cloud connection requires re-authentication")
+                # Update credentials_enc
+                master_key = settings.cloud_creds_key.encode()
+                conn.credentials_enc = encrypt_credentials(master_key, user_id, new_creds)
+                conn.status = "ACTIVE"
+                await session.commit()
+                return await operation_fn(new_creds)
+            raise
+```
+
+### Anti-Patterns to Avoid
+
+- **Storing OAuth state in FastAPI process memory:** Multi-instance deployments will fail because the callback may arrive at a different instance than the one that created the state. Use Redis.
+- **Reusing the HKDF instance:** The `cryptography` library raises `AlreadyFinalized` on second call to `.derive()`. Always create a new `HKDF` instance per key derivation.
+- **Checking hostname string for SSRF, not resolved IP:** `validate_cloud_url("http://internal.corp")` would pass a string check but may resolve to `10.0.0.1`. Always resolve DNS and check the resulting IP.
+- **Returning `credentials_enc` in any API response:** The `CloudConnectionOut` Pydantic model (already in `admin.py`) is the whitelist — use it for all cloud connection responses.
+- **Calling cloud SDK methods from the async event loop without `asyncio.to_thread()`:** All cloud SDKs (`google-api-python-client`, `msal`, `webdavclient3`) are synchronous. Blocking the event loop kills throughput.
+- **Using `prompt="consent"` only on first connect:** Without `prompt="consent"`, Google may not return a refresh token on reconnect if the app was previously authorized. Always pass `prompt="consent"` to guarantee a fresh refresh token.
+- **Single cloud_connections row per user:** The schema supports multiple providers simultaneously (one row per provider per user, D-13). The upsert logic must match on `(user_id, provider)` not just `user_id`.
+
+---
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| OAuth2 PKCE + token exchange for Google | Custom SHA-256/base64url code verifier and challenge | `google_auth_oauthlib.flow.Flow` | Handles RFC 7636 PKCE, redirect URI validation, and token serialization |
+| OAuth2 for Microsoft Graph | Raw `requests` calls to login.microsoftonline.com | `msal.ConfidentialClientApplication` | MSAL handles token cache, `invalid_grant` detection, tenant routing, and PKCE |
+| WebDAV PROPFIND XML | Raw `httpx` with hand-coded XML bodies | `webdavclient3.Client` | PROPFIND response parsing, multistatus handling, redirect following |
+| Fernet encryption + key derivation | AES-GCM + custom key stretching | `cryptography` Fernet + HKDF | Fernet is misuse-resistant (authenticated encryption with IV, HMAC tag) — hand-rolled AES can fail silently |
+| Private IP detection for SSRF | Regex on URL string | `ipaddress.ip_address()` with `is_private` / `is_loopback` / `is_link_local` checks | Python's `ipaddress` module classifies IPv4/IPv6 addresses correctly, including IPv4-mapped addresses such as `::ffff:127.0.0.1` (exposed via `ipv4_mapped`) |
+| In-memory TTL cache | `dict` with `asyncio.get_event_loop().time()` comparison | `cachetools.TTLCache` guarded by a `threading.Lock` | TTLCache provides LRU eviction and correct TTL semantics; it is not thread-safe on its own, so the lock is still required |
+| OAuth state token validation | JWT with custom HMAC | Redis TTL key | Redis TTL provides natural expiry + single-use deletion; no new secret required |
+
+**Key insight:** All cloud credential handling is a solved problem at the library level. The most likely Phase 5 failure mode is re-implementing OAuth token exchange by hand, where edge cases around redirect URI matching, PKCE, and token format cause silent breakage.
+
+---
+
+## Common Pitfalls
+
+### Pitfall 1: Google Refresh Token Only Issued Once
+**What goes wrong:** User connects Google Drive; the first connection includes a refresh token. Later the user disconnects and reconnects. Google does not issue a new refresh token because the user already authorized the app — the re-authorization returns only an access token. Credentials are stored but the connection goes stale in 1 hour.
+**Why it happens:** Google only issues a refresh token on the first authorization for a given client_id + user pair, or when `prompt="consent"` is explicitly passed.
+**How to avoid:** Always pass `prompt="consent"` and `access_type="offline"` in `flow.authorization_url()`.
+**Warning signs:** `credentials.refresh_token` is `None` after `flow.fetch_token()`.
+
+### Pitfall 2: webdavclient3 Path Encoding for Nextcloud
+**What goes wrong:** Nextcloud returns 404 or 207 Multi-Status with an empty propfind result for paths with spaces or non-ASCII characters when the path is not percent-encoded.
+**Why it happens:** Nextcloud's WebDAV endpoint requires percent-encoded paths; webdavclient3 may or may not encode paths depending on the method called.
+**How to avoid:** Use `urllib.parse.quote()` on all path segments before passing to webdavclient3 operations that accept raw paths. [ASSUMED — verify against webdavclient3 docs during implementation]
+**Warning signs:** Works with ASCII-only filenames; fails with spaces or umlauts.
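+
+A minimal sketch of the per-segment encoding described above, using only the standard library. The helper name `quote_dav_path` is hypothetical, and whether webdavclient3 already encodes a given path is still unverified (assumption A4), so check for double-encoding (`%2520`) before adopting it:
+
+```python
+from urllib.parse import quote
+
+
+def quote_dav_path(path: str) -> str:
+    """Percent-encode each segment of a WebDAV path, keeping "/" separators."""
+    # safe="" so that reserved characters inside a segment are encoded too.
+    return "/".join(quote(segment, safe="") for segment in path.split("/"))
+
+
+print(quote_dav_path("Meine Dokumente/Übersicht 2026.pdf"))
+# Meine%20Dokumente/%C3%9Cbersicht%202026.pdf
+```
+
+Encoding segment by segment keeps the directory structure intact while still escaping spaces and non-ASCII characters inside each name.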
+
+### Pitfall 3: HKDF AlreadyFinalized Error
+**What goes wrong:** `cryptography.exceptions.AlreadyFinalized` is raised when `HKDF.derive()` is called a second time on the same instance.
+**Why it happens:** HKDF is a one-shot operation by design in the `cryptography` library.
+**How to avoid:** Create a new `HKDF(...)` instance inside `_derive_fernet_key()` on every call — never store or reuse the HKDF instance.
+**Warning signs:** Works in unit tests (each test creates a fresh instance), fails under concurrent load or in repeated calls within the same request.
+
+### Pitfall 4: OAuth Callback State Mismatch in Multi-Instance Deployment
+**What goes wrong:** State token is stored in a Python dict in-process. The OAuth callback arrives at a different uvicorn instance → `invalid state` error.
+**Why it happens:** HTTP requests are not session-sticky in a load-balanced deployment.
+**How to avoid:** Store OAuth state in Redis (`app.state.redis`) with a 30-minute TTL. [VERIFIED: Redis already wired in codebase at `app.state.redis`]
+**Warning signs:** OAuth works in single-instance Docker Compose but fails intermittently in production.
+
+### Pitfall 5: DNS Rebinding Attack on SSRF Validation
+**What goes wrong:** `validate_cloud_url` resolves `attacker.com` to `8.8.8.8` (passes validation), then the subsequent request resolves `attacker.com` to `169.254.169.254` (cloud metadata endpoint). The validation and the actual request see different IPs.
+**Why it happens:** DNS TTL expires between validation and request; attacker controls the DNS.
+**How to avoid:** Use `socket.create_connection` with the pre-validated IP directly (pin the IP), or document that a network-level egress firewall is the defense-in-depth layer for DNS rebinding. Calling `validate_cloud_url` immediately before each request (not only once at connect time) narrows the window but does not close it. [CITED: cheatsheetseries.owasp.org]
+**Warning signs:** SSRF test passes with direct IP inputs but might miss DNS-based attacks.
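+
+The "resolve once, validate, then pin" idea can be sketched with the standard library alone. The function names `is_blocked_ip` and `resolve_and_pin` are hypothetical and are not the project's `validate_cloud_url`; the exact set of blocked categories is an assumption to review against D-17:
+
+```python
+import ipaddress
+import socket
+
+
+def is_blocked_ip(raw_ip: str) -> bool:
+    """Return True if the address must not be contacted by the backend."""
+    ip = ipaddress.ip_address(raw_ip)
+    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:127.0.0.1) before classifying.
+    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
+        ip = ip.ipv4_mapped
+    return (
+        ip.is_private or ip.is_loopback or ip.is_link_local
+        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
+    )
+
+
+def resolve_and_pin(host: str, port: int) -> list[str]:
+    """Resolve host once, reject if any address is blocked, return the IPs."""
+    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
+    addresses = sorted({info[4][0] for info in infos})
+    for address in addresses:
+        if is_blocked_ip(address):
+            raise ValueError(f"{host} resolves to blocked address {address}")
+    return addresses
+```
+
+The caller would then connect to one of the returned IPs (for example with `socket.create_connection((ip, port))`) while still sending the original hostname for TLS and the `Host` header, so the request never triggers a second DNS lookup.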
+
+### Pitfall 6: Microsoft Graph Upload Size Limit
+**What goes wrong:** Files larger than 4 MB fail with `413 Request Entity Too Large` when uploaded via a single PUT/POST to Microsoft Graph.
+**Why it happens:** Microsoft Graph's simple upload endpoint is limited to 4 MB. Larger files require a resumable upload session (`createUploadSession`).
+**How to avoid:** For Phase 5, implement resumable upload sessions for files > 4 MB. Use `POST /me/drive/root:/{path}:/createUploadSession` to get an upload URL, then upload in 10 MiB chunks (Microsoft Graph requires chunk sizes that are multiples of 320 KiB, which 10 MiB satisfies).
+**Warning signs:** Tests with small files pass; production uploads of real documents (> 4 MB) fail silently or with 413.
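+
+As a rough illustration of the chunking step, the sketch below computes the `Content-Range` header for each chunk of a resumable upload. The helper `iter_content_ranges` is hypothetical; it covers only the byte-range arithmetic and leaves out the HTTP calls and session handling:
+
+```python
+from collections.abc import Iterator
+
+# Microsoft Graph expects upload fragments sized in multiples of 320 KiB.
+FRAGMENT_UNIT = 320 * 1024
+CHUNK_SIZE = 32 * FRAGMENT_UNIT  # 10 MiB
+
+
+def iter_content_ranges(total_size: int, chunk_size: int = CHUNK_SIZE) -> Iterator[tuple[int, int, str]]:
+    """Yield (start, end, Content-Range header) for each chunk of the file."""
+    if chunk_size <= 0 or chunk_size % FRAGMENT_UNIT != 0:
+        raise ValueError("chunk_size must be a positive multiple of 320 KiB")
+    start = 0
+    while start < total_size:
+        end = min(start + chunk_size, total_size) - 1  # inclusive byte offset
+        yield start, end, f"bytes {start}-{end}/{total_size}"
+        start = end + 1
+```
+
+For a 25 MiB file this yields three ranges, the last one shorter than the others; only the final chunk may be smaller than the chosen chunk size.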
+
+### Pitfall 7: Google Drive `files()` Service Is Synchronous
+**What goes wrong:** `googleapiclient.discovery.build()` and all `service.files().xxx().execute()` calls are synchronous and block the event loop.
+**Why it happens:** `google-api-python-client` was built before asyncio was standard.
+**How to avoid:** Wrap every SDK call in `asyncio.to_thread()`. Do NOT await `service.files().list()` directly — it is not a coroutine.
+**Warning signs:** FastAPI request handler completes quickly in tests but blocks under load.
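+
+The wrapping pattern can be shown without the Google SDK. In this sketch `blocking_list_files` is a hypothetical stand-in for a synchronous call such as `service.files().list(...).execute()`:
+
+```python
+import asyncio
+import time
+
+
+def blocking_list_files(page_size: int) -> list[str]:
+    """Stand-in for a synchronous SDK call that blocks on network I/O."""
+    time.sleep(0.05)  # simulates latency that would otherwise block the loop
+    return [f"file-{i}" for i in range(page_size)]
+
+
+async def list_files(page_size: int) -> list[str]:
+    # Run the blocking call in a worker thread; the event loop stays free.
+    return await asyncio.to_thread(blocking_list_files, page_size)
+
+
+async def main() -> list[str]:
+    # The two calls overlap instead of running back to back.
+    first, second = await asyncio.gather(list_files(2), list_files(1))
+    return first + second
+
+
+print(asyncio.run(main()))
+# ['file-0', 'file-1', 'file-0']
+```
+
+Passing the function and its arguments to `asyncio.to_thread()` (rather than calling it first) is what moves the blocking work off the event loop.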
+
+---
+
+## Code Examples
+
+### Credential Round-Trip Test (CLOUD-02)
+
+```python
+# Source: based on cryptography.io HKDF docs [CITED: cryptography.io]
+import base64
+import json
+from cryptography.hazmat.primitives import hashes
+from cryptography.hazmat.primitives.kdf.hkdf import HKDF
+from cryptography.fernet import Fernet
+
+def test_credential_encryption_round_trip():
+    master_key = b"test-master-key-32bytes-padded!!"  # 32 bytes
+    user_id = "550e8400-e29b-41d4-a716-446655440000"
+    credentials = {"access_token": "ya29.xxx", "refresh_token": "1//xxx", "expiry": "2026-05-28T15:00:00"}
+
+    encrypted = encrypt_credentials(master_key, user_id, credentials)
+    assert isinstance(encrypted, str)
+    assert "access_token" not in encrypted  # not plaintext
+
+    # Decrypt the ciphertext (not the plaintext dict) to prove the round trip.
+    decrypted = decrypt_credentials(master_key, user_id, encrypted)
+    assert decrypted == credentials
+```
+
+### SSRF Validation Test
+
+```python
+# Source: pattern derived from OWASP SSRF cheat sheet [CITED: cheatsheetseries.owasp.org]
+import pytest
+
+@pytest.mark.parametrize("url,should_raise", [
+    ("http://localhost/dav", True),
+    ("http://127.0.0.1/dav", True),
+    ("http://169.254.169.254/dav", True),
+    ("http://10.0.0.1/dav", True),
+    ("http://192.168.1.1/dav", True),
+    ("http://172.16.0.1/dav", True),
+    ("https://nextcloud.example.com/remote.php/dav", False),
+    ("http://[::1]/dav", True),  # IPv6 literals must be bracketed in URLs
+])
+def test_ssrf_validation(url, should_raise):
+    if should_raise:
+        with pytest.raises(ValueError):
+            validate_cloud_url(url)
+    else:
+        validate_cloud_url(url)  # no exception
+```
+
+### CloudConnectionOut Whitelist Enforcement
+
+```python
+# Source: backend/api/admin.py (VERIFIED: codebase)
+# The CloudConnectionOut model already exists in admin.py.
+# ALL cloud connection endpoints must use this model, not CloudConnection ORM directly.
+class CloudConnectionOut(BaseModel):
+    id: str
+    provider: str
+    display_name: str
+    status: str
+    connected_at: datetime
+    model_config = {"from_attributes": True}
+
+# Usage in cloud.py:
+@router.get("/api/cloud/connections")
+async def list_connections(
+    current_user: User = Depends(get_regular_user),
+    session: AsyncSession = Depends(get_db),
+) -> dict:
+    result = await session.execute(
+        select(CloudConnection).where(CloudConnection.user_id == current_user.id)
+    )
+    connections = result.scalars().all()
+    return {"items": [CloudConnectionOut.model_validate(c).model_dump() for c in connections]}
+```
+
+---
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| Storing OAuth state in Flask/FastAPI session (in-memory) | Redis TTL-keyed state tokens | ~2022 with multi-instance deployments becoming standard | Multi-instance safety; prevents token fixation |
+| webdav-client-python (original) | webdavclient3 (fork, actively maintained) | 2018 | webdav-client-python is unmaintained; webdavclient3 is the maintained fork |
+| `google.oauth2.service_account.Credentials` (service accounts) | `google-auth-oauthlib` Flow for user-delegated access | 2019 | Service accounts can reach user files only through Google Workspace domain-wide delegation; user OAuth is required for personal Drive |
+| ADAL (Azure Active Directory Authentication Library) for Python | MSAL (Microsoft Authentication Library) | 2020; ADAL deprecated | ADAL end-of-life June 2023; MSAL is the replacement |
+| Using `Fernet.generate_key()` for per-user keys | HKDF + Fernet (key derivation before Fernet) | Ongoing best practice | A Fernet key is 32 bytes, URL-safe base64-encoded; `generate_key()` returns a fresh random key on every call, so it cannot reproduce a deterministic per-user key |
+
+**Deprecated/outdated:**
+- `adal` Python package: End-of-life; replaced by `msal`. Do NOT use.
+- `webdav-client-python` (without the `3`): Unmaintained since ~2018. Use `webdavclient3`.
+- `google.oauth2.service_account.Credentials`: For service accounts, not user-delegated Drive access. Wrong tool for this use case.
+
+---
+
+## Assumptions Log
+
+| # | Claim | Section | Risk if Wrong |
+|---|-------|---------|---------------|
+| A1 | `webdavclient3` uses `upload_to` / `download_from` method names for stream-based operations | Architecture Patterns Pattern 5 | Planner must verify method signatures against installed package; wrong method names cause `AttributeError` at test time |
+| A2 | Google Drive `googleapiclient.errors.HttpError` status 401 is the token-expiry signal | Pattern 10: On-Demand Token Refresh | Actual exception class may differ; must verify during implementation with a real expired token |
+| A3 | Microsoft Graph `invalid_grant` error appears in `result["error"]` from `msal.ConfidentialClientApplication.acquire_token_by_refresh_token()` | Pattern 10 | MSAL may use a different error field or raise an exception; verify against msal docs |
+| A4 | `webdavclient3` percent-encodes paths automatically | Pitfall 2 | May require manual encoding; verify during WebDAV backend implementation |
+| A5 | `tenant_id="common"` works for both personal OneDrive and organizational accounts | Pattern 4: MSAL | May require `"consumers"` for personal accounts; verify against Microsoft docs for the target use case |
+
+---
+
+## Open Questions
+
+1. **Google Drive object key scheme for `stat_object`**
+ - What we know: MinIO `stat_object` returns size in bytes from the storage layer. Google Drive returns file metadata including `size` from `files.get(fileId, fields='size')`.
+ - What's unclear: Google Drive may not return `size` for Google Workspace files (Docs, Sheets, Slides) since they have no binary size. DocuVault uploads binary files, so this may not be an issue in practice.
+ - Recommendation: Implement `stat_object` using `service.files().get(fileId=object_key, fields="size").execute()` and return `int(metadata["size"])`. Add a fallback of `0` for files without a size.
+
+2. **Nextcloud folder listing path convention**
+ - What we know: Nextcloud WebDAV base path is typically `/remote.php/dav/files/{username}/`.
+ - What's unclear: Whether the `webdavclient3` `Client` automatically handles the `/remote.php/dav/files/{username}/` prefix or whether it must be included in the `server_url`.
+ - Recommendation: Store `server_url` as the full WebDAV root (e.g., `https://nc.example.com/remote.php/dav/files/alice/`) and use relative paths within it. Test with PROPFIND on the root to validate the connection (D-08).
+
+3. **Microsoft Graph upload for files > 4 MB**
+ - What we know: Simple upload (PUT `/me/drive/root:/{path}:/content`) is limited to 4 MB. Resumable sessions handle larger files.
+ - What's unclear: The Phase 5 plan should specify whether to implement resumable sessions upfront or use a 4 MB size gate.
+ - Recommendation: Implement resumable upload session (`createUploadSession`) for all files to avoid the hard limit. It handles both small and large files without a size check.
+
+---
+
+## Environment Availability
+
+| Dependency | Required By | Available | Version | Fallback |
+|------------|------------|-----------|---------|----------|
+| Python 3.12 (Docker) | All backends | In Docker container | 3.12.x | — |
+| Redis | OAuth state storage | In Docker Compose | 6.x+ | — |
+| PostgreSQL | cloud_connections table | In Docker Compose | 15.x | — |
+| `cryptography` package | Credential encryption | NOT in requirements.txt | — | Must be added (48.0.0 verified) |
+| `google-auth-oauthlib` | Google Drive OAuth | NOT in requirements.txt | — | Must be added (1.3.1 verified) |
+| `google-api-python-client` | Google Drive API | NOT in requirements.txt | — | Must be added (2.196.0 verified) |
+| `msal` | OneDrive OAuth | NOT in requirements.txt | — | Must be added (1.36.0 verified) |
+| `webdavclient3` | WebDAV/Nextcloud | NOT in requirements.txt | — | Must be added (3.14.7 verified) |
+| `cachetools` | Folder listing cache | NOT in requirements.txt | — | Must be added (6.2.6 verified) |
+| Google OAuth App (Google Cloud console) | Google Drive integration | NOT CONFIGURED | — | Must be created by user; client_id/client_secret added to .env |
+| Microsoft App Registration (Azure portal) | OneDrive integration | NOT CONFIGURED | — | Must be created by user; client_id/client_secret/tenant_id added to .env |
+
+**Missing dependencies with no fallback:**
+- `cryptography`, `google-auth-oauthlib`, `google-api-python-client`, `msal`, `webdavclient3`, `cachetools` — must be added to `requirements.txt` before any cloud backend code runs.
+
+**Missing dependencies with fallback (soft):**
+- Google OAuth App credentials: Integration tests for Google Drive will need mocked OAuth flows if real GCP app is not configured. Unit tests can mock the entire OAuth flow.
+- Microsoft App Registration: Same as above for OneDrive.
+
+---
+
+## Validation Architecture
+
+### Test Framework
+
+| Property | Value |
+|----------|-------|
+| Framework | pytest + pytest-asyncio (already in requirements.txt) |
+| Config file | `backend/pytest.ini` (already exists) |
+| Quick run command | `cd backend && pytest tests/test_cloud.py -x -v` |
+| Full suite command | `cd backend && pytest -v` |
+
+### Phase Requirements → Test Map
+
+| Req ID | Behavior | Test Type | Automated Command | File Exists? |
+|--------|----------|-----------|-------------------|-------------|
+| CLOUD-01 | User can connect all 4 providers | Integration | `pytest tests/test_cloud.py::test_connect_google_drive -x` | ❌ Wave 0 |
+| CLOUD-01 | OAuth callback validates state and saves connection | Integration | `pytest tests/test_cloud.py::test_oauth_callback_valid_state -x` | ❌ Wave 0 |
+| CLOUD-01 | Invalid OAuth state returns 400 | Integration | `pytest tests/test_cloud.py::test_oauth_callback_invalid_state -x` | ❌ Wave 0 |
+| CLOUD-01 | WebDAV/Nextcloud connection validated before save (D-08) | Integration | `pytest tests/test_cloud.py::test_webdav_connect_validates -x` | ❌ Wave 0 |
+| CLOUD-02 | Credential encryption/decryption round-trip | Unit | `pytest tests/test_cloud.py::test_credential_round_trip -x` | ❌ Wave 0 |
+| CLOUD-02 | `credentials_enc` not in any API response (SEC-08) | Integration | `pytest tests/test_cloud.py::test_credentials_enc_not_exposed -x` | ❌ Wave 0 |
+| CLOUD-03 | Upload to cloud folder goes through FastAPI (not presigned URL) | Integration | `pytest tests/test_cloud.py::test_cloud_upload_no_presigned -x` | ❌ Wave 0 |
+| CLOUD-04 | Connection status displayed correctly | Integration | `pytest tests/test_cloud.py::test_connection_status_display -x` | ❌ Wave 0 |
+| CLOUD-05 | `invalid_grant` → `REQUIRES_REAUTH` transition | Integration | `pytest tests/test_cloud.py::test_invalid_grant_sets_requires_reauth -x` | ❌ Wave 0 |
+| CLOUD-06 | Disconnect permanently deletes credentials | Integration | `pytest tests/test_cloud.py::test_disconnect_deletes_credentials -x` | ❌ Wave 0 |
+| CLOUD-07 | StorageBackend factory returns correct type | Unit | `pytest tests/test_cloud.py::test_factory_returns_correct_backend -x` | ❌ Wave 0 |
+| D-17 | SSRF validation blocks RFC-1918 and loopback | Unit | `pytest tests/test_cloud.py::test_ssrf_validation -x` | ❌ Wave 0 |
+| D-17 | SSRF validation blocks 169.254.x link-local | Unit | `pytest tests/test_cloud.py::test_ssrf_link_local -x` | ❌ Wave 0 |
+| SEC | Admin cannot access cloud connection credentials | Integration | `pytest tests/test_cloud.py::test_admin_cannot_see_credentials -x` | ❌ Wave 0 |
+| SEC | Cross-user cloud connection access returns 404 | Integration | `pytest tests/test_cloud.py::test_cross_user_idor -x` | ❌ Wave 0 |
+
+### Sampling Rate
+
+- **Per task commit:** `cd backend && pytest tests/test_cloud.py -x -v`
+- **Per wave merge:** `cd backend && pytest -v`
+- **Phase gate:** Full suite green before `/gsd:verify-work`
+
+### Wave 0 Gaps
+
+- [ ] `backend/tests/test_cloud.py` — all Phase 5 tests (unit + integration), starting with xfail stubs
+- [ ] New conftest fixtures: `mock_google_drive_creds`, `mock_onedrive_creds`, `mock_webdav_client`, `cloud_connection_factory`
+
+---
+
+## Security Domain
+
+### Applicable ASVS Categories
+
+| ASVS Category | Applies | Standard Control |
+|---------------|---------|-----------------|
+| V2 Authentication | yes | OAuth2 state CSRF; per-session token; `get_regular_user` dep on all cloud endpoints |
+| V3 Session Management | yes | OAuth state token is single-use; stored in Redis with TTL; deleted after callback |
+| V4 Access Control | yes | Every `/api/cloud/*` endpoint asserts `connection.user_id == current_user.id` before operations |
+| V5 Input Validation | yes | `validate_cloud_url()` for WebDAV/Nextcloud; Pydantic models for all request bodies; no raw string interpolation in URLs |
+| V6 Cryptography | yes | HKDF + Fernet for credential encryption; Fernet uses AES-128-CBC with HMAC-SHA256 via the `cryptography` library (never hand-rolled) |
+| V7 Error Handling | yes | `invalid_grant` handled explicitly (D-06); no stack traces in cloud API error responses |
+
+### Known Threat Patterns for OAuth + Cloud Storage
+
+| Pattern | STRIDE | Standard Mitigation |
+|---------|--------|---------------------|
+| CSRF on OAuth callback | Tampering | `state` parameter validated via Redis; state token is `secrets.token_urlsafe(32)` |
+| SSRF via WebDAV/Nextcloud URL | Tampering / Information Disclosure | `validate_cloud_url()` at connect-time and before each request; `ipaddress` module DNS resolution check |
+| Credential exposure via API leak | Information Disclosure | `CloudConnectionOut` Pydantic whitelist; `credentials_enc` excluded by omission |
+| Token replay via OAuth state | Elevation of Privilege | Redis single-use deletion after callback; 30-minute TTL prevents stale states |
+| Cross-user cloud connection access (IDOR) | Information Disclosure / Elevation of Privilege | `connection.user_id == current_user.id` assertion on every operation; 404 not 403 |
+| Unverified credentials stored (D-08) | Information Disclosure / DoS | PROPFIND/OPTIONS validation before storage; error returned on failure |
+| Refresh token theft from DB | Information Disclosure | `credentials_enc` is Fernet-encrypted with HKDF per-user key; master key in env var only |
+| Admin accessing user cloud credentials | Information Disclosure / Elevation of Privilege | `get_regular_user` dep blocks admin (403); `CloudConnectionOut` whitelist on all responses |
+| DNS rebinding SSRF bypass | Tampering | `validate_cloud_url()` called immediately before each outbound request (not only at connect-time); documented defense-in-depth via network egress firewall |
+
+---
+
+## Project Constraints (from CLAUDE.md)
+
+The following CLAUDE.md directives are binding for Phase 5:
+
+- JWT access token lives in Pinia memory only — never localStorage or sessionStorage (OAuth callback must redirect to Vue with a query param, not embed tokens in the URL)
+- Cloud credentials encrypted with HKDF per-user key derivation — master key in env var only
+- Admin endpoints never return `credentials_enc`
+- Every cloud connection endpoint asserts `resource.user_id == current_user.id`
+- All DB queries via ORM / parameterized statements — zero raw string interpolation
+- `get_regular_user` on all cloud connection endpoints (admin blocked from this surface)
+- `write_audit_log()` called on cloud connect, disconnect, and re-auth events
+- Testing protocol: every new function, endpoint, and component must have at least one test; `pytest -v` must pass zero failures
+- Security gate: `bandit -r backend/`, `pip-audit`, `npm audit --audit-level=high` must all pass before phase advancement
+- Bug fix rule: root cause only, ≤50 lines, regression test required
+
+---
+
+## Sources
+
+### Primary (HIGH confidence)
+
+- `backend/storage/base.py` — StorageBackend ABC, 7 abstract methods, exact signatures
+- `backend/storage/minio_backend.py` — asyncio.to_thread() wrapping pattern, error handling shape
+- `backend/storage/__init__.py` — factory pattern to extend
+- `backend/db/models.py` — CloudConnection model fields, Document.storage_backend, User.default_storage_backend
+- `backend/api/admin.py` — CloudConnectionOut Pydantic whitelist pattern (already exists)
+- `backend/main.py` — Redis wiring on app.state.redis, lifespan pattern
+- `backend/deps/auth.py` — get_regular_user, get_current_user patterns
+- `backend/migrations/versions/0001_initial_schema.py` — confirmed cloud_connections table, storage_backend columns
+- [cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/](https://cryptography.io/en/latest/hazmat/primitives/key-derivation-functions/) — HKDF usage and info parameter
+- [cryptography.io/en/latest/fernet/](https://cryptography.io/en/latest/fernet/) — Fernet key format
+- [googleapis.dev/python/google-auth-oauthlib/latest](https://googleapis.dev/python/google-auth-oauthlib/latest/reference/google_auth_oauthlib.flow.html) — Flow class API
+- PyPI `pip download` — confirmed versions: cryptography-48.0.0, google_auth_oauthlib-1.3.1, google_api_python_client-2.196.0, msal-1.36.0, webdavclient3-3.14.7, cachetools-6.2.6
+- slopcheck 0.6.1 — all 7 packages rated [OK]
+
+### Secondary (MEDIUM confidence)
+
+- [learn.microsoft.com/en-us/entra/msal/python/](https://learn.microsoft.com/en-us/entra/msal/python/) — MSAL Python overview and authorization code flow
+- [cachetools.readthedocs.io](https://cachetools.readthedocs.io/en/stable/) — TTLCache thread safety requirement
+- [cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html](https://cheatsheetseries.owasp.org/cheatsheets/Server_Side_Request_Forgery_Prevention_Cheat_Sheet.html) — DNS resolution-based SSRF check
+
+### Tertiary (LOW confidence / ASSUMED)
+
+- webdavclient3 specific method names (`upload_to`, `download_from`) — marked [ASSUMED] above; verify during implementation
+- Exact Microsoft Graph error field for `invalid_grant` in MSAL — marked [ASSUMED] above
+
+---
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH — all packages verified on PyPI, slopcheck clean, versions confirmed
+- Architecture: HIGH — built directly from codebase inspection; ABC, factory, CloudConnection model, Redis wiring all verified
+- OAuth2 flows: MEDIUM/HIGH — google-auth-oauthlib Flow API verified via official docs; MSAL pattern confirmed via Microsoft docs
+- Pitfalls: HIGH — based on official library docs and known OAuth edge cases
+- SSRF prevention: HIGH — Python stdlib ipaddress module; OWASP-cited approach
+
+**Research date:** 2026-05-28
+**Valid until:** 2026-06-28 (30 days) — package versions are stable but verify before pinning in requirements.txt