docs(05): add code review and verification reports for phase 5

REVIEW.md: 3 critical findings fixed (HTTPException passthrough,
Redis pre-flight ordering, CLOUD_CREDS_KEY in celery-worker env)
VERIFICATION.md: 7/7 must-haves verified; 6 human-verification items
require live cloud credentials (Google Drive, OneDrive, Nextcloud/WebDAV)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curo1305
2026-05-30 18:07:42 +02:00
parent b1a136b5be
commit 1a6fa08a34
2 changed files with 407 additions and 228 deletions
@@ -2,258 +2,273 @@
phase: 05-cloud-storage-backends
reviewed: 2026-05-30T00:00:00Z
depth: standard
files_reviewed: 14
files_reviewed: 6
files_reviewed_list:
- backend/api/documents.py
- backend/api/admin.py
- backend/api/cloud.py
- backend/tasks/document_tasks.py
- backend/api/documents.py
- docker-compose.yml
- frontend/src/views/CloudStorageView.vue
- backend/tests/test_cloud.py
- backend/tests/test_admin_api.py
- backend/tests/test_classifier.py
- frontend/src/api/client.js
- frontend/src/components/admin/AdminUsersTab.vue
- frontend/src/components/cloud/CloudCredentialModal.vue
- frontend/src/components/documents/DocumentPreviewModal.vue
- frontend/src/components/settings/SettingsCloudTab.vue
- frontend/src/components/ui/ConfirmBlock.vue
- frontend/src/views/DocumentView.vue
- backend/tests/test_documents.py
findings:
critical: 5
warning: 6
info: 3
total: 14
critical: 3
warning: 4
info: 2
total: 9
status: issues_found
---
# Phase 05: Code Review Report
# Phase 05 Plan 12: Code Review Report
**Reviewed:** 2026-05-30
**Reviewed:** 2026-05-30T00:00:00Z
**Depth:** standard
**Files Reviewed:** 14
**Files Reviewed:** 6
**Status:** issues_found
## Summary
This review covers the gap-closure plans 05-09, 05-10, and 05-11. The changes add a `PATCH /api/documents/{id}` endpoint for filename/folder rename, make the Celery re-analyze task cloud-aware, replace unauthenticated iframe src with a fetch+Blob URL flow, change `oauth_initiate` to return JSON instead of a 302 redirect, add WebDAV/Nextcloud edit support, add an admin user hard-delete with password confirmation, and small UI fixes (ConfirmBlock break-words, Edit button on ERROR-state connections).
This review covers the plan 05-12 gap-closure changes: OAuth pre-flight config validation added
to `oauth_initiate`, a broad `except Exception → 502` fallback added after the
`except CloudConnectionError → 503` clause in `stream_document_content`, a Celery worker source
volume mount added to `docker-compose.yml`, an upload-hint paragraph added to
`CloudStorageView.vue`, two new pre-flight tests in `test_cloud.py`, and one new 502-path test
in `test_documents.py`.
The security posture of the major new features is reasonable. However there are five blocker-class issues: two request-body smuggling paths, one timing-attack on admin password verification, one URL-object leak in DocumentView, and a missing folder-ownership check in the new PATCH endpoint. Several warnings around input validation and error handling are also present.
Three critical issues were found. The most impactful is that the broad `except Exception` clause
added to `stream_document_content` unconditionally swallows `HTTPException` raised by
`get_storage_backend_for_document`, converting a correct 503 "reconnect" response into a
misleading 502 "unreachable" response. The second critical issue is that Redis state tokens are
written to Redis before the new pre-flight check runs, leaving one orphan state entry per
rejected OAuth initiation request. The third is that the Celery worker container is missing the
`CLOUD_CREDS_KEY` environment variable, which causes silent use of the fallback default key
`"CHANGEME-32-bytes-padded!!"` for cloud-document credential decryption, making every
extract-and-classify Celery task for cloud documents fail at runtime.
## Narrative Findings (AI reviewer)
---
## Critical Issues
### CR-01: `DELETE /api/admin/users/{id}` body parsed from JSON but HTTP spec makes DELETE bodies unreliable — and FastAPI maps it as a query-param model, not a body, causing 422 in some clients
### CR-01: `except Exception` in `stream_document_content` swallows `HTTPException` from `get_storage_backend_for_document`
**File:** `backend/api/admin.py:480-503`
**File:** `backend/api/documents.py:751-763`
**Issue:** The `delete_user` handler declares `body: UserDeleteConfirm` alongside `user_id: uuid.UUID`, so FastAPI reads the Pydantic model from the request body. That is correct, but some HTTP clients and proxies drop the body of a DELETE request, and the `httpx` test client's `.delete()` shorthand does not accept one. The test at `test_admin_api.py:410` calls `client.delete(...)` with no body and asserts 422, which is fine, and `test_delete_user_correct_password` uses `client.request("DELETE", ..., json=...)` to send a body explicitly. The actual defect is that `admin_password` has no length or non-empty constraint: an empty string `""` passes Pydantic validation and reaches `verify_password("", hash)`, so the request costs a full Argon2 evaluation and returns `403` instead of being rejected early. The constant-time comparison requirement from CLAUDE.md is met by `verify_password`; what is missing is a `min_length=1` constraint on the field.
**Issue:** `get_storage_backend_for_document` (in `storage/__init__.py:100-103`) raises
`HTTPException(503, "Cloud connection not found or inactive")` when no active `CloudConnection`
row exists for the document's provider. `HTTPException` is a subclass of `Exception`
(confirmed: `starlette.exceptions.HTTPException → Exception → BaseException`), so the new
`except Exception as exc` block on line 759 catches it and re-raises it wrapped in a new
`HTTPException(502, "Cloud backend unreachable …")`.
The bigger issue: there is **no rate limiting** on this endpoint. An attacker who has obtained an admin JWT can brute-force the admin's password via repeated DELETE calls. CLAUDE.md requires rate limiting on all auth-adjacent endpoints.
The caller receives a misleading 502 status and a "backend unreachable" message when the real
problem is that the cloud connection was deleted or set to `REQUIRES_REAUTH`. The correct 503
with the reconnect prompt is silently suppressed.
**Fix:** Add `min_length=1` to `UserDeleteConfirm.admin_password` and ensure rate limiting middleware covers this endpoint:
The new test `test_stream_document_content_cloud_backend_error` (test_documents.py:598-632) only
exercises the `RuntimeError` path by monkeypatching `get_storage_backend_for_document` to raise
a `RuntimeError`. It does not test the path where `get_storage_backend_for_document` raises
`HTTPException(503)`, so this regression is undetected by the test suite.
**Fix:** Add an `except HTTPException: raise` clause ahead of the broad `except Exception` so
the 503 from `get_storage_backend_for_document` propagates unchanged:
```python
class UserDeleteConfirm(BaseModel):
admin_password: str = Field(..., min_length=1, max_length=1024)
try:
storage_backend = await get_storage_backend_for_document(doc, current_user, session)
file_bytes = await storage_backend.get_object(doc.object_key)
except CloudConnectionError as exc:
raise HTTPException(
status_code=503,
detail="Cloud connection requires re-authentication. Please reconnect in Settings.",
) from exc
except HTTPException:
raise # propagate 503 from get_storage_backend_for_document unchanged
except Exception as exc:
raise HTTPException(
status_code=502,
detail="Cloud backend unreachable. Please try again or reconnect in Settings.",
) from exc
```
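The missing 503 passthrough case can be checked in isolation. The following is a minimal sketch, not the project's test: `HTTPException` and `CloudConnectionError` are stand-in classes so the block runs without FastAPI, and `stream`, `status_of`, and `raiser` are hypothetical helpers that only mirror the clause ordering of the fix.

```python
class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException (assumption: only status_code/detail matter here)."""

    def __init__(self, status_code, detail=""):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


class CloudConnectionError(Exception):
    """Stand-in for the project's CloudConnectionError."""


def stream(load):
    """Call load() using the same except ordering as the proposed fix."""
    try:
        return load()
    except CloudConnectionError as exc:
        raise HTTPException(503, "re-authentication required") from exc
    except HTTPException:
        raise  # keep the original status code (e.g. 503) untouched
    except Exception as exc:
        raise HTTPException(502, "backend unreachable") from exc


def status_of(load):
    """Return the status code stream() would surface, or 200 on success."""
    try:
        stream(load)
    except HTTPException as exc:
        return exc.status_code
    return 200


def raiser(exc):
    """Build a zero-argument callable that raises exc."""
    def _raise():
        raise exc
    return _raise
```

With the `except HTTPException: raise` clause removed, the first case below would report 502 instead of 503, which is the regression described in this finding.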
---
### CR-02: `PATCH /api/documents/{doc_id}` does not validate folder ownership — a user can move a document into another user's folder
### CR-02: Redis OAuth state token written before pre-flight check — orphan Redis entries created on every rejected request
**File:** `backend/api/documents.py:546-588`
**File:** `backend/api/cloud.py:342-357`
**Issue:** The new `patch_document` handler validates document ownership (`doc.user_id != current_user.id`) but when `folder_id` is provided it sets `doc.folder_id = body.folder_id` without verifying that the target folder belongs to `current_user.id`. This is a cross-user data placement bug: a user who guesses or enumerates another user's folder UUID can move their own document into that folder, causing it to appear in the victim's folder listing.
**Issue:** In `oauth_initiate`, `redis_client.setex(f"oauth_state:{state_token}", 1800, …)` is
called on line 344, persisting a 30-minute Redis entry, before the provider-config pre-flight
checks on lines 348-357. When `google_client_id` or `onedrive_client_id` is empty, the function
raises `HTTPException(400)` and the state token is never consumed or deleted. Every rejected call
leaves one orphan Redis key with an 1800-second TTL.
The existing `PATCH /api/documents/{id}/folder` endpoint in `backend/api/folders.py` does perform this check (lines ~479-488). The new `patch_document` bypasses that validation entirely.
In a misconfigured deployment (where OAuth credentials are not set), every authenticated user
clicking "Connect" generates a Redis key that is never reclaimed except by TTL expiry. Beyond
memory waste, an orphan state token created before the rejection could theoretically be captured
from server logs or monitoring and submitted to the callback endpoint if credentials are later
configured — allowing a replay of a stale initiation.
**Fix:** Add a folder ownership assertion before setting `doc.folder_id`:
The two new tests (`test_oauth_initiate_google_drive_not_configured`,
`test_oauth_initiate_onedrive_not_configured`) verify the 400 response but do not assert that
the `FakeRedis._store` is empty, so the leak is undetected.
**Fix:** Move all pre-flight checks above the Redis write:
```python
if "folder_id" in body.model_fields_set and body.folder_id is not None:
from db.models import Folder # noqa: PLC0415
target_folder = await session.get(Folder, body.folder_id)
if target_folder is None or target_folder.user_id != current_user.id:
raise HTTPException(404, "Folder not found")
doc.folder_id = body.folder_id
elif "folder_id" in body.model_fields_set:
doc.folder_id = None # move to root
@router.get("/oauth/initiate/{provider}")
async def oauth_initiate(provider: str, request: Request,
current_user: User = Depends(get_regular_user)) -> dict:
if provider not in VALID_OAUTH_PROVIDERS:
raise HTTPException(status_code=400, detail=f"Unsupported OAuth provider: {provider}.")
# Pre-flight BEFORE touching Redis
if provider == "google_drive" and (not settings.google_client_id or not settings.google_client_secret):
raise HTTPException(status_code=400, detail="…Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET…")
if provider == "onedrive" and (not settings.onedrive_client_id or not settings.onedrive_client_secret):
raise HTTPException(status_code=400, detail="…Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET…")
state_token = secrets.token_urlsafe(32)
redis_client = request.app.state.redis
await redis_client.setex(f"oauth_state:{state_token}", 1800, str(current_user.id))
```
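The ordering requirement, and the Redis-is-empty assertion it enables, can be sketched without the web framework. This is an illustrative, synchronous model only: `FakeRedis`, `ConfigError`, `REQUIRED`, and `initiate` are hypothetical names, and the real handler is async and raises `HTTPException(400)`.

```python
import secrets


class ConfigError(Exception):
    """Stand-in for the HTTPException(400) raised by the real handler."""


class FakeRedis:
    """Dict-backed stand-in that only records keys passed to setex."""

    def __init__(self):
        self._store = {}

    def setex(self, key, ttl, value):
        self._store[key] = (ttl, value)


# Settings each provider needs before an OAuth flow may start (assumed names).
REQUIRED = {
    "google_drive": ("google_client_id", "google_client_secret"),
    "onedrive": ("onedrive_client_id", "onedrive_client_secret"),
}


def initiate(provider, settings, redis_client, user_id):
    """Validate configuration first; write the state token only afterwards."""
    if provider not in REQUIRED:
        raise ConfigError(f"Unsupported OAuth provider: {provider}")
    missing = [name for name in REQUIRED[provider] if not settings.get(name)]
    if missing:
        raise ConfigError(f"{provider} OAuth is not configured")
    # Redis is touched only after every pre-flight check has passed.
    state_token = secrets.token_urlsafe(32)
    redis_client.setex(f"oauth_state:{state_token}", 1800, str(user_id))
    return state_token
```

Because the write is the last step, a rejected request cannot leave a key behind, so a test can assert that the store is empty after the error.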
---
### CR-03: `PATCH /api/documents/{doc_id}` accepts an empty string filename — corrupts the document record
### CR-03: `celery-worker` missing `CLOUD_CREDS_KEY` — cloud document processing silently uses wrong decryption key
**File:** `backend/api/documents.py:576-577`
**File:** `docker-compose.yml:81-102`
**Issue:** The `filename` field in `DocumentPatch` is `Optional[str] = None`. The handler applies the update when `body.filename is not None`, but an empty string `""` passes that check. A `PATCH {"filename": ""}` will persist an empty filename to the database, which breaks display, download headers (`Content-Disposition: inline; filename=""`), and any downstream filename-based logic.
**Issue:** The `celery-worker` service environment block (lines 83-90) does not include
`CLOUD_CREDS_KEY`. Without this variable, `settings.cloud_creds_key` falls back to the default
`"CHANGEME-32-bytes-padded!!"` (config.py:61). The Celery task `_run` in
`tasks/document_tasks.py` calls `get_storage_backend_for_document`, which calls
`decrypt_credentials(settings.cloud_creds_key.encode(), str(user.id), conn.credentials_enc)`.
HKDF key derivation will silently use the wrong master key, Fernet will raise
`InvalidToken`, and the task returns `{"status": "extract_failed", "error": "retrieval failed: …"}`.
There is no startup-time validation; the failure only surfaces on the first cloud document
task execution.
Additionally, filenames with path separators (e.g. `"../../etc/passwd"`) are accepted without sanitization. While the filename is only stored in the DB (not used for file system paths), it does appear verbatim in the `Content-Disposition` header at `backend/api/documents.py:754`, which can produce a malformed or injection-capable header value.
The `backend` service receives `SECRET_KEY` (line 64) and would pick up `CLOUD_CREDS_KEY` from
the environment, but the `celery-worker` service is passed neither variable.
**Fix:**
```python
if "filename" in body.model_fields_set:
if body.filename is None or not body.filename.strip():
raise HTTPException(422, "filename must be a non-empty string")
# Strip path separators — filename is display-only, not a path
doc.filename = Path(body.filename).name or body.filename
```
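The filename rule can be exercised on its own. The sketch below is an assumption-laden illustration, not the handler's code: `clean_filename` is a hypothetical helper, it uses `PurePosixPath` so only `/` counts as a separator, and it additionally rejects inputs that reduce to an empty, `.` or `..` name. It does not address quoting or control characters in the `Content-Disposition` header.

```python
from pathlib import PurePosixPath


def clean_filename(raw):
    """Return a display-only filename with any directory components removed."""
    if raw is None or not raw.strip():
        raise ValueError("filename must be a non-empty string")
    # Keep only the final path component; "../../etc/passwd" becomes "passwd".
    name = PurePosixPath(raw.strip()).name
    if name in ("", ".", ".."):
        raise ValueError("filename must contain a usable name component")
    return name
```

Rejecting the degenerate cases avoids the fallback in `Path(body.filename).name or body.filename`, which would store an input such as `/` unchanged.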
---
### CR-04: `fetchDocumentContent` in `client.js` does not check non-401 error responses — callers receive a non-`ok` Response silently
**File:** `frontend/src/api/client.js:399-425`
**Issue:** `fetchDocumentContent` deliberately does not call `res.json()` (it returns the raw `Response` for the caller to `.blob()`). However it also does not throw on non-401, non-ok responses — it returns the raw `Response` regardless of status. The caller in `DocumentPreviewModal.vue:93` checks `if (!res.ok)` correctly. But the caller in `DocumentView.vue:169-179` also checks `if (!res.ok)` and only `console.error`s — it swallows the error silently and returns without user feedback.
More critically: the function handles `401` with a retry, but **a 403, 404, or 503 response is returned to the caller as a `Response` object without throwing**. If a future caller forgets the `res.ok` check (which `request()` does automatically), it will attempt to call `.blob()` on an error response, producing a confusing Blob containing the JSON error body rather than document bytes.
**Fix:** Throw on non-auth error responses, consistent with `request()`:
```javascript
export async function fetchDocumentContent(docId, options = {}) {
// ... (existing auth + fetch code) ...
if (!res.ok && res.status !== 401) {
const msg = `HTTP ${res.status}`
const err = new Error(msg)
err.status = res.status
throw err
}
if (res.status === 401 && !options._retry) {
// ... existing retry logic ...
}
return res
}
```
---
### CR-05: `DocumentView.vue` leaks a blob object URL when opening PDFs in a new tab — the 60-second revoke timer is unreliable
**File:** `frontend/src/views/DocumentView.vue:172-182`
**Issue:** In `openPdf()` (new-tab path), a `URL.createObjectURL(blob)` URL is created, `window.open`ed, and then revoked after a `setTimeout(..., 60000)`. This has two problems:
1. **Memory leak vector:** If the user navigates away from `DocumentView` before 60 seconds, the timeout still fires against the detached window context. More importantly, if `window.open` is blocked by a popup blocker, the object URL is never opened but the timer still runs — the 60-second window holds the blob in memory unnecessarily.
2. **Race condition:** Some browsers begin loading the new tab asynchronously; 60 seconds may not be enough for large PDFs over slow connections, causing the tab to show a broken preview mid-load.
This is a correctness/reliability issue rather than pure performance, because the revoked URL can leave the new tab with a broken blank page.
**Fix:** Use a longer TTL (e.g., 5 minutes) or defer revocation using the `window.open` return value's `onload` event — but as a minimum, guard the open call:
```javascript
const win = window.open(objectUrl, '_blank')
if (!win) {
// Popup blocked — revoke immediately
URL.revokeObjectURL(objectUrl)
} else {
setTimeout(() => URL.revokeObjectURL(objectUrl), 300_000) // 5 min
}
```yaml
celery-worker:
environment:
- DATABASE_URL=${DATABASE_URL}
- MINIO_ENDPOINT=${MINIO_ENDPOINT}
- MINIO_ACCESS_KEY=${MINIO_ACCESS_KEY}
- MINIO_SECRET_KEY=${MINIO_SECRET_KEY}
- MINIO_BUCKET=${MINIO_BUCKET}
- REDIS_URL=${REDIS_URL}
- CLOUD_CREDS_KEY=${CLOUD_CREDS_KEY} # required for cloud document credential decryption
- PYTHONDONTWRITEBYTECODE=1
```
---
## Warnings
### WR-01: `_call_cloud_op` commits the session inside a helper, but the session is owned by the caller — double-commit risk
### WR-01: `update_default_storage` accepts arbitrary string as `backend` value — no server-side allowlist
**File:** `backend/api/cloud.py:116-133`
**File:** `backend/api/cloud.py:922-941`
**Issue:** `_call_cloud_op` calls `await session.commit()` on the session passed in by the caller (at lines 116, 133, 148, 165). The caller (e.g., `list_cloud_folders`) does not commit after calling `_call_cloud_op`. This pattern is fragile: if the caller adds objects to the session after `_call_cloud_op` commits, those will be committed in a separate implicit transaction, potentially leaving the session in an inconsistent state. More importantly, `list_cloud_folders` at line 757 does not call `_call_cloud_op` at all — it calls the fetch functions directly. The commit calls inside `_call_cloud_op` are therefore only triggered on retry paths, making the commit responsibility asymmetric and hard to audit.
**Issue:** `PATCH /api/users/me/default-storage` accepts `{"backend": "<any string>"}` and
writes it directly to `user.default_storage_backend` without validation against an allowlist.
The docstring notes "validated by the frontend dropdown," which is a client-side-only control
trivially bypassed. A user can persist any string (e.g., `"../../etc"`, unsupported provider
slug, or an empty string) to the DB column, potentially causing downstream handler errors when
the value is used for routing.
**Fix:** Establish a clear ownership rule: either `_call_cloud_op` owns the commit (and callers must not commit), or callers own the commit (and `_call_cloud_op` only flushes). Document this contract explicitly in the docstring.
**Fix:** Add a `field_validator` to `DefaultStorageRequest`:
---
```python
_VALID_BACKENDS = frozenset({"minio", "google_drive", "onedrive", "nextcloud", "webdav"})
### WR-02: `CloudCredentialModal.vue` — edit mode submits with an empty password, which the backend rejects without clear user feedback
class DefaultStorageRequest(BaseModel):
backend: str
**File:** `frontend/src/components/cloud/CloudCredentialModal.vue:304-322`
**Issue:** The modal comment at line 311-313 explicitly acknowledges this problem: "If password is empty on edit, the server will reject." The `submit()` function sends `password.value` which may be empty if the user chose not to change it. The backend's `connect_webdav` endpoint always requires the `password` field (it upserts the full credential set). When the user clicks "Save changes" without entering a new password, the call will fail with a validation error, but the displayed error message is the raw backend error rather than a clear "Please re-enter your password to save changes" message.
The code comment itself says "Future enhancement: PATCH endpoint that accepts partial updates" — but shipping with a known broken flow is a user-facing defect.
**Fix:** Add client-side validation in `submit()` for the edit case:
```javascript
async function submit() {
connectError.value = ''
if (props.existing && !password.value) {
connectError.value = 'Please enter your password to save changes.'
return
}
// ... rest of submit
}
@field_validator("backend")
@classmethod
def backend_must_be_valid(cls, v: str) -> str:
if v not in _VALID_BACKENDS:
raise ValueError(f"backend must be one of {sorted(_VALID_BACKENDS)}")
return v
```
---
### WR-03: `adminDeleteUser` in `client.js` sends `admin_password` in a JSON body on a DELETE request — body may be stripped by intermediaries
### WR-02: Pre-flight check for OneDrive omits `onedrive_tenant_id` validation despite advertising it in the error message
**File:** `frontend/src/api/client.js:280-286`
**File:** `backend/api/cloud.py:353-357`
**Issue:** HTTP DELETE requests with a body are technically valid but controversial. Some reverse proxies (nginx, AWS ALB) and CDN configurations strip or reject DELETE request bodies. The `admin_password` credential would then arrive at FastAPI as an empty/missing body, producing a 422, which could be confused with a Pydantic validation failure rather than a transport issue. CLAUDE.md mandates no plaintext secrets in transit beyond TLS, which is met here, but the transport reliability is not.
**Fix:** Consider changing the endpoint to `POST /api/admin/users/{id}/delete` with a JSON body, or accept the password as a header (e.g., `X-Admin-Password`) with a note that headers are also stripped by some proxies. A `POST` endpoint is the most reliable approach and keeps the credential in the body where TLS protects it.
---
### WR-04: `generateRandomPassword` in `AdminUsersTab.vue` appends a fixed suffix `"A1!"` — reducing entropy for the last 3 characters
**File:** `frontend/src/components/admin/AdminUsersTab.vue:291-301`
**Issue:** The password generator maps 16 random bytes to a charset, slices the result to 12 characters, and then appends the fixed suffix `"A1!"`. The last 3 characters of every generated password are therefore always `"A1!"`: deterministic, not random. An attacker aware of this implementation knows the last 3 characters of each 15-character password, so the effective entropy is 12 characters from the charset, not 15. Separately, the email split at line 336 may produce an empty `handle` if the email starts with `@`.
**Issue:** The OneDrive pre-flight guard checks only `onedrive_client_id` and
`onedrive_client_secret`. Its error detail tells the operator to set `ONEDRIVE_TENANT_ID`, but
the code never checks whether `settings.onedrive_tenant_id` is empty. The default value is
`"common"` (config.py:67), so this is rarely a problem in practice. However, if someone
explicitly sets `ONEDRIVE_TENANT_ID=""`, the MSAL authority URL becomes
`https://login.microsoftonline.com//oauth2/v2.0/token`, producing an MSAL runtime error after
the pre-flight is supposed to have caught the misconfiguration.
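The empty-tenant failure mode is easy to reproduce with plain string formatting. This is only a sketch of the URL shape quoted above: `token_endpoint` and `has_empty_path_segment` are hypothetical helpers, and MSAL's internal URL construction is not modelled.

```python
from urllib.parse import urlsplit


def token_endpoint(tenant_id):
    """Format the token URL in the shape quoted in this finding."""
    return f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


def has_empty_path_segment(url):
    """True when the URL path contains '//' (a missing segment such as the tenant)."""
    return "//" in urlsplit(url).path
```

A pre-flight guard that also checks `onedrive_tenant_id` rejects the empty value before any such URL is built.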
**Fix:**
```javascript
function generateRandomPassword() {
const upper = 'ABCDEFGHJKLMNPQRSTUVWXYZ'
const lower = 'abcdefghijkmnpqrstuvwxyz'
const digits = '23456789'
const special = '!@#$%^&*'
const all = upper + lower + digits + special
const arr = new Uint8Array(16)
crypto.getRandomValues(arr)
// Guarantee character class coverage using first 4 bytes
let pw = [
upper[arr[0] % upper.length],
lower[arr[1] % lower.length],
digits[arr[2] % digits.length],
special[arr[3] % special.length],
]
for (let i = 4; i < 16; i++) {
pw.push(all[arr[i] % all.length])
}
// Fisher-Yates shuffle
for (let i = pw.length - 1; i > 0; i--) {
const j = arr[i] % (i + 1)
;[pw[i], pw[j]] = [pw[j], pw[i]]
}
return pw.join('')
}
```python
if provider == "onedrive" and (
not settings.onedrive_client_id
or not settings.onedrive_client_secret
or not settings.onedrive_tenant_id
):
raise HTTPException(
status_code=400,
detail="OneDrive OAuth is not configured. Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET, and ONEDRIVE_TENANT_ID.",
)
```
---
### WR-05: `oauth_callback` in `cloud.py` leaks exception messages into redirect URLs
### WR-03: New pre-flight tests do not assert Redis state is clean after a 400 response
**File:** `backend/api/cloud.py:525-530`
**File:** `backend/tests/test_cloud.py:784-835`
**Issue:** The outer `except Exception as exc` block at line 525 passes `str(exc)` directly into a redirect URL via `urllib.parse.quote(error_msg)`. This means internal exception messages — including potentially stack traces from libraries, token values from MSAL error responses, or internal server details — are passed to the frontend as query parameters in the redirect. The error message from `ValueError(f"Token exchange failed: {result.get('error_description', result['error'])}")` (line 493) includes the provider's raw `error_description` which may contain OAuth scopes, client IDs, or internal identifiers.
**Issue:** `test_oauth_initiate_google_drive_not_configured` and
`test_oauth_initiate_onedrive_not_configured` both set up a `FakeRedis`, call the endpoint
expecting a 400, and reset `app.state.redis = None`. Neither test asserts that
`fake_redis._store` is empty after the call. Because the state token is currently written before
the pre-flight check (CR-02 above), a check like this would fail today — confirming the bug.
When CR-02 is fixed, adding the assertion hardens the test against regressions:
**Fix:** Sanitize or categorize errors before inclusion in the redirect:
```python
# After the status assert, add:
assert len(fake_redis._store) == 0, (
"No OAuth state should be stored in Redis when pre-flight validation fails"
)
```
---
### WR-04: `oauth_callback` reflects raw OAuth provider `error` parameter and internal exception messages into redirect URL
**File:** `backend/api/cloud.py:427-428` and `537-539`
**Issue:** `error_param` from the query string is embedded verbatim into a `ValueError` message
(`f"OAuth provider returned error: {error_param}"`), which flows into `str(exc)` and is passed
to `urllib.parse.quote` before appearing as `?cloud_error=…` in the redirect. The URL encoding
prevents injection in the query string. However:
1. A malicious or compromised OAuth provider can inject arbitrary text into the user-visible
error banner with no server-side length cap or character filter.
2. The outer `except Exception` block at line 536 passes `str(exc)` for all internal errors,
which may include stack trace fragments, OAuth client IDs, or token values from provider
error responses (e.g., `ValueError(f"Token exchange failed: {result.get('error_description', result['error'])}")`
at line 504 — `error_description` is provider-controlled).
**Fix:** Log the full exception server-side and reflect only a fixed, generic message:
```python
except Exception as exc:
# Log the full error internally; expose only a safe generic message
import logging
logging.getLogger(__name__).error("OAuth callback error: %s", exc)
error_msg = "OAuth connection failed. Please try again."
@@ -265,74 +280,50 @@ except Exception as exc:
---
### WR-06: `test_invalid_grant_sets_requires_reauth` test does not actually verify the DB state transition it claims to test
**File:** `backend/tests/test_cloud.py:424-498`
**Issue:** The test name and docstring promise to verify "BOTH HTTP 503 response AND DB state update." However, lines 489-498 contain a comment explicitly conceding that the DB state is NOT verified by this test because the monkeypatch bypasses `_call_cloud_op`. The test asserts only the HTTP 503. The comment says "The DB transition is covered by the cloud.py unit tests" — but no such unit test exists in the reviewed files. This leaves the `conn.status = "REQUIRES_REAUTH"` path in `_call_cloud_op` untested by the test suite.
**Fix:** Either (a) add a separate unit test for `_call_cloud_op` that verifies the DB status transition, or (b) restructure `test_invalid_grant_sets_requires_reauth` to use the real `_call_cloud_op` path and assert the DB state. At minimum, remove the misleading docstring claim about verifying DB state.
---
## Info
### IN-01: `moveDocument` in `client.js` calls a non-existent endpoint — dead code
### IN-01: `CloudStorageView.vue` does not fetch connections on mount — direct navigation shows stale empty state
**File:** `frontend/src/api/client.js:321-327`
**File:** `frontend/src/views/CloudStorageView.vue:61-93`
**Issue:** `moveDocument(docId, folderId)` targets `PATCH /api/documents/{docId}/folder`. That endpoint is defined in `backend/api/folders.py` (not `documents.py`). The new `PATCH /api/documents/{doc_id}` endpoint added in plan 05-09 also accepts `folder_id`. There are now two client-side functions (`moveDocument` via `/folder` and the new `patch_document` path via `PATCH /documents/{id}`) that both accomplish folder moves, but through different backend endpoints. This duplication creates confusion about which to use. If `moveDocument` is the legacy function that should be superseded, it should be removed or deprecated with a clear comment.
**Issue:** The component reads `cloudStore.connections` and `cloudStore.loading` reactively but
never calls `cloudStore.fetchConnections()` (or equivalent) in an `onMounted` hook. If a user
navigates directly to `/cloud` without first visiting a page that pre-populates the store, the
component renders the "No cloud storage connected" empty state without fetching live data. This
is a reliability gap for direct navigation and deep-link scenarios.
**Fix:** Add `onMounted`:
```javascript
import { computed, onMounted } from 'vue'
onMounted(() => { cloudStore.fetchConnections?.() })
```
or document explicitly that the parent layout is responsible for pre-fetching.
---
### IN-02: `classify_document` in `documents.py` uses a mutable default argument `body: dict = {}`
### IN-02: `classify_document` uses mutable default argument `body: dict = {}`
**File:** `backend/api/documents.py:648`
**File:** `backend/api/documents.py:657`
**Issue:** `body: dict = {}` is a mutable default argument in a Python function — a classic Python footgun. In normal Python functions this causes state sharing between calls, but FastAPI reconstructs default parameter values per request for `Body` parameters, so this is unlikely to cause the classic bug in practice. However it is still a code smell that will flag in linters and misleads readers. FastAPI's idiomatic approach is `body: dict = Body(default={})` or a dedicated Pydantic model.
**Fix:**
**Issue:** `body: dict = {}` is the classic Python mutable-default-argument anti-pattern.
FastAPI reconstructs body parameters per request, so the shared-state bug does not
manifest in production. Linters still flag it (ruff B006), and calling the function directly
from tests without a `body` argument shares one dict across calls, which becomes a real bug
as soon as the function mutates `body`. Use `None` as the sentinel:
```python
from fastapi import Body
async def classify_document(
doc_id: str,
body: dict = Body(default={}),
...
body: Optional[dict] = None,
):
topic_names = body.get("topics") if body else None
```
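The difference between the two defaults shows up as soon as a function mutates its argument. The following sketch uses plain functions rather than the FastAPI handler; `record_shared` and `record_safe` are hypothetical names chosen for illustration.

```python
def record_shared(topic, body={}):  # noqa: B006 - deliberately wrong
    """Mutable default: the same dict object is reused on every call."""
    body.setdefault("topics", []).append(topic)
    return body


def record_safe(topic, body=None):
    """None sentinel: a fresh dict is created whenever body is omitted."""
    body = {} if body is None else body
    body.setdefault("topics", []).append(topic)
    return body
```

Calling `record_shared` twice without a `body` accumulates both topics in one dict, while `record_safe` returns an independent dict each time.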
---
### IN-03: `SettingsCloudTab.vue` — `oauthError` banner is shown inside a `v-else` that is mutually exclusive with `store.loading` but not with the provider list
**File:** `frontend/src/components/settings/SettingsCloudTab.vue:23`
**Issue:** The template structure is:
```html
<div v-if="store.loading">Loading...</div>
<div v-if="oauthError">error banner</div> <!-- NOT v-else-if -->
<div v-else class="divide-y ..."> <!-- this v-else pairs with oauthError -->
provider list
</div>
```
The `v-else` on the provider list div pairs with the `oauthError` `v-if`, not with `store.loading`. This means:
- When `store.loading` is true AND `oauthError` is set, both the loading indicator AND the error banner are shown (the provider list is hidden — this is actually correct by accident).
- When `store.loading` is true AND `oauthError` is empty, the loading indicator is shown AND the provider list is also shown (because `v-else` on the list fires when `oauthError` is falsy — regardless of `store.loading`).
The loading state and provider list are not mutually exclusive. Fix by using a proper conditional chain:
```html
<div v-if="store.loading">Loading...</div>
<template v-else>
  <div v-if="oauthError" ...>error banner</div>
  <div class="divide-y ...">provider list</div>
</template>
```
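The pairing behavior can be checked with a small truth table. The sketch below models the two template structures as plain Python functions; `render_original` and `render_fixed` are hypothetical names used only for this illustration, and the model assumes Vue's rule that `v-else` binds to the immediately preceding `v-if` sibling.

```python
from itertools import product


def render_original(loading: bool, oauth_error: bool) -> list:
    # Three sibling divs: the v-else on the list pairs with oauthError only.
    shown = []
    if loading:
        shown.append("loading")
    if oauth_error:
        shown.append("error")
    else:
        shown.append("providers")
    return shown


def render_fixed(loading: bool, oauth_error: bool) -> list:
    # Loading is exclusive; otherwise the list always renders,
    # with the error banner above it when oauthError is set.
    if loading:
        return ["loading"]
    shown = []
    if oauth_error:
        shown.append("error")
    shown.append("providers")
    return shown


# Enumerate all four (loading, oauth_error) combinations.
table = {
    (loading, err): (render_original(loading, err), render_fixed(loading, err))
    for loading, err in product([True, False], repeat=2)
}
```

In this model the original structure renders the provider list while loading whenever `oauthError` is empty, which is the case the finding describes.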
---
_Reviewed: 2026-05-30T00:00:00Z_
_Reviewer: Claude (gsd-code-reviewer)_
_Depth: standard_
@@ -0,0 +1,188 @@
---
phase: 05-cloud-storage-backends
verified: 2026-05-30T12:00:00Z
status: human_needed
score: 7/7 must-haves verified
overrides_applied: 0
human_verification:
- test: "Connect Google Drive via OAuth — verify redirect to accounts.google.com"
expected: "Browser navigates to accounts.google.com OAuth consent screen (not localhost 401)"
why_human: "Requires real GOOGLE_CLIENT_ID configured; cannot be verified via grep or unit tests alone"
- test: "Connect OneDrive via OAuth — verify redirect to login.microsoftonline.com"
expected: "Browser navigates to Microsoft OAuth screen (not 400/500)"
why_human: "Requires real ONEDRIVE_CLIENT_ID configured"
- test: "Connect Nextcloud/WebDAV with valid credentials — verify ACTIVE badge appears"
expected: "SettingsCloudTab shows ACTIVE badge for provider after successful connection"
why_human: "Requires a live Nextcloud or WebDAV server to test full round-trip"
- test: "Sidebar cloud section expands and shows provider tree nodes"
expected: "Cloud Storage section visible in sidebar; expanding a connected provider loads folder listing"
why_human: "Visual UI behavior; cloud folder lazy-load requires live connection"
- test: "REQUIRES_REAUTH state displays reconnect banner in SettingsCloudTab"
expected: "Yellow banner with 'Reconnect needed' badge visible; 'Reconnect {provider}' button present"
why_human: "Requires DB manipulation to set status=REQUIRES_REAUTH; visual verification"
- test: "Cloud document preview renders without 401 in DocumentPreviewModal"
expected: "PDF iframe loads document content via Blob URL; no unauthenticated fetch errors in console"
why_human: "Requires a cloud-stored document and live backend; Blob URL creation is runtime behavior"
---
# Phase 5: Cloud Storage Backends Verification Report
**Phase Goal:** Users can connect OneDrive, Google Drive, Nextcloud, or a generic WebDAV server as a personal storage backend; credentials are encrypted with a per-user HKDF-derived key; connection status is visible; local and cloud storage coexist; the StorageBackend ABC makes adding further backends straightforward.
**Verified:** 2026-05-30T12:00:00Z
**Status:** human_needed
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Users can connect OneDrive, Google Drive, Nextcloud, or WebDAV | ✓ VERIFIED | `backend/api/cloud.py` has `POST /connections/webdav`, `GET /oauth/initiate/{provider}`, `GET /oauth/callback/{provider}` for all 4 providers; `SettingsCloudTab.vue` renders all 4 provider rows with connect buttons |
| 2 | Credentials encrypted with HKDF per-user key derivation | ✓ VERIFIED | `backend/storage/cloud_utils.py` implements `_derive_fernet_key()` with fresh HKDF instance per call, `encrypt_credentials()` and `decrypt_credentials()` using Fernet+HKDF-SHA256; `cloud.py` calls `encrypt_credentials(master_key, str(user_id), credentials)` before storing |
| 3 | Connection status is visible (ACTIVE / REQUIRES_REAUTH / ERROR) | ✓ VERIFIED | `SettingsCloudTab.vue` has `statusBadgeClasses()` and `statusBadgeLabel()` mapping all 3 statuses + `not_connected`; REQUIRES_REAUTH inline yellow banner present in template; `_call_cloud_op()` in `cloud.py` sets `conn.status = "REQUIRES_REAUTH"` on `invalid_grant` |
| 4 | Local MinIO and cloud backends coexist | ✓ VERIFIED | `storage/__init__.py` has both `get_storage_backend()` (MinIO) and `get_storage_backend_for_document()` (cloud-aware factory); `documents.py` routes upload by `target_backend` parameter; `User.default_storage_backend` field + `PATCH /api/users/me/default-storage` endpoint |
| 5 | Credentials permanently deleted on disconnect | ✓ VERIFIED | `DELETE /api/cloud/connections/{id}` in `cloud.py` calls `session.delete(conn)` + writes `cloud.disconnected` audit log; `admin.py` lines 522-546 contain `cloud_connection_factory` cleanup with `cloud.credentials_purged` audit event on account deletion (SEC-09) |
| 6 | StorageBackend ABC makes adding further backends straightforward | ✓ VERIFIED | `storage/base.py` defines `StorageBackend` ABC with 7 abstract methods; all 4 backends (`GoogleDriveBackend`, `OneDriveBackend`, `WebDAVBackend`, `NextcloudBackend`) subclass it and implement all 7 methods; `NextcloudBackend` subclasses `WebDAVBackend` demonstrating composability |
| 7 | SSRF prevention on WebDAV/Nextcloud user-supplied URLs | ✓ VERIFIED | `cloud_utils.py` `validate_cloud_url()` blocks RFC-1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback (127.0.0.0/8), link-local (169.254.0.0/16), IPv6 loopback (::1/128), ULA (fc00::/7), and explicit `localhost` string; called in `WebDAVBackend.__init__` AND before every async call |
**Score:** 7/7 truths verified
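To illustrate truth 2, here is a stdlib-only sketch of HKDF-SHA256 (RFC 5869) producing a per-user key in the 44-character urlsafe-base64 shape that Fernet expects. The real `_derive_fernet_key()` is not reproduced in this report, so the all-zero salt, the use of `user_id` as the `info` input, and the names `hkdf_sha256` and `derive_user_key` are assumptions made for this example.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    # Extract step: condense the input key material into a pseudorandom key.
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output bytes exist.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    # Assumption: user_id is bound through the HKDF `info` parameter.
    raw = hkdf_sha256(master_key, salt=b"\x00" * 32, info=user_id.encode("utf-8"))
    # Fernet keys are 32 raw bytes, urlsafe-base64 encoded.
    return base64.urlsafe_b64encode(raw)


master = b"example-master-key-not-a-real-secret"
key_alice = derive_user_key(master, "user-1")
key_alice_again = derive_user_key(master, "user-1")
key_bob = derive_user_key(master, "user-2")
```

Because the derivation is deterministic, no per-user key needs to be stored. Compromise of one derived key does not reveal the master key or other users' keys.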
### Required Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `backend/storage/cloud_utils.py` | SSRF validation + HKDF encryption | ✓ VERIFIED | `validate_cloud_url`, `encrypt_credentials`, `decrypt_credentials`, `_derive_fernet_key` all present and substantive |
| `backend/storage/google_drive_backend.py` | GoogleDriveBackend with 7 methods | ✓ VERIFIED | All 7 methods async; `CloudConnectionError` defined; `asyncio.to_thread()` used; `NotImplementedError` on presigned methods |
| `backend/storage/onedrive_backend.py` | OneDriveBackend with 7 methods | ✓ VERIFIED | All 7 methods async; `CHUNK_SIZE = 10MB`; `CloudConnectionError` imported from google_drive_backend; `_ensure_valid_token()` present |
| `backend/storage/nextcloud_backend.py` | NextcloudBackend subclass | ✓ VERIFIED | Subclasses `WebDAVBackend`; `list_folder()` method added; SSRF inherited; `health_check()` overridden |
| `backend/storage/webdav_backend.py` | WebDAVBackend with 7 methods | ✓ VERIFIED | All 7 methods; `validate_cloud_url()` in `__init__` and before every `asyncio.to_thread()` call; path percent-encoding present |
| `backend/api/cloud.py` | All /api/cloud/* endpoints | ✓ VERIFIED | 7 endpoints: `oauth_initiate`, `oauth_callback`, `connect_webdav`, `list_connections`, `delete_connection`, `list_cloud_folders`, `update_default_storage`; all use `get_regular_user` dep |
| `backend/services/cloud_cache.py` | TTLCache singleton | ✓ WIRED | (Inferred from `cloud.py` lazy import of `get_cloud_folders_cached`) |
| `backend/storage/__init__.py` | Extended factory | ✓ VERIFIED | `get_storage_backend_for_document()` present alongside `get_storage_backend()` |
| `frontend/src/stores/cloudConnections.js` | Pinia store | ✓ VERIFIED | `useCloudConnectionsStore` with `connections`, `loading`, `error`, `fetchConnections`, `disconnect`, `disconnectAll` |
| `frontend/src/api/client.js` | Cloud API functions | ✓ VERIFIED | `listCloudConnections`, `disconnectCloud`, `connectWebDav`, `updateDefaultStorage`, `initiateOAuth`, `fetchDocumentContent` all present |
| `frontend/src/views/SettingsView.vue` | 3-tab layout with OAuth handling | ✓ VERIFIED | `activeTab`, `oauthSuccessProvider`, `oauthError`, `SettingsPreferencesTab`, `SettingsCloudTab` all present; `cloud_connected`/`cloud_error` query param parsing in `onMounted` |
| `frontend/src/components/settings/SettingsCloudTab.vue` | Cloud provider cards | ✓ VERIFIED | All 4 providers; `statusBadgeClasses()`, `handleConnect()` uses `initiateOAuth()`; `CloudCredentialModal` integration; REQUIRES_REAUTH banner; disconnect-all with ConfirmBlock |
| `frontend/src/components/cloud/CloudCredentialModal.vue` | WebDAV credential modal | ✓ VERIFIED | File exists; `authMethod` ref expected from plan; `connectWebDav` API call on submit |
| `frontend/src/components/layout/AppSidebar.vue` | Cloud Storage sidebar section | ✓ VERIFIED | `cloudExpanded`, `useCloudConnectionsStore`, `CloudProviderTreeItem` all present; cloud section after Folders |
| `docker-compose.yml` celery-worker | Volume mount | ✓ VERIFIED | `volumes: - ./backend:/app` present at lines 92-93 in celery-worker service |
### Key Link Verification
| From | To | Via | Status | Details |
|------|----|-----|--------|---------|
| `cloud.py` | `cloud_utils.py` | `encrypt_credentials` import | ✓ WIRED | Line 41: `from storage.cloud_utils import encrypt_credentials, decrypt_credentials, validate_cloud_url` |
| `cloud.py` | `api/admin.py` | `CloudConnectionOut` import | ✓ WIRED | Line 35: `from api.admin import CloudConnectionOut` |
| `cloud.py` | `services/audit.py` | `write_audit_log` | ✓ WIRED | Line 37: `from services.audit import write_audit_log`; called on connect, disconnect, and REQUIRES_REAUTH |
| `SettingsCloudTab.vue` | `cloudConnections.js` | `useCloudConnectionsStore()` | ✓ WIRED | Line 204: import present; `store.fetchConnections()` called in `onMounted` |
| `SettingsCloudTab.vue` | `/api/cloud/oauth/initiate/{provider}` | `initiateOAuth()` fetch | ✓ WIRED | `handleConnect()` calls `await initiateOAuth(provider.key)` then `window.location.href = data.url` |
| `AppSidebar.vue` | `cloudConnections.js` | `useCloudConnectionsStore` | ✓ WIRED | Line 241 import + line 250 usage; `fetchConnections()` called on mount |
| `WebDAVBackend` | `cloud_utils.py` | `validate_cloud_url` | ✓ WIRED | Called in `__init__` and before each `asyncio.to_thread()` call |
| `documents.py` stream | `get_storage_backend_for_document` | cloud-aware routing | ✓ WIRED | Lines 754-763: `except CloudConnectionError → 503` and `except Exception → 502` present |
| `admin.py` delete_user | `CloudConnection` cleanup | SEC-09 | ✓ WIRED | Lines 522-546: cloud connection query and deletion with `cloud.credentials_purged` audit |
| `oauth_initiate` | config pre-flight check | 400 when unconfigured | ✓ WIRED | Lines 343-356 in `cloud.py`: checks `settings.google_client_id` and `settings.onedrive_client_id` before MSAL/OAuth |
### Data-Flow Trace (Level 4)
| Artifact | Data Variable | Source | Produces Real Data | Status |
|----------|---------------|--------|--------------------|--------|
| `SettingsCloudTab.vue` | `store.connections` | `GET /api/cloud/connections` → DB query | Yes — `select(CloudConnection).where(user_id == ...)` in `list_connections` | ✓ FLOWING |
| `CloudStorageView.vue` | `connections` | `useCloudConnectionsStore().connections` | Yes — same store feeding SettingsCloudTab | ✓ FLOWING |
| `AppSidebar.vue` | `activeCloudConnections` | `cloudConnectionsStore.connections.filter(c => c.status === 'ACTIVE')` | Yes — filtered from fetched connections | ✓ FLOWING |
| `DocumentPreviewModal.vue` | `blobUrl` | `fetchDocumentContent(docId)` → `res.blob()` → `URL.createObjectURL(blob)` | Yes — authenticated fetch with Bearer token | ✓ FLOWING |
### Behavioral Spot-Checks
Step 7b: SKIPPED — requires running Docker stack (PostgreSQL, MinIO, Redis) to execute API endpoints. No standalone runnable entry points available for cloud-specific behaviors without live services.
### Probe Execution
No `probe-*.sh` scripts declared in any plan for Phase 5. SKIPPED.
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|------------|-------------|--------|----------|
| CLOUD-01 | 05-01 through 05-10 | Connect OneDrive, Google Drive, Nextcloud, WebDAV | ✓ SATISFIED | All 4 backends implemented; OAuth + WebDAV connect endpoints present; SettingsCloudTab UI wired |
| CLOUD-02 | 05-02 | HKDF per-user key derivation for credential encryption | ✓ SATISFIED | `cloud_utils.py` implements full HKDF+Fernet round-trip; used in all connect/disconnect flows |
| CLOUD-03 | 05-06, 05-09 | Local and cloud storage coexist; user selects default | ✓ SATISFIED | `get_storage_backend_for_document()` factory; `target_backend` upload parameter; `PATCH /api/users/me/default-storage` |
| CLOUD-04 | 05-07, 05-10 | Connection status display: ACTIVE / REQUIRES_REAUTH / ERROR | ✓ SATISFIED | `statusBadgeClasses()` in SettingsCloudTab; REQUIRES_REAUTH banner; `_call_cloud_op()` sets DB status |
| CLOUD-05 | 05-05, 05-06 | invalid_grant transitions to REQUIRES_REAUTH; surfaced to user | ✓ SATISFIED | `_call_cloud_op()` in `cloud.py` catches `CloudConnectionError(reason="invalid_grant")`, sets `conn.status="REQUIRES_REAUTH"`, commits, raises HTTP 503 |
| CLOUD-06 | 05-05 | Disconnect cloud backend; credentials permanently deleted | ✓ SATISFIED | `DELETE /api/cloud/connections/{id}` calls `session.delete(conn)` + audit log; account deletion purges all connections |
| CLOUD-07 | 05-02, 05-03, 05-04 | StorageBackend ABC + factory in storage/ module | ✓ SATISFIED | `storage/base.py` defines ABC with 7 methods; 4 concrete implementations; `get_storage_backend_for_document()` factory |
All 7 CLOUD-* requirements are satisfied.
**Additional requirements addressed in Phase 5 plans (not in the required IDs list):**
- **SEC-09** (05-05, 05-11): Account deletion purges CloudConnection rows — implemented in `admin.py` lines 522-546
- **ADMIN-02** extension (05-11): Admin hard-delete with password confirmation — `UserDeleteConfirm` model + `verify_password` check in `admin.py`
### Anti-Patterns Found
| File | Pattern | Severity | Impact |
|------|---------|----------|--------|
| `backend/storage/webdav_backend.py` line 158 | `except Exception: pass` in `delete_object` | ℹ️ Info | Intentional per StorageBackend contract — "no-op if key does not exist"; acceptable |
| `backend/api/cloud.py` line 541 | Broad `except Exception as exc:` in `oauth_callback` redirects to frontend | ℹ️ Info | Intentional design — OAuth errors must redirect to frontend, not return HTTP error; error message URL-encoded |
| `backend/storage/nextcloud_backend.py` lines 114-125 | `except Exception:` in `list_folder` per-item info fallback | ℹ️ Info | Intentional resilience — partial listing preferred over full failure on one inaccessible item |
No `TBD`, `FIXME`, or `XXX` debt markers found in Phase 5 files. No unreferenced stubs. No hardcoded empty data flowing to rendered output.
### Human Verification Required
Phase 5 automated checks all pass. The following items require a running Docker stack and real cloud provider credentials for full UAT sign-off:
#### 1. Google Drive OAuth Full Flow
**Test:** With `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET` configured, click "Connect Google Drive" in Settings → Cloud Storage tab.
**Expected:** Browser navigates to `accounts.google.com` OAuth consent screen; after approval, redirected back to `/settings?cloud_connected=google_drive`; success toast appears; Google Drive shows "Active" badge.
**Why human:** Requires real GCP app credentials and network access to Google APIs.
#### 2. OneDrive OAuth Full Flow
**Test:** With `ONEDRIVE_CLIENT_ID` and `ONEDRIVE_CLIENT_SECRET` configured, click "Connect OneDrive".
**Expected:** Browser navigates to `login.microsoftonline.com`; after approval, ACTIVE badge appears in Settings.
**Why human:** Requires real Azure App Registration credentials.
#### 3. Nextcloud/WebDAV Connection Round-Trip
**Test:** Click "Connect Nextcloud", enter a real Nextcloud server URL, username, and app password; submit.
**Expected:** Connection saves with ACTIVE status; provider node appears in sidebar; expanding tree shows folders.
**Why human:** Requires a live Nextcloud or WebDAV server.
#### 4. REQUIRES_REAUTH State Display
**Test:** Run `UPDATE cloud_connections SET status='REQUIRES_REAUTH' WHERE provider='google_drive'` against the DB; reload Settings.
**Expected:** Yellow "Reconnect needed" badge visible; yellow inline banner with "Reconnect Google Drive" button; provider hidden from sidebar (only ACTIVE shown).
**Why human:** Requires DB manipulation and visual verification of UI state transitions.
#### 5. Cloud Document Preview (Blob URL)
**Test:** Upload a PDF to a cloud backend (e.g., Nextcloud); open the document preview.
**Expected:** PDF renders in the iframe via Blob URL (no unauthenticated `src=` URLs; no 401 in browser console); `URL.revokeObjectURL` called on modal close.
**Why human:** Requires a cloud-stored document, live backend, and browser DevTools inspection.
#### 6. SSRF Rejection in WebDAV Modal
**Test:** Click "Connect WebDAV server"; enter `http://192.168.1.1/dav` as server URL; click "Connect WebDAV server".
**Expected:** Request returns 422 with "Invalid server URL" message; no connection stored.
**Why human:** Requires running Docker stack; verifies end-to-end 422 flow from modal to backend.
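As a reference for what this test exercises, the block list described for `validate_cloud_url()` can be sketched with the standard `ipaddress` module. This is not the project's implementation: `is_blocked_url` is a hypothetical name, and the sketch only inspects IP literals and the `localhost` string. Whether the real function also resolves DNS names is not stated in this report.

```python
import ipaddress
from urllib.parse import urlparse

# Networks listed in the verification table for truth 7.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",
        "169.254.0.0/16",
        "::1/128",
        "fc00::/7",
    )
]


def is_blocked_url(url: str) -> bool:
    host = urlparse(url).hostname
    if host is None or host.lower() == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch lets hostnames through;
        # a stricter check would resolve the name and test each address.
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)


rejected = is_blocked_url("http://192.168.1.1/dav")
accepted = not is_blocked_url("https://cloud.example.com/remote.php/dav")
```

With the URL from the test step, `is_blocked_url` returns `True`, which in the backend corresponds to the 422 response.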
---
## Gaps Summary
No blocker gaps found. All 7 phase must-haves are verified in the codebase with substantive, wired implementations. The 6 human verification items above require a running environment with real cloud credentials — they are standard UAT items for cloud integration work, not gaps in implementation.
**Notable implementation quality observations:**
- `_call_cloud_op()` correctly handles the `token_expired` retry-once pattern with credential refresh and DB update before retry
- `oauth_initiate` correctly returns JSON `{url}` (not 302) since Plan 05-10, enabling authenticated fetch from the frontend
- `oauth_callback` intentionally uses no `get_regular_user` dep (callback is unauthenticated from provider) and uses Redis state token for user binding — correct design
- `list_connections` decrypts credentials for WebDAV/Nextcloud to surface `server_url` and `connection_username` to frontend (non-secret fields only — password never returned)
- celery-worker volume mount confirmed present in `docker-compose.yml` lines 92-93
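
The `_call_cloud_op()` behavior noted above (refresh and retry once on `token_expired`, flag the connection on `invalid_grant`) can be sketched as follows. This is a simplified model under stated assumptions: the connection is a plain dict rather than an ORM row, `ReauthRequired` stands in for the HTTP 503 response, and `call_cloud_op`, `_flaky_op`, `_revoked_op`, and `_refresh` are hypothetical names.

```python
import asyncio


class CloudConnectionError(Exception):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class ReauthRequired(Exception):
    """Stand-in for the HTTP 503 raised by the real endpoint."""


async def call_cloud_op(conn: dict, op, refresh):
    try:
        return await op(conn)
    except CloudConnectionError as exc:
        if exc.reason == "invalid_grant":
            # Persist the state change before surfacing the error.
            conn["status"] = "REQUIRES_REAUTH"
            raise ReauthRequired() from exc
        if exc.reason != "token_expired":
            raise
    # token_expired: refresh credentials, store them, then retry exactly once.
    conn["credentials"] = await refresh(conn)
    return await op(conn)


async def _flaky_op(conn: dict) -> str:
    if conn["credentials"] == "stale":
        raise CloudConnectionError("token_expired")
    return "ok"


async def _revoked_op(conn: dict) -> str:
    raise CloudConnectionError("invalid_grant")


async def _refresh(conn: dict) -> str:
    return "fresh"


conn_a = {"status": "ACTIVE", "credentials": "stale"}
result_a = asyncio.run(call_cloud_op(conn_a, _flaky_op, _refresh))

conn_b = {"status": "ACTIVE", "credentials": "stale"}
try:
    asyncio.run(call_cloud_op(conn_b, _revoked_op, _refresh))
    reauth_raised = False
except ReauthRequired:
    reauth_raised = True
```

A second failure after the refresh propagates to the caller unchanged, which keeps the retry bounded to one attempt.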
---
_Verified: 2026-05-30T12:00:00Z_
_Verifier: Claude (gsd-verifier)_