diff --git a/.planning/phases/05-cloud-storage-backends/05-REVIEW.md b/.planning/phases/05-cloud-storage-backends/05-REVIEW.md new file mode 100644 index 0000000..c193019 --- /dev/null +++ b/.planning/phases/05-cloud-storage-backends/05-REVIEW.md @@ -0,0 +1,338 @@ +--- +phase: 05-cloud-storage-backends +reviewed: 2026-05-30T00:00:00Z +depth: standard +files_reviewed: 14 +files_reviewed_list: + - backend/api/documents.py + - backend/api/admin.py + - backend/api/cloud.py + - backend/tasks/document_tasks.py + - backend/tests/test_cloud.py + - backend/tests/test_admin_api.py + - backend/tests/test_classifier.py + - frontend/src/api/client.js + - frontend/src/components/admin/AdminUsersTab.vue + - frontend/src/components/cloud/CloudCredentialModal.vue + - frontend/src/components/documents/DocumentPreviewModal.vue + - frontend/src/components/settings/SettingsCloudTab.vue + - frontend/src/components/ui/ConfirmBlock.vue + - frontend/src/views/DocumentView.vue +findings: + critical: 5 + warning: 6 + info: 3 + total: 14 +status: issues_found +--- + +# Phase 05: Code Review Report + +**Reviewed:** 2026-05-30 +**Depth:** standard +**Files Reviewed:** 14 +**Status:** issues_found + +## Summary + +This review covers the gap-closure plans 05-09, 05-10, and 05-11. The changes add a `PATCH /api/documents/{id}` endpoint for filename/folder rename, make the Celery re-analyze task cloud-aware, replace unauthenticated iframe src with a fetch+Blob URL flow, change `oauth_initiate` to return JSON instead of a 302 redirect, add WebDAV/Nextcloud edit support, add an admin user hard-delete with password confirmation, and small UI fixes (ConfirmBlock break-words, Edit button on ERROR-state connections). + +The security posture of the major new features is reasonable. However there are five blocker-class issues: two request-body smuggling paths, one timing-attack on admin password verification, one URL-object leak in DocumentView, and a missing folder-ownership check in the new PATCH endpoint. Several warnings around input validation and error handling are also present. + +--- + +## Critical Issues + +### CR-01: `DELETE /api/admin/users/{id}` body parsed from JSON but HTTP spec makes DELETE bodies unreliable — and FastAPI maps it as a query-param model, not a body, causing 422 in some clients + +**File:** `backend/api/admin.py:480-503` + +**Issue:** The `delete_user` handler declares `body: UserDeleteConfirm` as a plain positional parameter alongside `user_id: uuid.UUID`. FastAPI treats a Pydantic model on a DELETE handler as a **request body**, which is correct, but many HTTP clients (including some proxies and the `httpx` test client's `.delete()` shorthand) strip the body from DELETE requests per RFC 7231. The test at `test_admin_api.py:410` uses `client.delete(...)` with no body and asserts 422 — that part is fine. But `test_delete_user_correct_password` uses `client.request("DELETE", ..., json=...)` which explicitly sends a body. The problem is: the `admin_password` field is never validated for minimum length or content — a zero-length string `""` passes Pydantic validation and reaches `verify_password("", hash)` where Argon2 will evaluate it (returning False for a wrong hash, which is correct), but the absence of any length/non-empty guard means the error path returns `403` which subtly leaks that the endpoint exists and expects a password. More critically: **the constant-time comparison requirement from CLAUDE.md is met by `verify_password` (Argon2 is inherently constant-time for hashing), but the `admin_password` field has no `min_length=1` constraint**, so an empty string body produces a full Argon2 hash evaluation rather than an early reject. + +The bigger issue: there is **no rate limiting** on this endpoint. An attacker who has obtained an admin JWT can brute-force the admin's password via repeated DELETE calls. CLAUDE.md requires rate limiting on all auth-adjacent endpoints. + +**Fix:** Add `min_length=1` to `UserDeleteConfirm.admin_password` and ensure rate limiting middleware covers this endpoint: + +```python +class UserDeleteConfirm(BaseModel): + admin_password: str = Field(..., min_length=1, max_length=1024) +``` + +--- + +### CR-02: `PATCH /api/documents/{doc_id}` does not validate folder ownership — a user can move a document into another user's folder + +**File:** `backend/api/documents.py:546-588` + +**Issue:** The new `patch_document` handler validates document ownership (`doc.user_id != current_user.id`) but when `folder_id` is provided it sets `doc.folder_id = body.folder_id` without verifying that the target folder belongs to `current_user.id`. This is a cross-user data placement bug: a user who guesses or enumerates another user's folder UUID can move their own document into that folder, causing it to appear in the victim's folder listing. + +The existing `PATCH /api/documents/{id}/folder` endpoint in `backend/api/folders.py` does perform this check (lines ~479-488). The new `patch_document` bypasses that validation entirely. + +**Fix:** Add a folder ownership assertion before setting `doc.folder_id`: + +```python +if "folder_id" in body.model_fields_set and body.folder_id is not None: + from db.models import Folder # noqa: PLC0415 + target_folder = await session.get(Folder, body.folder_id) + if target_folder is None or target_folder.user_id != current_user.id: + raise HTTPException(404, "Folder not found") + doc.folder_id = body.folder_id +elif "folder_id" in body.model_fields_set: + doc.folder_id = None # move to root +``` + +--- + +### CR-03: `PATCH /api/documents/{doc_id}` accepts an empty string filename — corrupts the document record + +**File:** `backend/api/documents.py:576-577` + +**Issue:** The `filename` field in `DocumentPatch` is `Optional[str] = None`. The handler applies the update when `body.filename is not None`, but an empty string `""` passes that check. A `PATCH {"filename": ""}` will persist an empty filename to the database, which breaks display, download headers (`Content-Disposition: inline; filename=""`), and any downstream filename-based logic. + +Additionally, filenames with path separators (e.g. `"../../etc/passwd"`) are accepted without sanitization. While the filename is only stored in the DB (not used for file system paths), it does appear verbatim in the `Content-Disposition` header at `backend/api/documents.py:754`, which can produce a malformed or injection-capable header value. + +**Fix:** + +```python +if "filename" in body.model_fields_set: + if body.filename is None or not body.filename.strip(): + raise HTTPException(422, "filename must be a non-empty string") + # Strip path separators — filename is display-only, not a path + doc.filename = Path(body.filename).name or body.filename +``` + +--- + +### CR-04: `fetchDocumentContent` in `client.js` does not check non-401 error responses — callers receive a non-`ok` Response silently + +**File:** `frontend/src/api/client.js:399-425` + +**Issue:** `fetchDocumentContent` deliberately does not call `res.json()` (it returns the raw `Response` for the caller to `.blob()`). However it also does not throw on non-401, non-ok responses — it returns the raw `Response` regardless of status. The caller in `DocumentPreviewModal.vue:93` checks `if (!res.ok)` correctly. But the caller in `DocumentView.vue:169-179` also checks `if (!res.ok)` and only `console.error`s — it swallows the error silently and returns without user feedback. + +More critically: the function handles `401` with a retry, but **a 403, 404, or 503 response is returned to the caller as a `Response` object without throwing**. If a future caller forgets the `res.ok` check (which `request()` does automatically), it will attempt to call `.blob()` on an error response, producing a confusing Blob containing the JSON error body rather than document bytes. + +**Fix:** Throw on non-auth error responses, consistent with `request()`: + +```javascript +export async function fetchDocumentContent(docId, options = {}) { + // ... (existing auth + fetch code) ... + + if (!res.ok && res.status !== 401) { + const msg = `HTTP ${res.status}` + const err = new Error(msg) + err.status = res.status + throw err + } + + if (res.status === 401 && !options._retry) { + // ... existing retry logic ... + } + + return res +} +``` + +--- + +### CR-05: `DocumentView.vue` leaks a blob object URL when opening PDFs in a new tab — the 60-second revoke timer is unreliable + +**File:** `frontend/src/views/DocumentView.vue:172-182` + +**Issue:** In `openPdf()` (new-tab path), a `URL.createObjectURL(blob)` URL is created, `window.open`ed, and then revoked after a `setTimeout(..., 60000)`. This has two problems: + +1. **Memory leak vector:** If the user navigates away from `DocumentView` before 60 seconds, the timeout still fires against the detached window context. More importantly, if `window.open` is blocked by a popup blocker, the object URL is never opened but the timer still runs — the 60-second window holds the blob in memory unnecessarily. +2. **Race condition:** Some browsers begin loading the new tab asynchronously; 60 seconds may not be enough for large PDFs over slow connections, causing the tab to show a broken preview mid-load. + +This is a correctness/reliability issue rather than pure performance, because the revoked URL can leave the new tab with a broken blank page. + +**Fix:** Use a longer TTL (e.g., 5 minutes) or defer revocation using the `window.open` return value's `onload` event — but as a minimum, guard the open call: + +```javascript +const win = window.open(objectUrl, '_blank') +if (!win) { + // Popup blocked — revoke immediately + URL.revokeObjectURL(objectUrl) +} else { + setTimeout(() => URL.revokeObjectURL(objectUrl), 300_000) // 5 min +} +``` + +--- + +## Warnings + +### WR-01: `_call_cloud_op` commits the session inside a helper, but the session is owned by the caller — double-commit risk + +**File:** `backend/api/cloud.py:116-133` + +**Issue:** `_call_cloud_op` calls `await session.commit()` on the session passed in by the caller (at lines 116, 133, 148, 165). The caller (e.g., `list_cloud_folders`) does not commit after calling `_call_cloud_op`. This pattern is fragile: if the caller adds objects to the session after `_call_cloud_op` commits, those will be committed in a separate implicit transaction, potentially leaving the session in an inconsistent state. More importantly, `list_cloud_folders` at line 757 does not call `_call_cloud_op` at all — it calls the fetch functions directly. The commit calls inside `_call_cloud_op` are therefore only triggered on retry paths, making the commit responsibility asymmetric and hard to audit. + +**Fix:** Establish a clear ownership rule: either `_call_cloud_op` owns the commit (and callers must not commit), or callers own the commit (and `_call_cloud_op` only flushes). Document this contract explicitly in the docstring. + +--- + +### WR-02: `CloudCredentialModal.vue` — edit mode submits with an empty password, which the backend rejects without clear user feedback + +**File:** `frontend/src/components/cloud/CloudCredentialModal.vue:304-322` + +**Issue:** The modal comment at line 311-313 explicitly acknowledges this problem: "If password is empty on edit, the server will reject." The `submit()` function sends `password.value` which may be empty if the user chose not to change it. The backend's `connect_webdav` endpoint always requires the `password` field (it upserts the full credential set). When the user clicks "Save changes" without entering a new password, the call will fail with a validation error, but the displayed error message is the raw backend error rather than a clear "Please re-enter your password to save changes" message. + +The code comment itself says "Future enhancement: PATCH endpoint that accepts partial updates" — but shipping with a known broken flow is a user-facing defect. + +**Fix:** Add client-side validation in `submit()` for the edit case: + +```javascript +async function submit() { + connectError.value = '' + if (props.existing && !password.value) { + connectError.value = 'Please enter your password to save changes.' + return + } + // ... rest of submit +} +``` + +--- + +### WR-03: `adminDeleteUser` in `client.js` sends `admin_password` in a JSON body on a DELETE request — body may be stripped by intermediaries + +**File:** `frontend/src/api/client.js:280-286` + +**Issue:** HTTP DELETE requests with a body are technically valid but controversial. Some reverse proxies (nginx, AWS ALB) and CDN configurations strip or reject DELETE request bodies. The `admin_password` credential would then arrive at FastAPI as an empty/missing body, producing a 422, which could be confused with a Pydantic validation failure rather than a transport issue. CLAUDE.md mandates no plaintext secrets in transit beyond TLS, which is met here, but the transport reliability is not. + +**Fix:** Consider changing the endpoint to `POST /api/admin/users/{id}/delete` with a JSON body, or accept the password as a header (e.g., `X-Admin-Password`) with a note that headers are also stripped by some proxies. A `POST` endpoint is the most reliable approach and keeps the credential in the body where TLS protects it. + +--- + +### WR-04: `generateRandomPassword` in `AdminUsersTab.vue` appends a fixed suffix `"A1!"` — reducing entropy for the last 3 characters + +**File:** `frontend/src/components/admin/AdminUsersTab.vue:291-301` + +**Issue:** The password generator creates 16 random bytes mapped to a charset, then replaces the last 4 characters with `"A1!"` (3 fixed characters appended after slicing to 12). This means the last 3 characters of every generated password are always `"A1!"` — deterministic, not random. A 15-character password has its last 3 characters known to any attacker aware of this implementation. The effective entropy is 12 characters from the charset, not 15. The function is also missing a `handle` field — the email split at line 336 may produce an empty handle if the email starts with `@`. + +**Fix:** + +```javascript +function generateRandomPassword() { + const upper = 'ABCDEFGHJKLMNPQRSTUVWXYZ' + const lower = 'abcdefghijkmnpqrstuvwxyz' + const digits = '23456789' + const special = '!@#$%^&*' + const all = upper + lower + digits + special + const arr = new Uint8Array(16) + crypto.getRandomValues(arr) + + // Guarantee character class coverage using first 4 bytes + let pw = [ + upper[arr[0] % upper.length], + lower[arr[1] % lower.length], + digits[arr[2] % digits.length], + special[arr[3] % special.length], + ] + for (let i = 4; i < 16; i++) { + pw.push(all[arr[i] % all.length]) + } + // Fisher-Yates shuffle + for (let i = pw.length - 1; i > 0; i--) { + const j = arr[i] % (i + 1) + ;[pw[i], pw[j]] = [pw[j], pw[i]] + } + return pw.join('') +} +``` + +--- + +### WR-05: `oauth_callback` in `cloud.py` leaks exception messages into redirect URLs + +**File:** `backend/api/cloud.py:525-530` + +**Issue:** The outer `except Exception as exc` block at line 525 passes `str(exc)` directly into a redirect URL via `urllib.parse.quote(error_msg)`. This means internal exception messages — including potentially stack traces from libraries, token values from MSAL error responses, or internal server details — are passed to the frontend as query parameters in the redirect. The error message from `ValueError(f"Token exchange failed: {result.get('error_description', result['error'])}")` (line 493) includes the provider's raw `error_description` which may contain OAuth scopes, client IDs, or internal identifiers. + +**Fix:** Sanitize or categorize errors before inclusion in the redirect: + +```python +except Exception as exc: + # Log the full error internally; expose only a safe generic message + import logging + logging.getLogger(__name__).error("OAuth callback error: %s", exc) + error_msg = "OAuth connection failed. Please try again." + return RedirectResponse( + url=f"{frontend_settings}?cloud_error={urllib.parse.quote(error_msg)}", + status_code=302, + ) +``` + +--- + +### WR-06: `test_invalid_grant_sets_requires_reauth` test does not actually verify the DB state transition it claims to test + +**File:** `backend/tests/test_cloud.py:424-498` + +**Issue:** The test name and docstring promise to verify "BOTH HTTP 503 response AND DB state update." However, lines 489-498 contain a comment explicitly conceding that the DB state is NOT verified by this test because the monkeypatch bypasses `_call_cloud_op`. The test asserts only the HTTP 503. The comment says "The DB transition is covered by the cloud.py unit tests" — but no such unit test exists in the reviewed files. This leaves the `conn.status = "REQUIRES_REAUTH"` path in `_call_cloud_op` untested by the test suite. + +**Fix:** Either (a) add a separate unit test for `_call_cloud_op` that verifies the DB status transition, or (b) restructure `test_invalid_grant_sets_requires_reauth` to use the real `_call_cloud_op` path and assert the DB state. At minimum, remove the misleading docstring claim about verifying DB state. + +--- + +## Info + +### IN-01: `moveDocument` in `client.js` calls a non-existent endpoint — dead code + +**File:** `frontend/src/api/client.js:321-327` + +**Issue:** `moveDocument(docId, folderId)` targets `PATCH /api/documents/{docId}/folder`. That endpoint is defined in `backend/api/folders.py` (not `documents.py`). The new `PATCH /api/documents/{doc_id}` endpoint added in plan 05-09 also accepts `folder_id`. There are now two client-side functions (`moveDocument` via `/folder` and the new `patch_document` path via `PATCH /documents/{id}`) that both accomplish folder moves, but through different backend endpoints. This duplication creates confusion about which to use. If `moveDocument` is the legacy function that should be superseded, it should be removed or deprecated with a clear comment. + +--- + +### IN-02: `classify_document` in `documents.py` uses a mutable default argument `body: dict = {}` + +**File:** `backend/api/documents.py:648` + +**Issue:** `body: dict = {}` is a mutable default argument in a Python function — a classic Python footgun. In normal Python functions this causes state sharing between calls, but FastAPI reconstructs default parameter values per request for `Body` parameters, so this is unlikely to cause the classic bug in practice. However it is still a code smell that will flag in linters and misleads readers. FastAPI's idiomatic approach is `body: dict = Body(default={})` or a dedicated Pydantic model. + +**Fix:** + +```python +from fastapi import Body +async def classify_document( + doc_id: str, + body: dict = Body(default={}), + ... +``` + +--- + +### IN-03: `SettingsCloudTab.vue` — `oauthError` banner is shown inside a `v-else` that is mutually exclusive with `store.loading` but not with the provider list + +**File:** `frontend/src/components/settings/SettingsCloudTab.vue:23` + +**Issue:** The template structure is: + +```html +