Files
curo1305 1a6fa08a34 docs(05): add code review and verification reports for phase 5
REVIEW.md: 3 critical findings fixed (HTTPException passthrough,
Redis pre-flight ordering, CLOUD_CREDS_KEY in celery-worker env)
VERIFICATION.md: 7/7 must-haves verified; 6 human-verification items
require live cloud credentials (Google Drive, OneDrive, Nextcloud/WebDAV)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 18:07:42 +02:00

330 lines
14 KiB
Markdown

---
phase: 05-cloud-storage-backends
reviewed: 2026-05-30T00:00:00Z
depth: standard
files_reviewed: 6
files_reviewed_list:
- backend/api/cloud.py
- backend/api/documents.py
- docker-compose.yml
- frontend/src/views/CloudStorageView.vue
- backend/tests/test_cloud.py
- backend/tests/test_documents.py
findings:
critical: 3
warning: 4
info: 2
total: 9
status: issues_found
---
# Phase 05 Plan 12: Code Review Report
**Reviewed:** 2026-05-30T00:00:00Z
**Depth:** standard
**Files Reviewed:** 6
**Status:** issues_found
## Summary
This review covers the plan 05-12 gap-closure changes: OAuth pre-flight config validation added
to `oauth_initiate`, a broad `except Exception → 502` fallback added after the
`except CloudConnectionError → 503` clause in `stream_document_content`, a Celery worker source
volume mount added to `docker-compose.yml`, an upload-hint paragraph added to
`CloudStorageView.vue`, two new pre-flight tests in `test_cloud.py`, and one new 502-path test
in `test_documents.py`.
Three critical issues were found. The most impactful is that the broad `except Exception` clause
added to `stream_document_content` unconditionally swallows `HTTPException` raised by
`get_storage_backend_for_document`, converting a correct 503 "reconnect" response into a
misleading 502 "unreachable" response. The second critical issue is that Redis state tokens are
written to Redis before the new pre-flight check runs, leaving one orphan state entry per
rejected OAuth initiation request. The third is that the Celery worker container is missing the
`CLOUD_CREDS_KEY` environment variable, which causes silent use of the fallback default key
`"CHANGEME-32-bytes-padded!!"` for cloud-document credential decryption, making every
extract-and-classify Celery task for cloud documents fail at runtime.
## Narrative Findings (AI reviewer)
---
## Critical Issues
### CR-01: `except Exception` in `stream_document_content` swallows `HTTPException` from `get_storage_backend_for_document`
**File:** `backend/api/documents.py:751-763`
**Issue:** `get_storage_backend_for_document` (in `storage/__init__.py:100-103`) raises
`HTTPException(503, "Cloud connection not found or inactive")` when no active `CloudConnection`
row exists for the document's provider. `HTTPException` is a subclass of `Exception`
(confirmed: `starlette.exceptions.HTTPException → Exception → BaseException`), so the new
`except Exception as exc` block on line 759 catches it and re-raises it wrapped in a new
`HTTPException(502, "Cloud backend unreachable …")`.
The caller receives a misleading 502 status and a "backend unreachable" message when the real
problem is that the cloud connection was deleted or set to `REQUIRES_REAUTH`. The correct 503
with the reconnect prompt is silently suppressed.
The new test `test_stream_document_content_cloud_backend_error` (test_documents.py:598-632) only
exercises the `RuntimeError` path by monkeypatching `get_storage_backend_for_document` to raise
a `RuntimeError`. It does not test the path where `get_storage_backend_for_document` raises
`HTTPException(503)`, so this regression is undetected by the test suite.
**Fix:** Re-order the `except` clauses to explicitly re-raise `HTTPException` before the broad
catch catches it:
```python
try:
storage_backend = await get_storage_backend_for_document(doc, current_user, session)
file_bytes = await storage_backend.get_object(doc.object_key)
except CloudConnectionError as exc:
raise HTTPException(
status_code=503,
detail="Cloud connection requires re-authentication. Please reconnect in Settings.",
) from exc
except HTTPException:
raise # propagate 503 from get_storage_backend_for_document unchanged
except Exception as exc:
raise HTTPException(
status_code=502,
detail="Cloud backend unreachable. Please try again or reconnect in Settings.",
) from exc
```
---
### CR-02: Redis OAuth state token written before pre-flight check — orphan Redis entries created on every rejected request
**File:** `backend/api/cloud.py:342-357`
**Issue:** In `oauth_initiate`, `redis_client.setex(f"oauth_state:{state_token}", 1800, …)` is
called on line 344, persisting a 30-minute Redis entry, before the provider-config pre-flight
checks on lines 348-357. When `google_client_id` or `onedrive_client_id` is empty, the function
raises `HTTPException(400)` and the state token is never consumed or deleted. Every rejected call
leaves one orphan Redis key with an 1800-second TTL.
In a misconfigured deployment (where OAuth credentials are not set), every authenticated user
clicking "Connect" generates a Redis key that is never reclaimed except by TTL expiry. Beyond
memory waste, an orphan state token created before the rejection could theoretically be captured
from server logs or monitoring and submitted to the callback endpoint if credentials are later
configured — allowing a replay of a stale initiation.
The two new tests (`test_oauth_initiate_google_drive_not_configured`,
`test_oauth_initiate_onedrive_not_configured`) verify the 400 response but do not assert that
the `FakeRedis._store` is empty, so the leak is undetected.
**Fix:** Move all pre-flight checks above the Redis write:
```python
@router.get("/oauth/initiate/{provider}")
async def oauth_initiate(provider: str, request: Request,
current_user: User = Depends(get_regular_user)) -> dict:
if provider not in VALID_OAUTH_PROVIDERS:
raise HTTPException(status_code=400, detail=f"Unsupported OAuth provider: {provider}.")
# Pre-flight BEFORE touching Redis
if provider == "google_drive" and (not settings.google_client_id or not settings.google_client_secret):
raise HTTPException(status_code=400, detail="…Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET…")
if provider == "onedrive" and (not settings.onedrive_client_id or not settings.onedrive_client_secret):
raise HTTPException(status_code=400, detail="…Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET…")
state_token = secrets.token_urlsafe(32)
redis_client = request.app.state.redis
await redis_client.setex(f"oauth_state:{state_token}", 1800, str(current_user.id))
```
---
### CR-03: `celery-worker` missing `CLOUD_CREDS_KEY` — cloud document processing silently uses wrong decryption key
**File:** `docker-compose.yml:81-102`
**Issue:** The `celery-worker` service environment block (lines 83-90) does not include
`CLOUD_CREDS_KEY`. Without this variable, `settings.cloud_creds_key` falls back to the default
`"CHANGEME-32-bytes-padded!!"` (config.py:61). The Celery task `_run` in
`tasks/document_tasks.py` calls `get_storage_backend_for_document`, which calls
`decrypt_credentials(settings.cloud_creds_key.encode(), str(user.id), conn.credentials_enc)`.
HKDF key derivation will silently use the wrong master key, Fernet will raise
`InvalidToken`, and the task returns `{"status": "extract_failed", "error": "retrieval failed: …"}`.
There is no startup-time validation; the failure only surfaces on the first cloud document
task execution.
The `backend` service correctly receives `SECRET_KEY` (line 64) and would receive `CLOUD_CREDS_KEY`
from the environment, but the `celery-worker` service does not pass either.
**Fix:**
```yaml
celery-worker:
environment:
- DATABASE_URL=${DATABASE_URL}
- MINIO_ENDPOINT=${MINIO_ENDPOINT}
- MINIO_ACCESS_KEY=${MINIO_ACCESS_KEY}
- MINIO_SECRET_KEY=${MINIO_SECRET_KEY}
- MINIO_BUCKET=${MINIO_BUCKET}
- REDIS_URL=${REDIS_URL}
- CLOUD_CREDS_KEY=${CLOUD_CREDS_KEY} # required for cloud document credential decryption
- PYTHONDONTWRITEBYTECODE=1
```
---
## Warnings
### WR-01: `update_default_storage` accepts arbitrary string as `backend` value — no server-side allowlist
**File:** `backend/api/cloud.py:922-941`
**Issue:** `PATCH /api/users/me/default-storage` accepts `{"backend": "<any string>"}` and
writes it directly to `user.default_storage_backend` without validation against an allowlist.
The docstring notes "validated by the frontend dropdown," which is a client-side-only control
trivially bypassed. A user can persist any string (e.g., `"../../etc"`, unsupported provider
slug, or an empty string) to the DB column, potentially causing downstream handler errors when
the value is used for routing.
**Fix:** Add a `field_validator` to `DefaultStorageRequest`:
```python
_VALID_BACKENDS = frozenset({"minio", "google_drive", "onedrive", "nextcloud", "webdav"})
class DefaultStorageRequest(BaseModel):
backend: str
@field_validator("backend")
@classmethod
def backend_must_be_valid(cls, v: str) -> str:
if v not in _VALID_BACKENDS:
raise ValueError(f"backend must be one of {sorted(_VALID_BACKENDS)}")
return v
```
---
### WR-02: Pre-flight check for OneDrive omits `onedrive_tenant_id` validation despite advertising it in the error message
**File:** `backend/api/cloud.py:353-357`
**Issue:** The OneDrive pre-flight guard checks only `onedrive_client_id` and
`onedrive_client_secret`. Its error detail tells the operator to set `ONEDRIVE_TENANT_ID`, but
the code never checks whether `settings.onedrive_tenant_id` is empty. The default value is
`"common"` (config.py:67), so this is rarely a problem in practice. However, if someone
explicitly sets `ONEDRIVE_TENANT_ID=""`, the MSAL authority URL becomes
`https://login.microsoftonline.com//oauth2/v2.0/token`, producing an MSAL runtime error after
the pre-flight is supposed to have caught the misconfiguration.
**Fix:**
```python
if provider == "onedrive" and (
not settings.onedrive_client_id
or not settings.onedrive_client_secret
or not settings.onedrive_tenant_id
):
raise HTTPException(
status_code=400,
detail="OneDrive OAuth is not configured. Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET, and ONEDRIVE_TENANT_ID.",
)
```
---
### WR-03: New pre-flight tests do not assert Redis state is clean after a 400 response
**File:** `backend/tests/test_cloud.py:784-835`
**Issue:** `test_oauth_initiate_google_drive_not_configured` and
`test_oauth_initiate_onedrive_not_configured` both set up a `FakeRedis`, call the endpoint
expecting a 400, and reset `app.state.redis = None`. Neither test asserts that
`fake_redis._store` is empty after the call. Because the state token is currently written before
the pre-flight check (CR-02 above), a check like this would fail today — confirming the bug.
When CR-02 is fixed, adding the assertion hardens the test against regressions:
```python
# After the status assert, add:
assert len(fake_redis._store) == 0, (
"No OAuth state should be stored in Redis when pre-flight validation fails"
)
```
---
### WR-04: `oauth_callback` reflects raw OAuth provider `error` parameter and internal exception messages into redirect URL
**File:** `backend/api/cloud.py:427-428` and `537-539`
**Issue:** `error_param` from the query string is embedded verbatim into a `ValueError` message
(`f"OAuth provider returned error: {error_param}"`), which flows into `str(exc)` and is passed
to `urllib.parse.quote` before appearing as `?cloud_error=…` in the redirect. The URL encoding
prevents injection in the query string. However:
1. A malicious or compromised OAuth provider can inject arbitrary text into the user-visible
error banner with no server-side length cap or character filter.
2. The outer `except Exception` block at line 536 passes `str(exc)` for all internal errors,
which may include stack trace fragments, OAuth client IDs, or token values from provider
error responses (e.g., `ValueError(f"Token exchange failed: {result.get('error_description', result['error'])}")`
at line 504 — `error_description` is provider-controlled).
**Fix:** Cap the length and filter the error message before reflecting it:
```python
except Exception as exc:
import logging
logging.getLogger(__name__).error("OAuth callback error: %s", exc)
error_msg = "OAuth connection failed. Please try again."
return RedirectResponse(
url=f"{frontend_settings}?cloud_error={urllib.parse.quote(error_msg)}",
status_code=302,
)
```
---
## Info
### IN-01: `CloudStorageView.vue` does not fetch connections on mount — direct navigation shows stale empty state
**File:** `frontend/src/views/CloudStorageView.vue:61-93`
**Issue:** The component reads `cloudStore.connections` and `cloudStore.loading` reactively but
never calls `cloudStore.fetchConnections()` (or equivalent) in an `onMounted` hook. If a user
navigates directly to `/cloud` without first visiting a page that pre-populates the store, the
component renders the "No cloud storage connected" empty state without fetching live data. This
is a reliability gap for direct navigation and deep-link scenarios.
**Fix:** Add `onMounted`:
```javascript
import { computed, onMounted } from 'vue'
onMounted(() => { cloudStore.fetchConnections?.() })
```
or document explicitly that the parent layout is responsible for pre-fetching.
---
### IN-02: `classify_document` uses mutable default argument `body: dict = {}`
**File:** `backend/api/documents.py:657`
**Issue:** `body: dict = {}` is the classic Python mutable-default-argument anti-pattern.
FastAPI reconstructs body parameters per request so the classic shared-state bug does not
manifest in production, but static analysis tools (ruff B006, mypy) flag it, and calling the
function directly from tests with no `body` argument risks state sharing if the function is
ever modified. Use `None` as the sentinel:
```python
async def classify_document(
doc_id: str,
body: Optional[dict] = None,
):
topic_names = body.get("topics") if body else None
```
---
_Reviewed: 2026-05-30T00:00:00Z_
_Reviewer: Claude (gsd-code-reviewer)_
_Depth: standard_