1a6fa08a34
REVIEW.md: 3 critical findings fixed (HTTPException passthrough, Redis pre-flight ordering, CLOUD_CREDS_KEY in celery-worker env) VERIFICATION.md: 7/7 must-haves verified; 6 human-verification items require live cloud credentials (Google Drive, OneDrive, Nextcloud/WebDAV) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
330 lines
14 KiB
Markdown
330 lines
14 KiB
Markdown
---
|
|
phase: 05-cloud-storage-backends
|
|
reviewed: 2026-05-30T00:00:00Z
|
|
depth: standard
|
|
files_reviewed: 6
|
|
files_reviewed_list:
|
|
- backend/api/cloud.py
|
|
- backend/api/documents.py
|
|
- docker-compose.yml
|
|
- frontend/src/views/CloudStorageView.vue
|
|
- backend/tests/test_cloud.py
|
|
- backend/tests/test_documents.py
|
|
findings:
|
|
critical: 3
|
|
warning: 4
|
|
info: 2
|
|
total: 9
|
|
status: issues_found
|
|
---
|
|
|
|
# Phase 05 Plan 12: Code Review Report
|
|
|
|
**Reviewed:** 2026-05-30T00:00:00Z
|
|
**Depth:** standard
|
|
**Files Reviewed:** 6
|
|
**Status:** issues_found
|
|
|
|
## Summary
|
|
|
|
This review covers the plan 05-12 gap-closure changes: OAuth pre-flight config validation added
|
|
to `oauth_initiate`, a broad `except Exception → 502` fallback added after the
|
|
`except CloudConnectionError → 503` clause in `stream_document_content`, a Celery worker source
|
|
volume mount added to `docker-compose.yml`, an upload-hint paragraph added to
|
|
`CloudStorageView.vue`, two new pre-flight tests in `test_cloud.py`, and one new 502-path test
|
|
in `test_documents.py`.
|
|
|
|
Three critical issues were found. The most impactful is that the broad `except Exception` clause
|
|
added to `stream_document_content` unconditionally swallows `HTTPException` raised by
|
|
`get_storage_backend_for_document`, converting a correct 503 "reconnect" response into a
|
|
misleading 502 "unreachable" response. The second critical issue is that Redis state tokens are
|
|
written to Redis before the new pre-flight check runs, leaving one orphan state entry per
|
|
rejected OAuth initiation request. The third is that the Celery worker container is missing the
|
|
`CLOUD_CREDS_KEY` environment variable, which causes silent use of the fallback default key
|
|
`"CHANGEME-32-bytes-padded!!"` for cloud-document credential decryption, making every
|
|
extract-and-classify Celery task for cloud documents fail at runtime.
|
|
|
|
## Narrative Findings (AI reviewer)
|
|
|
|
---
|
|
|
|
## Critical Issues
|
|
|
|
### CR-01: `except Exception` in `stream_document_content` swallows `HTTPException` from `get_storage_backend_for_document`
|
|
|
|
**File:** `backend/api/documents.py:751-763`
|
|
|
|
**Issue:** `get_storage_backend_for_document` (in `storage/__init__.py:100-103`) raises
|
|
`HTTPException(503, "Cloud connection not found or inactive")` when no active `CloudConnection`
|
|
row exists for the document's provider. `HTTPException` is a subclass of `Exception`
|
|
(confirmed: `starlette.exceptions.HTTPException → Exception → BaseException`), so the new
|
|
`except Exception as exc` block on line 759 catches it and re-raises it wrapped in a new
|
|
`HTTPException(502, "Cloud backend unreachable …")`.
|
|
|
|
The caller receives a misleading 502 status and a "backend unreachable" message when the real
|
|
problem is that the cloud connection was deleted or set to `REQUIRES_REAUTH`. The correct 503
|
|
with the reconnect prompt is silently suppressed.
|
|
|
|
The new test `test_stream_document_content_cloud_backend_error` (test_documents.py:598-632) only
|
|
exercises the `RuntimeError` path by monkeypatching `get_storage_backend_for_document` to raise
|
|
a `RuntimeError`. It does not test the path where `get_storage_backend_for_document` raises
|
|
`HTTPException(503)`, so this regression is undetected by the test suite.
|
|
|
|
**Fix:** Re-order the `except` clauses to explicitly re-raise `HTTPException` before the broad
|
|
catch catches it:
|
|
|
|
```python
|
|
try:
|
|
storage_backend = await get_storage_backend_for_document(doc, current_user, session)
|
|
file_bytes = await storage_backend.get_object(doc.object_key)
|
|
except CloudConnectionError as exc:
|
|
raise HTTPException(
|
|
status_code=503,
|
|
detail="Cloud connection requires re-authentication. Please reconnect in Settings.",
|
|
) from exc
|
|
except HTTPException:
|
|
raise # propagate 503 from get_storage_backend_for_document unchanged
|
|
except Exception as exc:
|
|
raise HTTPException(
|
|
status_code=502,
|
|
detail="Cloud backend unreachable. Please try again or reconnect in Settings.",
|
|
) from exc
|
|
```
|
|
|
|
---
|
|
|
|
### CR-02: Redis OAuth state token written before pre-flight check — orphan Redis entries created on every rejected request
|
|
|
|
**File:** `backend/api/cloud.py:342-357`
|
|
|
|
**Issue:** In `oauth_initiate`, `redis_client.setex(f"oauth_state:{state_token}", 1800, …)` is
|
|
called on line 344, persisting a 30-minute Redis entry, before the provider-config pre-flight
|
|
checks on lines 348-357. When `google_client_id` or `onedrive_client_id` is empty, the function
|
|
raises `HTTPException(400)` and the state token is never consumed or deleted. Every rejected call
|
|
leaves one orphan Redis key with an 1800-second TTL.
|
|
|
|
In a misconfigured deployment (where OAuth credentials are not set), every authenticated user
|
|
clicking "Connect" generates a Redis key that is never reclaimed except by TTL expiry. Beyond
|
|
memory waste, an orphan state token created before the rejection could theoretically be captured
|
|
from server logs or monitoring and submitted to the callback endpoint if credentials are later
|
|
configured — allowing a replay of a stale initiation.
|
|
|
|
The two new tests (`test_oauth_initiate_google_drive_not_configured`,
|
|
`test_oauth_initiate_onedrive_not_configured`) verify the 400 response but do not assert that
|
|
the `FakeRedis._store` is empty, so the leak is undetected.
|
|
|
|
**Fix:** Move all pre-flight checks above the Redis write:
|
|
|
|
```python
|
|
@router.get("/oauth/initiate/{provider}")
|
|
async def oauth_initiate(provider: str, request: Request,
|
|
current_user: User = Depends(get_regular_user)) -> dict:
|
|
if provider not in VALID_OAUTH_PROVIDERS:
|
|
raise HTTPException(status_code=400, detail=f"Unsupported OAuth provider: {provider}.")
|
|
|
|
# Pre-flight BEFORE touching Redis
|
|
if provider == "google_drive" and (not settings.google_client_id or not settings.google_client_secret):
|
|
raise HTTPException(status_code=400, detail="…Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET…")
|
|
if provider == "onedrive" and (not settings.onedrive_client_id or not settings.onedrive_client_secret):
|
|
raise HTTPException(status_code=400, detail="…Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET…")
|
|
|
|
state_token = secrets.token_urlsafe(32)
|
|
redis_client = request.app.state.redis
|
|
await redis_client.setex(f"oauth_state:{state_token}", 1800, str(current_user.id))
|
|
…
|
|
```
|
|
|
|
---
|
|
|
|
### CR-03: `celery-worker` missing `CLOUD_CREDS_KEY` — cloud document processing silently uses wrong decryption key
|
|
|
|
**File:** `docker-compose.yml:81-102`
|
|
|
|
**Issue:** The `celery-worker` service environment block (lines 83-90) does not include
|
|
`CLOUD_CREDS_KEY`. Without this variable, `settings.cloud_creds_key` falls back to the default
|
|
`"CHANGEME-32-bytes-padded!!"` (config.py:61). The Celery task `_run` in
|
|
`tasks/document_tasks.py` calls `get_storage_backend_for_document`, which calls
|
|
`decrypt_credentials(settings.cloud_creds_key.encode(), str(user.id), conn.credentials_enc)`.
|
|
HKDF key derivation will silently use the wrong master key, Fernet will raise
|
|
`InvalidToken`, and the task returns `{"status": "extract_failed", "error": "retrieval failed: …"}`.
|
|
There is no startup-time validation; the failure only surfaces on the first cloud document
|
|
task execution.
|
|
|
|
The `backend` service correctly receives `SECRET_KEY` (line 64) and would receive `CLOUD_CREDS_KEY`
|
|
from the environment, but the `celery-worker` service does not pass either.
|
|
|
|
**Fix:**
|
|
|
|
```yaml
|
|
celery-worker:
|
|
environment:
|
|
- DATABASE_URL=${DATABASE_URL}
|
|
- MINIO_ENDPOINT=${MINIO_ENDPOINT}
|
|
- MINIO_ACCESS_KEY=${MINIO_ACCESS_KEY}
|
|
- MINIO_SECRET_KEY=${MINIO_SECRET_KEY}
|
|
- MINIO_BUCKET=${MINIO_BUCKET}
|
|
- REDIS_URL=${REDIS_URL}
|
|
- CLOUD_CREDS_KEY=${CLOUD_CREDS_KEY} # required for cloud document credential decryption
|
|
- PYTHONDONTWRITEBYTECODE=1
|
|
```
|
|
|
|
---
|
|
|
|
## Warnings
|
|
|
|
### WR-01: `update_default_storage` accepts arbitrary string as `backend` value — no server-side allowlist
|
|
|
|
**File:** `backend/api/cloud.py:922-941`
|
|
|
|
**Issue:** `PATCH /api/users/me/default-storage` accepts `{"backend": "<any string>"}` and
|
|
writes it directly to `user.default_storage_backend` without validation against an allowlist.
|
|
The docstring notes "validated by the frontend dropdown," which is a client-side-only control
|
|
trivially bypassed. A user can persist any string (e.g., `"../../etc"`, unsupported provider
|
|
slug, or an empty string) to the DB column, potentially causing downstream handler errors when
|
|
the value is used for routing.
|
|
|
|
**Fix:** Add a `field_validator` to `DefaultStorageRequest`:
|
|
|
|
```python
|
|
_VALID_BACKENDS = frozenset({"minio", "google_drive", "onedrive", "nextcloud", "webdav"})
|
|
|
|
class DefaultStorageRequest(BaseModel):
|
|
backend: str
|
|
|
|
@field_validator("backend")
|
|
@classmethod
|
|
def backend_must_be_valid(cls, v: str) -> str:
|
|
if v not in _VALID_BACKENDS:
|
|
raise ValueError(f"backend must be one of {sorted(_VALID_BACKENDS)}")
|
|
return v
|
|
```
|
|
|
|
---
|
|
|
|
### WR-02: Pre-flight check for OneDrive omits `onedrive_tenant_id` validation despite advertising it in the error message
|
|
|
|
**File:** `backend/api/cloud.py:353-357`
|
|
|
|
**Issue:** The OneDrive pre-flight guard checks only `onedrive_client_id` and
|
|
`onedrive_client_secret`. Its error detail tells the operator to set `ONEDRIVE_TENANT_ID`, but
|
|
the code never checks whether `settings.onedrive_tenant_id` is empty. The default value is
|
|
`"common"` (config.py:67), so this is rarely a problem in practice. However, if someone
|
|
explicitly sets `ONEDRIVE_TENANT_ID=""`, the MSAL authority URL becomes
|
|
`https://login.microsoftonline.com//oauth2/v2.0/token`, producing an MSAL runtime error after
|
|
the pre-flight is supposed to have caught the misconfiguration.
|
|
|
|
**Fix:**
|
|
|
|
```python
|
|
if provider == "onedrive" and (
|
|
not settings.onedrive_client_id
|
|
or not settings.onedrive_client_secret
|
|
or not settings.onedrive_tenant_id
|
|
):
|
|
raise HTTPException(
|
|
status_code=400,
|
|
detail="OneDrive OAuth is not configured. Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET, and ONEDRIVE_TENANT_ID.",
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
### WR-03: New pre-flight tests do not assert Redis state is clean after a 400 response
|
|
|
|
**File:** `backend/tests/test_cloud.py:784-835`
|
|
|
|
**Issue:** `test_oauth_initiate_google_drive_not_configured` and
|
|
`test_oauth_initiate_onedrive_not_configured` both set up a `FakeRedis`, call the endpoint
|
|
expecting a 400, and reset `app.state.redis = None`. Neither test asserts that
|
|
`fake_redis._store` is empty after the call. Because the state token is currently written before
|
|
the pre-flight check (CR-02 above), a check like this would fail today — confirming the bug.
|
|
When CR-02 is fixed, adding the assertion hardens the test against regressions:
|
|
|
|
```python
|
|
# After the status assert, add:
|
|
assert len(fake_redis._store) == 0, (
|
|
"No OAuth state should be stored in Redis when pre-flight validation fails"
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
### WR-04: `oauth_callback` reflects raw OAuth provider `error` parameter and internal exception messages into redirect URL
|
|
|
|
**File:** `backend/api/cloud.py:427-428` and `537-539`
|
|
|
|
**Issue:** `error_param` from the query string is embedded verbatim into a `ValueError` message
|
|
(`f"OAuth provider returned error: {error_param}"`), which flows into `str(exc)` and is passed
|
|
to `urllib.parse.quote` before appearing as `?cloud_error=…` in the redirect. The URL encoding
|
|
prevents injection in the query string. However:
|
|
|
|
1. A malicious or compromised OAuth provider can inject arbitrary text into the user-visible
|
|
error banner with no server-side length cap or character filter.
|
|
2. The outer `except Exception` block at line 536 passes `str(exc)` for all internal errors,
|
|
which may include stack trace fragments, OAuth client IDs, or token values from provider
|
|
error responses (e.g., `ValueError(f"Token exchange failed: {result.get('error_description', result['error'])}")`
|
|
at line 504 — `error_description` is provider-controlled).
|
|
|
|
**Fix:** Cap the length and filter the error message before reflecting it:
|
|
|
|
```python
|
|
except Exception as exc:
|
|
import logging
|
|
logging.getLogger(__name__).error("OAuth callback error: %s", exc)
|
|
error_msg = "OAuth connection failed. Please try again."
|
|
return RedirectResponse(
|
|
url=f"{frontend_settings}?cloud_error={urllib.parse.quote(error_msg)}",
|
|
status_code=302,
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
## Info
|
|
|
|
### IN-01: `CloudStorageView.vue` does not fetch connections on mount — direct navigation shows stale empty state
|
|
|
|
**File:** `frontend/src/views/CloudStorageView.vue:61-93`
|
|
|
|
**Issue:** The component reads `cloudStore.connections` and `cloudStore.loading` reactively but
|
|
never calls `cloudStore.fetchConnections()` (or equivalent) in an `onMounted` hook. If a user
|
|
navigates directly to `/cloud` without first visiting a page that pre-populates the store, the
|
|
component renders the "No cloud storage connected" empty state without fetching live data. This
|
|
is a reliability gap for direct navigation and deep-link scenarios.
|
|
|
|
**Fix:** Add `onMounted`:
|
|
|
|
```javascript
|
|
import { computed, onMounted } from 'vue'
|
|
onMounted(() => { cloudStore.fetchConnections?.() })
|
|
```
|
|
|
|
or document explicitly that the parent layout is responsible for pre-fetching.
|
|
|
|
---
|
|
|
|
### IN-02: `classify_document` uses mutable default argument `body: dict = {}`
|
|
|
|
**File:** `backend/api/documents.py:657`
|
|
|
|
**Issue:** `body: dict = {}` is the classic Python mutable-default-argument anti-pattern.
|
|
FastAPI reconstructs body parameters per request so the classic shared-state bug does not
|
|
manifest in production, but static analysis tools (ruff B006, mypy) flag it, and calling the
|
|
function directly from tests with no `body` argument risks state sharing if the function is
|
|
ever modified. Use `None` as the sentinel:
|
|
|
|
```python
|
|
async def classify_document(
|
|
doc_id: str,
|
|
body: Optional[dict] = None,
|
|
…
|
|
):
|
|
topic_names = body.get("topics") if body else None
|
|
```
|
|
|
|
---
|
|
|
|
_Reviewed: 2026-05-30T00:00:00Z_
|
|
_Reviewer: Claude (gsd-code-reviewer)_
|
|
_Depth: standard_
|