Files
kite/.planning/phases/05-cloud-storage-backends/05-REVIEW.md
T
curo1305 1a6fa08a34 docs(05): add code review and verification reports for phase 5
REVIEW.md: 3 critical findings fixed (HTTPException passthrough,
Redis pre-flight ordering, CLOUD_CREDS_KEY in celery-worker env)
VERIFICATION.md: 7/7 must-haves verified; 6 human-verification items
require live cloud credentials (Google Drive, OneDrive, Nextcloud/WebDAV)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 18:07:42 +02:00

14 KiB

phase, reviewed, depth, files_reviewed, files_reviewed_list, findings, status
phase reviewed depth files_reviewed files_reviewed_list findings status
05-cloud-storage-backends 2026-05-30T00:00:00Z standard 6
backend/api/cloud.py
backend/api/documents.py
docker-compose.yml
frontend/src/views/CloudStorageView.vue
backend/tests/test_cloud.py
backend/tests/test_documents.py
critical warning info total
3 4 2 9
issues_found

Phase 05 Plan 12: Code Review Report

Reviewed: 2026-05-30T00:00:00Z Depth: standard Files Reviewed: 6 Status: issues_found

Summary

This review covers the plan 05-12 gap-closure changes: OAuth pre-flight config validation added to oauth_initiate, a broad except Exception → 502 fallback added after the except CloudConnectionError → 503 clause in stream_document_content, a Celery worker source volume mount added to docker-compose.yml, an upload-hint paragraph added to CloudStorageView.vue, two new pre-flight tests in test_cloud.py, and one new 502-path test in test_documents.py.

Three critical issues were found. The most impactful is that the broad except Exception clause added to stream_document_content unconditionally swallows HTTPException raised by get_storage_backend_for_document, converting a correct 503 "reconnect" response into a misleading 502 "unreachable" response. The second critical issue is that Redis state tokens are written to Redis before the new pre-flight check runs, leaving one orphan state entry per rejected OAuth initiation request. The third is that the Celery worker container is missing the CLOUD_CREDS_KEY environment variable, which causes silent use of the fallback default key "CHANGEME-32-bytes-padded!!" for cloud-document credential decryption, making every extract-and-classify Celery task for cloud documents fail at runtime.

Narrative Findings (AI reviewer)


Critical Issues

CR-01: except Exception in stream_document_content swallows HTTPException from get_storage_backend_for_document

File: backend/api/documents.py:751-763

Issue: get_storage_backend_for_document (in storage/__init__.py:100-103) raises HTTPException(503, "Cloud connection not found or inactive") when no active CloudConnection row exists for the document's provider. HTTPException is a subclass of Exception (confirmed: starlette.exceptions.HTTPException → Exception → BaseException), so the new except Exception as exc block on line 759 catches it and re-raises it wrapped in a new HTTPException(502, "Cloud backend unreachable …").

The caller receives a misleading 502 status and a "backend unreachable" message when the real problem is that the cloud connection was deleted or set to REQUIRES_REAUTH. The correct 503 with the reconnect prompt is silently suppressed.

The new test test_stream_document_content_cloud_backend_error (test_documents.py:598-632) only exercises the RuntimeError path by monkeypatching get_storage_backend_for_document to raise a RuntimeError. It does not test the path where get_storage_backend_for_document raises HTTPException(503), so this regression is undetected by the test suite.

Fix: Re-order the except clauses to explicitly re-raise HTTPException before the broad catch catches it:

    try:
        storage_backend = await get_storage_backend_for_document(doc, current_user, session)
        file_bytes = await storage_backend.get_object(doc.object_key)
    except CloudConnectionError as exc:
        raise HTTPException(
            status_code=503,
            detail="Cloud connection requires re-authentication. Please reconnect in Settings.",
        ) from exc
    except HTTPException:
        raise  # propagate 503 from get_storage_backend_for_document unchanged
    except Exception as exc:
        raise HTTPException(
            status_code=502,
            detail="Cloud backend unreachable. Please try again or reconnect in Settings.",
        ) from exc

CR-02: Redis OAuth state token written before pre-flight check — orphan Redis entries created on every rejected request

File: backend/api/cloud.py:342-357

Issue: In oauth_initiate, redis_client.setex(f"oauth_state:{state_token}", 1800, …) is called on line 344, persisting a 30-minute Redis entry, before the provider-config pre-flight checks on lines 348-357. When google_client_id or onedrive_client_id is empty, the function raises HTTPException(400) and the state token is never consumed or deleted. Every rejected call leaves one orphan Redis key with an 1800-second TTL.

In a misconfigured deployment (where OAuth credentials are not set), every authenticated user clicking "Connect" generates a Redis key that is never reclaimed except by TTL expiry. Beyond memory waste, an orphan state token created before the rejection could theoretically be captured from server logs or monitoring and submitted to the callback endpoint if credentials are later configured — allowing a replay of a stale initiation.

The two new tests (test_oauth_initiate_google_drive_not_configured, test_oauth_initiate_onedrive_not_configured) verify the 400 response but do not assert that the FakeRedis._store is empty, so the leak is undetected.

Fix: Move all pre-flight checks above the Redis write:

@router.get("/oauth/initiate/{provider}")
async def oauth_initiate(provider: str, request: Request,
                         current_user: User = Depends(get_regular_user)) -> dict:
    if provider not in VALID_OAUTH_PROVIDERS:
        raise HTTPException(status_code=400, detail=f"Unsupported OAuth provider: {provider}.")

    # Pre-flight BEFORE touching Redis
    if provider == "google_drive" and (not settings.google_client_id or not settings.google_client_secret):
        raise HTTPException(status_code=400, detail="…Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET…")
    if provider == "onedrive" and (not settings.onedrive_client_id or not settings.onedrive_client_secret):
        raise HTTPException(status_code=400, detail="…Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET…")

    state_token = secrets.token_urlsafe(32)
    redis_client = request.app.state.redis
    await redis_client.setex(f"oauth_state:{state_token}", 1800, str(current_user.id))
    

CR-03: celery-worker missing CLOUD_CREDS_KEY — cloud document processing silently uses wrong decryption key

File: docker-compose.yml:81-102

Issue: The celery-worker service environment block (lines 83-90) does not include CLOUD_CREDS_KEY. Without this variable, settings.cloud_creds_key falls back to the default "CHANGEME-32-bytes-padded!!" (config.py:61). The Celery task _run in tasks/document_tasks.py calls get_storage_backend_for_document, which calls decrypt_credentials(settings.cloud_creds_key.encode(), str(user.id), conn.credentials_enc). HKDF key derivation will silently use the wrong master key, Fernet will raise InvalidToken, and the task returns {"status": "extract_failed", "error": "retrieval failed: …"}. There is no startup-time validation; the failure only surfaces on the first cloud document task execution.

The backend service correctly receives SECRET_KEY (line 64) and would receive CLOUD_CREDS_KEY from the environment, but the celery-worker service does not pass either.

Fix:

  celery-worker:
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - MINIO_ENDPOINT=${MINIO_ENDPOINT}
      - MINIO_ACCESS_KEY=${MINIO_ACCESS_KEY}
      - MINIO_SECRET_KEY=${MINIO_SECRET_KEY}
      - MINIO_BUCKET=${MINIO_BUCKET}
      - REDIS_URL=${REDIS_URL}
      - CLOUD_CREDS_KEY=${CLOUD_CREDS_KEY}    # required for cloud document credential decryption
      - PYTHONDONTWRITEBYTECODE=1

Warnings

WR-01: update_default_storage accepts arbitrary string as backend value — no server-side allowlist

File: backend/api/cloud.py:922-941

Issue: PATCH /api/users/me/default-storage accepts {"backend": "<any string>"} and writes it directly to user.default_storage_backend without validation against an allowlist. The docstring notes "validated by the frontend dropdown," which is a client-side-only control trivially bypassed. A user can persist any string (e.g., "../../etc", unsupported provider slug, or an empty string) to the DB column, potentially causing downstream handler errors when the value is used for routing.

Fix: Add a field_validator to DefaultStorageRequest:

_VALID_BACKENDS = frozenset({"minio", "google_drive", "onedrive", "nextcloud", "webdav"})

class DefaultStorageRequest(BaseModel):
    backend: str

    @field_validator("backend")
    @classmethod
    def backend_must_be_valid(cls, v: str) -> str:
        if v not in _VALID_BACKENDS:
            raise ValueError(f"backend must be one of {sorted(_VALID_BACKENDS)}")
        return v

WR-02: Pre-flight check for OneDrive omits onedrive_tenant_id validation despite advertising it in the error message

File: backend/api/cloud.py:353-357

Issue: The OneDrive pre-flight guard checks only onedrive_client_id and onedrive_client_secret. Its error detail tells the operator to set ONEDRIVE_TENANT_ID, but the code never checks whether settings.onedrive_tenant_id is empty. The default value is "common" (config.py:67), so this is rarely a problem in practice. However, if someone explicitly sets ONEDRIVE_TENANT_ID="", the MSAL authority URL becomes https://login.microsoftonline.com//oauth2/v2.0/token, producing an MSAL runtime error after the pre-flight is supposed to have caught the misconfiguration.

Fix:

if provider == "onedrive" and (
    not settings.onedrive_client_id
    or not settings.onedrive_client_secret
    or not settings.onedrive_tenant_id
):
    raise HTTPException(
        status_code=400,
        detail="OneDrive OAuth is not configured. Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET, and ONEDRIVE_TENANT_ID.",
    )

WR-03: New pre-flight tests do not assert Redis state is clean after a 400 response

File: backend/tests/test_cloud.py:784-835

Issue: test_oauth_initiate_google_drive_not_configured and test_oauth_initiate_onedrive_not_configured both set up a FakeRedis, call the endpoint expecting a 400, and reset app.state.redis = None. Neither test asserts that fake_redis._store is empty after the call. Because the state token is currently written before the pre-flight check (CR-02 above), a check like this would fail today — confirming the bug. When CR-02 is fixed, adding the assertion hardens the test against regressions:

# After the status assert, add:
assert len(fake_redis._store) == 0, (
    "No OAuth state should be stored in Redis when pre-flight validation fails"
)

WR-04: oauth_callback reflects raw OAuth provider error parameter and internal exception messages into redirect URL

File: backend/api/cloud.py:427-428 and 537-539

Issue: error_param from the query string is embedded verbatim into a ValueError message (f"OAuth provider returned error: {error_param}"), which flows into str(exc) and is passed to urllib.parse.quote before appearing as ?cloud_error=… in the redirect. The URL encoding prevents injection in the query string. However:

  1. A malicious or compromised OAuth provider can inject arbitrary text into the user-visible error banner with no server-side length cap or character filter.
  2. The outer except Exception block at line 536 passes str(exc) for all internal errors, which may include stack trace fragments, OAuth client IDs, or token values from provider error responses (e.g., ValueError(f"Token exchange failed: {result.get('error_description', result['error'])}") at line 504 — error_description is provider-controlled).

Fix: Cap the length and filter the error message before reflecting it:

except Exception as exc:
    import logging
    logging.getLogger(__name__).error("OAuth callback error: %s", exc)
    error_msg = "OAuth connection failed. Please try again."
    return RedirectResponse(
        url=f"{frontend_settings}?cloud_error={urllib.parse.quote(error_msg)}",
        status_code=302,
    )

Info

IN-01: CloudStorageView.vue does not fetch connections on mount — direct navigation shows stale empty state

File: frontend/src/views/CloudStorageView.vue:61-93

Issue: The component reads cloudStore.connections and cloudStore.loading reactively but never calls cloudStore.fetchConnections() (or equivalent) in an onMounted hook. If a user navigates directly to /cloud without first visiting a page that pre-populates the store, the component renders the "No cloud storage connected" empty state without fetching live data. This is a reliability gap for direct navigation and deep-link scenarios.

Fix: Add onMounted:

import { computed, onMounted } from 'vue'
onMounted(() => { cloudStore.fetchConnections?.() })

or document explicitly that the parent layout is responsible for pre-fetching.


IN-02: classify_document uses mutable default argument body: dict = {}

File: backend/api/documents.py:657

Issue: body: dict = {} is the classic Python mutable-default-argument anti-pattern. FastAPI reconstructs body parameters per request so the classic shared-state bug does not manifest in production, but static analysis tools (ruff B006, mypy) flag it, and calling the function directly from tests with no body argument risks state sharing if the function is ever modified. Use None as the sentinel:

async def classify_document(
    doc_id: str,
    body: Optional[dict] = None,
    
):
    topic_names = body.get("topics") if body else None

Reviewed: 2026-05-30T00:00:00Z Reviewer: Claude (gsd-code-reviewer) Depth: standard