Files
kite/.planning/phases/05-cloud-storage-backends/05-12-PLAN.md
T
curo1305 10175ee4b5 fix(05-12): close 3 UAT gaps — OAuth 400 preflight, 502 cloud fallback, upload hint
- oauth_initiate: pre-flight check returns 400 with env-var hint when
  GOOGLE_CLIENT_ID/SECRET or ONEDRIVE_CLIENT_ID/SECRET are not configured,
  preventing opaque MSAL/OAuth library 500 errors on misconfigured servers
- stream_document_content: broad except-clause catches non-CloudConnectionError
  exceptions and returns 502 with user-friendly message (was raw 500)
- docker-compose.yml: add volumes: - ./backend:/app to celery-worker so code
  changes are picked up by docker compose restart without a rebuild
- CloudStorageView: upload hint paragraph directs users to navigate into a
  cloud folder; no DropZone added (no folder context at overview level)
- 3 new backend tests pass; 2 existing tests patched with credential monkeypatch;
  full suite: 293 passed, 0 new failures, 1 pre-existing (test_extract_docx)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 17:55:08 +02:00

305 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
phase: 05-cloud-storage-backends
plan: 12
type: execute
wave: 1
depends_on: []
files_modified:
- backend/api/cloud.py
- backend/api/documents.py
- docker-compose.yml
- frontend/src/views/CloudStorageView.vue
- backend/tests/test_cloud_api.py
autonomous: true
requirements: [CLOUD-01, CLOUD-02, CLOUD-07]
gap_closure: true
must_haves:
truths:
- "OneDrive OAuth initiate returns HTTP 400 with a descriptive message when ONEDRIVE_CLIENT_ID or ONEDRIVE_CLIENT_SECRET is not configured — not a 500 from MSAL"
- "Google Drive OAuth initiate returns HTTP 400 with a descriptive message when GOOGLE_CLIENT_ID or GOOGLE_CLIENT_SECRET is not configured"
- "stream_document_content returns 502 (not 500) when a cloud backend raises an unexpected exception"
- "celery-worker in docker-compose.yml has a volume mount so code changes are picked up by docker compose restart (no rebuild required)"
- "CloudStorageView shows an upload hint directing users to navigate into a cloud folder to upload files"
artifacts:
- path: "backend/api/cloud.py"
provides: "Pre-flight config check in oauth_initiate for both onedrive and google_drive providers"
- path: "backend/api/documents.py"
provides: "Broad except-clause in stream_document_content catches non-CloudConnectionError exceptions and returns 502"
- path: "docker-compose.yml"
provides: "celery-worker service has volumes: - ./backend:/app matching the backend service"
- path: "frontend/src/views/CloudStorageView.vue"
provides: "Upload hint paragraph shown when connections exist, directing users to navigate into a folder"
key_links:
- from: "frontend Settings → Cloud Storage → Connect OneDrive"
to: "GET /api/cloud/oauth/initiate/onedrive"
via: "Returns 400 with readable error when env vars missing"
- from: "frontend document preview"
to: "GET /api/documents/{id}/content"
via: "Returns 502 instead of 500 on cloud backend failure"
---
<objective>
Close 3 UAT gaps from Phase 5 testing:
1. **OneDrive OAuth 500** (major): When ONEDRIVE_CLIENT_ID/SECRET env vars are not set, MSAL raises an exception that surfaces as a 500 error. Users cannot distinguish misconfiguration from a code bug. Add a pre-flight check that returns 400 with a human-readable message before touching MSAL. Same check for Google Drive.
2. **Cloud document stream opaque 500** (blocker): `stream_document_content` catches `CloudConnectionError` → 503, but any other exception from the cloud backend becomes a raw 500. Add a broad `except Exception` → 502 with a user-friendly message. Also add `volumes: ./backend:/app` to celery-worker in docker-compose.yml so code changes are reflected by `docker compose restart` without a full rebuild.
3. **Upload hint in CloudStorageView** (blocker): The sidebar "Cloud Storage" link now navigates to `/cloud` (CloudStorageView) which shows provider connections but has no DropZone. Users expect to be able to upload there. Adding a DropZone would require knowing which cloud folder to target (not available at this level). Instead, add a clear inline hint: "To upload files, navigate into a cloud folder first."
</objective>
<execution_context>
@$HOME/.claude/get-shit-done/workflows/execute-plan.md
@$HOME/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-cloud-storage-backends/05-UAT.md
</context>
<interfaces>
<!-- Key contracts the executor needs. -->
From backend/api/cloud.py — oauth_initiate (lines ~314384):
- Route: GET /api/cloud/oauth/initiate/{provider}
- Provider validation at line ~336: `if provider not in VALID_OAUTH_PROVIDERS: raise HTTPException(400, ...)`
- google_drive block starts at line ~348 with `if provider == "google_drive":`
- onedrive block starts at line ~370 with `elif provider == "onedrive":`
- settings fields: `settings.google_client_id`, `settings.google_client_secret` (config.py line ~62); `settings.onedrive_client_id`, `settings.onedrive_client_secret`, `settings.onedrive_tenant_id` (config.py line ~64)
- All three onedrive fields default to empty string — no startup validation
From backend/api/documents.py — stream_document_content (lines ~708780):
- CloudConnectionError catch at line ~754 returns 503
- No broad except-clause after it — any other cloud exception becomes unhandled 500
- `from storage import get_storage_backend, get_storage_backend_for_document` at line 40
From docker-compose.yml — celery-worker service (lines ~81100):
- Has no `volumes:` block — backend code changes require `docker compose up --build celery-worker`
- `backend` service at line ~53 has `volumes: - ./backend:/app` — same pattern needed for celery-worker
From frontend/src/views/CloudStorageView.vue:
- Content section starts at line ~12: `<div class="flex-1 overflow-y-auto px-6 py-5">`
- Empty state div at line ~23: `<div v-else-if="connections.length === 0" ...>`
- Connections list at line ~30: `<div v-else class="flex flex-col divide-y ...">` (rows listing providers)
- No DropZone imported or rendered anywhere in the component
- Upload hint must appear AFTER the connections list (inside the `v-else` branch) — not in the empty state
From backend/tests/test_cloud_api.py:
- Read the file to understand existing test fixtures before adding new tests
- Add tests for the 400 response when env vars are missing (mock settings)
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Backend — pre-flight config validation in oauth_initiate</name>
<files>backend/api/cloud.py, backend/tests/test_cloud_api.py</files>
<read_first>
- backend/api/cloud.py (lines 310390: oauth_initiate function)
- backend/config.py (lines 6070: onedrive_client_id, google_client_id fields)
- backend/tests/test_cloud_api.py (full file: existing fixtures and test patterns)
</read_first>
<behavior>
- GET /api/cloud/oauth/initiate/google_drive with google_client_id="" → 400 {"detail": "Google Drive OAuth is not configured on this server. Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET in your environment."}
- GET /api/cloud/oauth/initiate/onedrive with onedrive_client_id="" → 400 {"detail": "OneDrive OAuth is not configured on this server. Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET, and ONEDRIVE_TENANT_ID in your environment."}
- GET /api/cloud/oauth/initiate/onedrive with all onedrive env vars set → still attempts MSAL (existing behavior, not broken)
- GET /api/cloud/oauth/initiate/unknown_provider → still 400 "Unsupported OAuth provider" (existing behavior unchanged)
</behavior>
<action>
In backend/api/cloud.py, inside the `oauth_initiate` function, AFTER the existing `VALID_OAUTH_PROVIDERS` check and BEFORE the `if provider == "google_drive":` block, insert two pre-flight checks:
1. For google_drive: immediately before `if provider == "google_drive":`, add:
```
if provider == "google_drive" and (not settings.google_client_id or not settings.google_client_secret):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="Google Drive OAuth is not configured on this server. Set GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET in your environment.",
)
```
2. For onedrive: immediately before `elif provider == "onedrive":`, add:
```
if provider == "onedrive" and (not settings.onedrive_client_id or not settings.onedrive_client_secret):
raise HTTPException(
status_code=status.HTTP_400_BAD_REQUEST,
detail="OneDrive OAuth is not configured on this server. Set ONEDRIVE_CLIENT_ID, ONEDRIVE_CLIENT_SECRET, and ONEDRIVE_TENANT_ID in your environment.",
)
```
These checks must fire before the `if provider == "google_drive":` / `elif provider == "onedrive":` blocks — do NOT restructure the if/elif chain.
In backend/tests/test_cloud_api.py, add two tests using `monkeypatch` (pytest) to override settings fields:
1. `test_oauth_initiate_google_drive_not_configured` — monkeypatch `settings.google_client_id = ""` and `settings.google_client_secret = ""`, call GET /api/cloud/oauth/initiate/google_drive as an authenticated regular user, assert 400, assert "GOOGLE_CLIENT_ID" in detail.
2. `test_oauth_initiate_onedrive_not_configured` — monkeypatch `settings.onedrive_client_id = ""`, call GET /api/cloud/oauth/initiate/onedrive, assert 400, assert "ONEDRIVE_CLIENT_ID" in detail.
Use the existing authenticated client fixture from the test file — read the file to find its name before writing tests.
</action>
<acceptance_criteria>
- backend/api/cloud.py oauth_initiate contains `if provider == "google_drive" and (not settings.google_client_id or not settings.google_client_secret)` before the `if provider == "google_drive":` block
- backend/api/cloud.py oauth_initiate contains `if provider == "onedrive" and (not settings.onedrive_client_id or not settings.onedrive_client_secret)` before the `elif provider == "onedrive":` block
- `pytest backend/tests/test_cloud_api.py::test_oauth_initiate_google_drive_not_configured backend/tests/test_cloud_api.py::test_oauth_initiate_onedrive_not_configured -v` exits 0
- Both tests assert HTTP 400 and the relevant env var name in the detail string
</acceptance_criteria>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud_api.py::test_oauth_initiate_google_drive_not_configured tests/test_cloud_api.py::test_oauth_initiate_onedrive_not_configured -v</automated>
</verify>
<done>Two new tests pass. oauth_initiate returns 400 with descriptive message when provider credentials are empty. Existing tests unchanged.</done>
</task>
<task type="auto">
<name>Task 2: Backend — 502 fallback in stream_document_content + celery-worker volume mount</name>
<files>backend/api/documents.py, docker-compose.yml, backend/tests/test_documents_api.py</files>
<read_first>
- backend/api/documents.py (lines 708780: stream_document_content function)
- docker-compose.yml (lines 53100: backend and celery-worker service definitions)
- backend/tests/test_documents_api.py (read to find fixtures for mocking get_storage_backend_for_document)
</read_first>
<behavior>
- GET /api/documents/{id}/content when cloud backend raises CloudConnectionError → 503 "Cloud connection requires re-authentication" (EXISTING, unchanged)
- GET /api/documents/{id}/content when cloud backend raises any other Exception (e.g., aiohttp.ClientError, timeout, generic RuntimeError) → 502 "Cloud backend unreachable. Please try again or reconnect in Settings."
- GET /api/documents/{id}/content for a MinIO document → 200 with file bytes (unchanged, MinIO errors are not affected by the new clause)
</behavior>
<action>
### 1. backend/api/documents.py — add broad except-clause
In the `stream_document_content` function, find the `except CloudConnectionError as exc:` block (lines ~754758). IMMEDIATELY AFTER its closing line (`from exc`), add a second except clause:
```python
except Exception as exc:
raise HTTPException(
status_code=502,
detail="Cloud backend unreachable. Please try again or reconnect in Settings.",
) from exc
```
The final try/except structure must be:
```
try:
storage_backend = await get_storage_backend_for_document(...)
file_bytes = await storage_backend.get_object(doc.object_key)
except CloudConnectionError as exc:
raise HTTPException(503, ...) from exc
except Exception as exc:
raise HTTPException(502, ...) from exc
```
Do NOT catch Exception before CloudConnectionError — order matters (specific before broad).
### 2. docker-compose.yml — add volume mount to celery-worker
In the `celery-worker` service block, add a `volumes:` key with the same bind mount as the `backend` service:
```yaml
volumes:
- ./backend:/app
```
Place it after `environment:` and before `extra_hosts:` (or after `extra_hosts:` if that reads more cleanly). Match the indentation of surrounding keys (2 spaces).
Also add `PYTHONDONTWRITEBYTECODE=1` to the celery-worker environment if it is not already there (prevents .pyc files from cluttering the bind-mounted source).
### 3. backend/tests/test_documents_api.py — add test for 502 path
Add `test_stream_document_content_cloud_backend_error`:
- Create a document with `storage_backend = "google_drive"` (or any non-minio value)
- Mock `get_storage_backend_for_document` to raise `RuntimeError("connection timeout")`
- Call GET /api/documents/{doc.id}/content as the document owner
- Assert 502 and "Cloud backend unreachable" in the response detail
Read existing document stream tests to find the right fixture pattern before writing.
</action>
<acceptance_criteria>
- backend/api/documents.py stream_document_content has `except Exception as exc:` AFTER `except CloudConnectionError as exc:` block, raising HTTPException(502)
- docker-compose.yml celery-worker service has `volumes: - ./backend:/app`
- `pytest backend/tests/test_documents_api.py::test_stream_document_content_cloud_backend_error -v` exits 0
- `pytest backend/tests/ -v -k "stream_document_content"` — all stream tests pass (no regression)
</acceptance_criteria>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_documents_api.py -v -k "stream" 2>&1 | tail -20</automated>
</verify>
<done>502 except-clause added. Volume mount added to docker-compose.yml. New test passes. Existing stream tests pass.</done>
</task>
<task type="auto">
<name>Task 3: Frontend — upload hint in CloudStorageView</name>
<files>frontend/src/views/CloudStorageView.vue</files>
<read_first>
- frontend/src/views/CloudStorageView.vue (full file — understand existing template structure)
- frontend/src/views/CloudFolderView.vue (for reference — how upload works inside a folder)
</read_first>
<behavior>
- When at /cloud with at least one active connection, a hint paragraph is visible below the connections list: "To upload files, navigate into a cloud folder first."
- The hint does not appear on the empty state (no connections) — that state already directs to Settings.
- Clicking a connection row still navigates to /cloud/{provider}/root (existing behavior unchanged).
- No DropZone component is added to this view (no cloud folder context is available at this level).
</behavior>
<action>
In frontend/src/views/CloudStorageView.vue, inside the `v-else` block (the div that renders the connections list, starting at `<div v-else class="flex flex-col divide-y ...`), add a hint paragraph immediately AFTER the closing `</div>` of the connections list (after the `</div>` that closes `v-for`), still inside the `v-else` wrapper:
```html
<p class="mt-4 text-xs text-gray-400 text-center">
To upload files, navigate into a cloud folder first.
</p>
```
The hint must be a sibling of the connections list `<div>`, not nested inside it. Keep both inside the single `v-else` block so the hint is only visible when connections exist.
</action>
<acceptance_criteria>
- CloudStorageView.vue contains `To upload files, navigate into a cloud folder first.` in a `<p>` element inside the `v-else` block
- The `<p>` is NOT inside the `v-else-if="connections.length === 0"` empty state block
- `cd frontend && npm run build` exits 0 with no errors
- No DropZone or UploadProgress imported or rendered in CloudStorageView.vue
</acceptance_criteria>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/frontend && npm run build 2>&1 | tail -5</automated>
</verify>
<done>Upload hint added below connections list. Build passes. No DropZone added. Existing connection click behavior unchanged.</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| oauth_initiate preflight | User-supplied provider string already validated; new checks only inspect server-side settings values — no user input involved |
| 502 error message | Static string, no user data reflected in the error detail |
| volume mount | Read-only to container — same bind mount pattern as backend service |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-12-01 | Information Disclosure | 400 error message for missing creds | mitigate | Message names env vars (server config), not any user data or secret values — safe to expose |
| T-05-12-02 | Information Disclosure | 502 error message | mitigate | Static string "Cloud backend unreachable" — no stack trace, no exception detail leaked to client |
| T-05-12-03 | Tampering | celery-worker volume mount | accept | Bind mount is same as backend service; only developer-controlled source files are mounted; production deployments use image builds, not bind mounts |
| T-05-12-SC | Tampering | npm/pip installs | mitigate | No new packages installed in this plan |
</threat_model>
<verification>
After all tasks complete:
- `cd backend && python -m pytest tests/test_cloud_api.py::test_oauth_initiate_google_drive_not_configured tests/test_cloud_api.py::test_oauth_initiate_onedrive_not_configured -v` — 2 tests pass
- `cd backend && python -m pytest tests/test_documents_api.py::test_stream_document_content_cloud_backend_error -v` — 1 test passes
- `cd backend && python -m pytest -v` — zero new failures
- `cd frontend && npm run build` — zero errors
- Manual: docker-compose.yml celery-worker service has `volumes: - ./backend:/app`
- Manual: open CloudStorageView at /cloud — upload hint visible below connections list
- Manual: curl -H "Authorization: Bearer <token>" http://localhost:8000/api/cloud/oauth/initiate/onedrive (with empty OneDrive creds) → 400 with "ONEDRIVE_CLIENT_ID" in detail
</verification>
<success_criteria>
- oauth_initiate returns 400 with descriptive env-var hint for unconfigured google_drive and onedrive
- stream_document_content returns 502 (not 500) for non-CloudConnectionError cloud exceptions
- celery-worker has volume mount so `docker compose restart celery-worker` picks up code changes
- CloudStorageView shows upload hint directing users to navigate into a folder
- 3 new backend tests pass; full pytest suite has zero new failures; frontend build clean
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-12-SUMMARY.md` when done
</output>