From 695649eefa3d2cebe7d0eaa8b2b8ae373fabfde9 Mon Sep 17 00:00:00 2001 From: curo1305 Date: Sun, 31 May 2026 11:10:59 +0200 Subject: [PATCH] docs(06.2): add research document for phase 6.2 gap-closure Co-Authored-By: Claude Sonnet 4.6 --- .../06.2-RESEARCH.md | 738 ++++++++++++++++++ 1 file changed, 738 insertions(+) create mode 100644 .planning/phases/06.2-close-v1-sharing-cloud-delete-csv-export-gaps/06.2-RESEARCH.md diff --git a/.planning/phases/06.2-close-v1-sharing-cloud-delete-csv-export-gaps/06.2-RESEARCH.md b/.planning/phases/06.2-close-v1-sharing-cloud-delete-csv-export-gaps/06.2-RESEARCH.md new file mode 100644 index 0000000..79c559e --- /dev/null +++ b/.planning/phases/06.2-close-v1-sharing-cloud-delete-csv-export-gaps/06.2-RESEARCH.md @@ -0,0 +1,738 @@ +# Phase 6.2: Close v1 sharing + cloud-delete + CSV export gaps — Research + +**Researched:** 2026-05-31 +**Domain:** FastAPI PATCH endpoints, cloud storage backend routing, Fetch API blob download, MinIO list_objects, SQLAlchemy async JOIN +**Confidence:** HIGH (all findings verified against the live codebase) + +--- + + +## User Constraints (from CONTEXT.md) + +### Locked Decisions + +- **D-00:** `create_user` 500 fix already committed — `await session.flush()` before `write_audit_log` in `backend/api/admin.py`. +- **D-01:** Default delete propagates to cloud provider. `delete_document()` in `services/storage.py` must route to `get_storage_backend_for_document(doc, user, session)` for non-MinIO docs. +- **D-02:** "Remove from app only" is a separate, distinct action that deletes the DB record while leaving the cloud file intact. +- **D-03:** Cloud provider delete failure: warning modal "Cloud delete failed. Remove from app anyway?" — user chooses. Backend returns structured error distinguishable from hard 4xx/5xx. +- **D-04:** Cloud documents do NOT touch MinIO quota on delete (unchanged from existing design). +- **D-05:** Persistent local Celery cache for cloud docs — DEFERRED. Not this phase. +- **D-06:** `DocumentCard.vue:31` — change `v-if="doc.share_count > 0"` to `v-if="doc.is_shared"`. +- **D-07:** Permission levels: `view` and `edit`. `Share.permission` column already exists. No migration. +- **D-08:** Permission set at share creation: add view/edit dropdown to `ShareModal.vue` before submitting. Existing POST endpoint already accepts `permission` field. +- **D-09:** Permission changeable after creation: add View/Edit toggle per share row in `ShareModal.vue`. Calls new `PATCH /api/shares/{id}` with `{ permission: "view" | "edit" }`. Owner-only, 404 for wrong owner. +- **D-10:** Fix `shares.py` POST hardcoded `permission="view"` (line 97) — read from request body instead. +- **D-11:** `_audit_to_dict()` — extend to return `user_handle` and `actor_handle` via JOIN on `User` table. +- **D-12:** `user_id` filter in `GET /api/admin/audit-log` — accept handle string, resolve to UUID via `User.handle == handle`, return empty if not found. +- **D-13:** Replace `window.location.href` in `AuditLogTab.vue:exportCsv()` with `fetch()` + Blob URL pattern. +- **D-14:** Add `adminExportAuditLogCsv(params)` to `client.js` — must call `res.text()` (or `res.blob()`), NOT `res.json()`. Create object URL, trigger `` click download. +- **D-15:** Add `GET /api/admin/audit-log/daily-exports` — lists MinIO `audit-logs` bucket contents. Returns `[{ date, key }]` sorted descending. +- **D-16:** Add `GET /api/admin/audit-log/daily-exports/{date}` — streams specific daily export from MinIO as `text/csv`. Auth: `get_current_admin`. +- **D-17:** Frontend `AuditLogTab.vue`: date dropdown from `adminListDailyExports()`, "Download" button uses `fetch()` + Blob URL. + +### Claude's Discretion + +- Exact API shape for "remove from app only" vs "delete from provider" (`?cloud_only=false` query param vs separate endpoint). +- Whether `PATCH /api/shares/{id}` accepts full body or just `{ permission: "view"|"edit" }` — minimal body preferred. +- Exact error response shape for cloud delete failure — must be distinguishable from hard 4xx/5xx. +- MinIO `list_objects` pagination — handle if `audit-logs` bucket has >1000 files. + +### Deferred Ideas (OUT OF SCOPE) + +- Persistent local Celery cache for cloud docs with quota tracking. +- Celery local cache "Remove download" button. +- SHARE-03 permission levels beyond view/edit. + + + + +## Phase Requirements + +| ID | Description | Research Support | +|----|-------------|------------------| +| SHARE-03 | Shared access is view-only by default; owner controls permission level | D-07..D-10: `Share.permission` column exists; POST hardcodes view; PATCH endpoint needed; frontend dropdown needed | +| SHARE-05 | Documents shared with others display "shared" indicator | D-06: backend sends `is_shared`; frontend checks wrong field `share_count` | +| CLOUD (del) | Cloud document deletion propagates to provider | D-01..D-04: `delete_document()` must route to `get_storage_backend_for_document()`; structured error response needed | +| ADMIN-06 | Admin audit log with filters and export | D-11..D-17: user handles in responses, handle filter, CSV fetch download, daily export list + stream | + + + +--- + +## Summary + +Phase 6.2 is a brownfield gap-closure sprint with four independent feature areas. All four areas have the backend plumbing already present — the gaps are surgical additions and fixes, not architectural changes. + +**Area 1 — Cloud-delete propagation:** `services/storage.py:delete_document()` unconditionally calls `_backend().delete_object()` (MinIO singleton). For cloud documents (`doc.storage_backend != "minio"`) it must instead call `get_storage_backend_for_document(doc, user, session)`. This function is already in `backend/storage/__init__.py` and used by both the content proxy and `admin.py:delete_user()`. The wrinkle is that `delete_document()` currently does not receive the `user` ORM object or session — the service layer only receives `session` and `doc_id`. The cleanest fix is to extend the function signature or to perform the cloud routing in `api/documents.py:delete_document` before calling `storage.delete_document()`. The admin delete_user pattern (lines 527-539) is the canonical reference: call `get_storage_backend_for_document`, then `backend.delete_object`, wrapped in `try/except`. A separate "remove from app only" action (D-02) needs an endpoint that skips cloud deletion entirely — a `?remove_only=true` query parameter on `DELETE /api/documents/{id}` is the cleanest route since it shares the auth/ownership gate without duplicating a whole endpoint. + +**Area 2 — SHARE-03/05 fixes:** Three micro-changes. (1) `DocumentCard.vue` line 31: `share_count > 0` to `is_shared`. (2) `shares.py` POST: read `permission` from request body (add to `ShareCreate` model). (3) New `PATCH /api/shares/{id}` endpoint with minimal body `{ permission: "view" | "edit" }`, owner-enforced IDOR (404 on mismatch, mirroring the DELETE pattern). (4) Frontend `ShareModal.vue`: dropdown before submit + toggle per row. + +**Area 3 — Audit log user handles:** `_audit_to_dict()` is a pure function operating on a loaded `AuditLog` ORM object — it does not have access to a session. Two options: (a) change it to accept optional pre-fetched handle dicts alongside the entry, or (b) make the list/export endpoints JOIN `User` as an alias twice (once for `user_id`, once for `actor_id`) and pass the handles as extra arguments. Option (b) is cleaner and avoids N+1 queries. The `_build_filtered_query()` helper returns a `select(AuditLog)` — it needs to be extended to JOIN User twice (with SQLAlchemy aliases) and yield `(AuditLog, user_handle, actor_handle)` tuples. The `user_id` filter must change from accepting `Optional[uuid.UUID]` to accepting `Optional[str]` and resolving the handle in a preliminary query. + +**Area 4 — CSV export + daily exports:** The export endpoint is already a `StreamingResponse` — the only change is on the frontend where `window.location.href` must become `fetch()` + Blob URL. The daily export listing uses `Minio.list_objects(bucket_name="audit-logs", prefix="audit-logs/")` which returns a synchronous iterator — it must be consumed in `asyncio.to_thread()`. The key pattern `audit-logs/{date}.csv` is guaranteed by `audit_tasks.py:79`. + +**Primary recommendation:** Implement in four vertical slices, each deployable independently. Start with SHARE-05 (one-line fix), then SHARE-03, then cloud-delete, then audit log cluster (user handles + CSV + daily exports). + +--- + +## Architectural Responsibility Map + +| Capability | Primary Tier | Secondary Tier | Rationale | +|------------|-------------|----------------|-----------| +| Cloud-delete routing | API / Backend | — | Must happen server-side where credentials are decrypted; cannot be client-driven | +| "Remove from app" vs "delete from cloud" | API / Backend | Browser / Client | Query param distinction at API layer; UX confirmation modal at client layer | +| Share permission PATCH | API / Backend | Browser / Client | IDOR enforcement is a backend concern; toggle UI is client | +| "Shared" badge display | Browser / Client | — | Trivially reads `doc.is_shared` field from list response | +| Audit log handle enrichment | API / Backend | — | JOIN happens in DB query layer; frontend only receives enriched response | +| CSV export download | Browser / Client | API / Backend | Authentication requires `fetch()` with Bearer header — browser nav cannot send it | +| Daily export listing | API / Backend | Browser / Client | MinIO query is server-side; dropdown is client | +| Daily export stream | API / Backend | Browser / Client | Streaming response with auth gate; download trigger at client | + +--- + +## Standard Stack + +This phase adds no new packages. All functionality uses the existing stack. + +### Core (already installed) +| Library | Version | Purpose | Relevance to Phase | +|---------|---------|---------|-------------------| +| FastAPI | 0.136+ | API framework | PATCH endpoint, query params, StreamingResponse | +| SQLAlchemy 2.0 async | 2.x | ORM | JOIN with aliased User twice for handle enrichment | +| Minio (Python SDK) | current | MinIO S3 client | `list_objects`, `get_object` for daily exports | +| Pydantic v2 | 2.x | Request validation | `SharePermissionPatch` minimal body model | +| Vue 3 (Options API/Composition) | 3.x | Frontend framework | Dropdown, toggle, fetch+Blob download | + +### No new packages required + +All operations — cloud backend routing, streaming responses, fetch+Blob download — use code already present in the project. + +## Package Legitimacy Audit + +Not applicable — no new packages installed in this phase. + +--- + +## Architecture Patterns + +### System Architecture Diagram + +``` +DELETE /api/documents/{id}?remove_only=false (default) + └── api/documents.py:delete_document + ├── ownership assert (doc.user_id == current_user.id) + ├── if doc.storage_backend == "minio": + │ └── services/storage.py:delete_document() ← (unchanged path) + │ ├── _backend().delete_object(key) + │ └── quota decrement (STORE-06) + └── else (cloud backend): + ├── get_storage_backend_for_document(doc, user, session) + │ └── decrypts HKDF credentials, returns cloud backend + ├── try: cloud_backend.delete_object(doc.object_key) + │ except: return {"cloud_delete_failed": true, "detail": "..."} ← 207/200 structured + └── if cloud delete succeeded (or user confirmed remove_only): + └── services/storage.py:_db_only_delete() ← quota skipped for cloud docs + +PATCH /api/shares/{id} + └── api/shares.py:update_share_permission + ├── UUID parse (404 on invalid) + ├── session.get(Share, sid) + ├── assert share.owner_id == current_user.id → 404 on mismatch (IDOR) + ├── validate body.permission in {"view", "edit"} + └── share.permission = body.permission; commit + +GET /api/admin/audit-log (with handle enrichment) + └── api/audit.py:list_audit_log + ├── if user_handle param: resolve handle → UUID, or empty result if not found + ├── _build_filtered_query_with_handles(start, end, user_uuid, event_type) + │ └── select(AuditLog, user_alias.handle, actor_alias.handle) + │ .outerjoin(user_alias, user_alias.id == AuditLog.user_id) + │ .outerjoin(actor_alias, actor_alias.id == AuditLog.actor_id) + └── _audit_to_dict(entry, user_handle, actor_handle) → dict with both handles + +GET /api/admin/audit-log/export (CSV with auth) + └── StreamingResponse (unchanged backend) + ← client: fetch() + res.text() + Blob URL + click download + +GET /api/admin/audit-log/daily-exports + └── asyncio.to_thread(minio_client.list_objects, "audit-logs", prefix="audit-logs/") + └── returns [{date: "YYYY-MM-DD", key: "audit-logs/YYYY-MM-DD.csv"}, ...] + +GET /api/admin/audit-log/daily-exports/{date} + └── key = f"audit-logs/{date}.csv" + ├── asyncio.to_thread(minio_client.get_object, "audit-logs", key) + └── StreamingResponse(iter([csv_bytes]), media_type="text/csv") +``` + +### Recommended Project Structure (changes only) + +``` +backend/ +├── api/ +│ ├── shares.py # + PATCH /{id} endpoint; fix ShareCreate.permission +│ └── audit.py # + handle JOIN; + 2 daily-export endpoints; fix user filter +├── services/ +│ └── storage.py # + cloud routing in delete_document; + _db_only_delete() +frontend/src/ +├── api/ +│ └── client.js # + adminExportAuditLogCsv(); + adminListDailyExports(); + adminDownloadDailyExport() +├── components/ +│ ├── admin/ +│ │ └── AuditLogTab.vue # fix exportCsv(); + daily exports date dropdown +│ ├── sharing/ +│ │ └── ShareModal.vue # + permission dropdown on share; + toggle per row +│ └── documents/ +│ └── DocumentCard.vue # line 31: share_count → is_shared +``` + +### Pattern 1: Cloud-delete routing in services/storage.py + +The cleanest approach to the signature problem: `delete_document()` gets two new optional parameters `user` and `session_for_cloud`. When `user` is provided and `doc.storage_backend != "minio"`, cloud routing fires. The caller in `api/documents.py` already has both the `current_user` and `session` in scope. + +**Alternative (preferred for simplicity):** Move the cloud routing entirely into `api/documents.py:delete_document`, before the call to `storage.delete_document()`. The service layer function handles only DB + MinIO quota. The API layer decides routing. This keeps `services/storage.py` free of cloud-backend imports and mirrors how `admin.py:delete_user()` works — the API layer calls `get_storage_backend_for_document()` and the service layer is unaware of cloud backends. + +```python +# api/documents.py — delete_document (modified) +@router.delete("/{doc_id}") +async def delete_document( + doc_id: str, + remove_only: bool = Query(default=False), # D-02: skip cloud delete + request: Request, + session: AsyncSession = Depends(get_db), + current_user: User = Depends(get_regular_user), +): + uid = uuid.UUID(doc_id) + doc = await session.get(Document, uid) + if doc is None or doc.user_id != current_user.id: + raise HTTPException(404, "Document not found") + + cloud_delete_failed = False + if doc.storage_backend != "minio" and not remove_only: + try: + cloud_backend = await get_storage_backend_for_document(doc, current_user, session) + await cloud_backend.delete_object(doc.object_key) + except Exception as exc: + # D-03: structured error; frontend shows warning modal + cloud_delete_failed = True + # Caller (frontend) decides whether to retry with remove_only=true + + if cloud_delete_failed and not remove_only: + return JSONResponse( + status_code=200, + content={ + "success": False, + "cloud_delete_failed": True, + "detail": "Cloud provider delete failed. You can remove from app only.", + } + ) + + # DB delete + quota decrement (cloud docs skip quota — D-04) + ok = await storage.delete_document(session, doc_id, skip_quota=doc.storage_backend != "minio") + # ... audit log + commit +``` + +**Key insight on quota:** Cloud docs already skip quota at upload time (no `UPDATE quotas` in cloud upload path). The `delete_document()` in `services/storage.py` currently always decrements quota. It needs a `skip_quota: bool = False` guard for cloud documents. + +Source: `[VERIFIED: codebase]` — confirmed by reading `services/storage.py:163-175` and `api/documents.py:269-291`. + +### Pattern 2: PATCH /api/shares/{id} — minimal IDOR-safe pattern + +```python +# api/shares.py — new endpoint +class SharePermissionPatch(BaseModel): + permission: str # validated: "view" | "edit" + + @field_validator("permission") + @classmethod + def validate_permission(cls, v: str) -> str: + if v not in {"view", "edit"}: + raise ValueError("permission must be 'view' or 'edit'") + return v + +# CRITICAL: must be defined BEFORE DELETE /{share_id} to avoid path conflict +@router.patch("/{share_id}", status_code=status.HTTP_200_OK) +async def update_share_permission( + share_id: str, + body: SharePermissionPatch, + session: AsyncSession = Depends(get_db), + current_user: User = Depends(get_regular_user), +) -> dict: + try: + sid = uuid.UUID(share_id) + except ValueError: + raise HTTPException(status_code=404, detail="Share not found") + + share = await session.get(Share, sid) + # IDOR: 404 on mismatch — mirrors DELETE pattern (T-04-04-02) + if share is None or share.owner_id != current_user.id: + raise HTTPException(status_code=404, detail="Share not found") + + share.permission = body.permission + await session.commit() + return {"id": str(share.id), "permission": share.permission} +``` + +Source: `[VERIFIED: codebase]` — mirrors `revoke_share` IDOR pattern at lines 240-265 in `backend/api/shares.py`. + +### Pattern 3: Handle enrichment via SQLAlchemy aliased JOIN + +```python +# api/audit.py — handle enrichment +from sqlalchemy.orm import aliased + +UserAsSubject = aliased(User) # for user_id FK +UserAsActor = aliased(User) # for actor_id FK + +def _build_filtered_query_with_handles(start, end, user_uuid, event_type): + q = ( + select(AuditLog, UserAsSubject.handle, UserAsActor.handle) + .outerjoin(UserAsSubject, UserAsSubject.id == AuditLog.user_id) + .outerjoin(UserAsActor, UserAsActor.id == AuditLog.actor_id) + .order_by(AuditLog.created_at.desc()) + ) + if start: q = q.where(AuditLog.created_at >= start) + if end: q = q.where(AuditLog.created_at <= end) + if user_uuid: q = q.where(AuditLog.user_id == user_uuid) + if event_type: q = q.where(AuditLog.event_type == event_type) + return q + +def _audit_to_dict_with_handles(entry, user_handle, actor_handle) -> dict: + d = { + "id": entry.id, + "event_type": entry.event_type, + "user_id": str(entry.user_id) if entry.user_id else None, + "actor_id": str(entry.actor_id) if entry.actor_id else None, + "user_handle": user_handle or None, + "actor_handle": actor_handle or None, + "resource_id": str(entry.resource_id) if entry.resource_id else None, + "ip_address": str(entry.ip_address) if entry.ip_address else None, + "metadata_": entry.metadata_, + "created_at": entry.created_at.isoformat(), + } + return d +``` + +Result rows from a multi-column select in SQLAlchemy 2.0 async are `Row` tuples — access as `row[0]` (AuditLog), `row[1]` (user_handle), `row[2]` (actor_handle). `result.all()` returns `list[Row]`. + +Source: `[ASSUMED]` — SQLAlchemy 2.0 aliased JOIN pattern is well-documented but not verified via Context7 in this session. + +### Pattern 4: Handle-to-UUID resolution for user filter + +```python +# In list_audit_log / export_audit_log endpoint handlers: +user_uuid: Optional[uuid.UUID] = None +if user_handle_param: # new str param replacing Optional[uuid.UUID] + handle_result = await session.execute( + select(User.id).where(User.handle == user_handle_param) + ) + uid = handle_result.scalar_one_or_none() + if uid is None: + # No user with that handle → return empty results (D-12) + return {"items": [], "total": 0, "page": page, "per_page": per_page} + user_uuid = uid +``` + +Source: `[VERIFIED: codebase]` — `User.handle` is a unique indexed column (models.py line 52). + +### Pattern 5: Fetch + Blob URL download in Vue 3 (CSV export) + +```javascript +// frontend/src/api/client.js — new function +export async function adminExportAuditLogCsv(params = {}) { + const { useAuthStore } = await import('../stores/auth.js') + const authStore = useAuthStore() + + const searchParams = new URLSearchParams({ format: 'csv' }) + if (params.start) searchParams.set('start', params.start) + if (params.end) searchParams.set('end', params.end) + if (params.user_handle) searchParams.set('user_handle', params.user_handle) + if (params.event_type) searchParams.set('event_type', params.event_type) + + const headers = {} + if (authStore.accessToken) { + headers['Authorization'] = `Bearer ${authStore.accessToken}` + } + + const res = await fetch(`/api/admin/audit-log/export?${searchParams}`, { + headers, + credentials: 'include', + }) + if (!res.ok) throw new Error(`Export failed: ${res.status}`) + + const text = await res.text() // CSV is text, not JSON + const blob = new Blob([text], { type: 'text/csv' }) + const url = URL.createObjectURL(blob) + const a = document.createElement('a') + a.href = url + a.download = 'audit-export.csv' + a.click() + URL.revokeObjectURL(url) +} +``` + +This is the same pattern as `fetchDocumentContent()` already in `client.js` — raw `fetch()` with Authorization header, NOT using the shared `request()` wrapper (which always calls `res.json()`). + +Source: `[VERIFIED: codebase]` — `fetchDocumentContent` at lines 399-428 of `client.js` is the exact precedent. + +### Pattern 6: MinIO daily export listing (asyncio.to_thread) + +```python +# api/audit.py — new endpoint +@router.get("/audit-log/daily-exports") +async def list_daily_exports( + session: AsyncSession = Depends(get_db), + _admin: User = Depends(get_current_admin), +) -> dict: + """List available daily audit export files from MinIO audit-logs bucket.""" + from storage import get_storage_backend + from storage.minio_backend import MinIOBackend + + backend = get_storage_backend() + if not isinstance(backend, MinIOBackend): + return {"items": []} + + def _list() -> list[dict]: + objects = backend._client.list_objects( + "audit-logs", prefix="audit-logs/", recursive=False + ) + items = [] + for obj in objects: + if obj.object_name and obj.object_name.endswith(".csv"): + # key: "audit-logs/2026-05-30.csv" → date: "2026-05-30" + filename = obj.object_name.removeprefix("audit-logs/").removesuffix(".csv") + items.append({"date": filename, "key": obj.object_name}) + items.sort(key=lambda x: x["date"], reverse=True) + return items + + items = await asyncio.to_thread(_list) + return {"items": items} +``` + +`Minio.list_objects()` returns a synchronous iterator; consuming it inside `asyncio.to_thread` blocks the thread (not the event loop). The iterator is lazy — it pages automatically. For fewer than ~1000 objects (years of daily exports) it completes in a single S3 ListObjectsV2 call. `[VERIFIED: codebase]` — confirmed by inspecting Minio SDK `list_objects` signature at runtime; `obj.object_name` and `obj.is_dir` are the key attributes. + +### Pattern 7: MinIO daily export streaming download + +```python +@router.get("/audit-log/daily-exports/{date}") +async def download_daily_export( + date: str, + session: AsyncSession = Depends(get_db), + _admin: User = Depends(get_current_admin), +) -> StreamingResponse: + import re, asyncio + if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date): + raise HTTPException(status_code=404, detail="Invalid date format") + + from storage import get_storage_backend + backend = get_storage_backend() + key = f"audit-logs/{date}.csv" + + def _get() -> bytes: + response = backend._client.get_object("audit-logs", key) + try: + return response.read() + finally: + response.close() + response.release_conn() + + try: + csv_bytes = await asyncio.to_thread(_get) + except Exception: + raise HTTPException(status_code=404, detail="Export not found") + + return StreamingResponse( + iter([csv_bytes]), + media_type="text/csv", + headers={"Content-Disposition": f'attachment; filename="audit-{date}.csv"'}, + ) +``` + +Note: The `get_object` call uses `"audit-logs"` bucket (separate from documents bucket), mirroring `audit_tasks.py`. Date path parameter must be validated against `YYYY-MM-DD` regex to prevent path traversal. + +Source: `[VERIFIED: codebase]` — `MinIOBackend.get_object()` pattern at lines 110-121 of `minio_backend.py`. + +### Anti-Patterns to Avoid + +- **Calling `_backend()` singleton for cloud deletes:** `_backend()` in `services/storage.py` always returns MinIOBackend. Cloud docs must use `get_storage_backend_for_document()`. Using the singleton for cloud docs would silently succeed (MinIO would return no error for a key that doesn't exist there) while leaving the cloud file orphaned. +- **Defining `PATCH /{share_id}` AFTER `DELETE /{share_id}`:** FastAPI routes earlier-defined paths first. A `PATCH /{share_id}` defined after `DELETE /{share_id}` on the same router will work correctly for PATCH (different HTTP method). The ordering concern only affects routes with overlapping path patterns (e.g., `/received` vs `/{share_id}` GET routes). For PATCH vs DELETE on `/{share_id}`, method discrimination is unambiguous. The existing comment in `shares.py` confirms `/received` must come before `/{share_id}` for GET routes only. +- **Calling `res.json()` for CSV download:** The existing `request()` function in `client.js` always calls `res.json()` and will throw on `text/csv` responses. Any CSV or binary endpoint MUST use a separate function that calls `res.text()` or `res.blob()`. +- **Quota decrement on cloud document delete:** Cloud docs never touched quota at upload time — decrementing quota on delete would corrupt the counter. The `skip_quota` guard must be explicit. +- **Passing `user_id` as a UUID query param for handle-based filter:** The current endpoint signature accepts `Optional[uuid.UUID]` via Query, which causes FastAPI to 422 on non-UUID strings. Changing to `Optional[str]` and doing handle resolution in the handler body is required. +- **Blocking the event loop with Minio iterator:** `list_objects()` is synchronous. Consuming it directly in an async handler without `asyncio.to_thread` would block the FastAPI event loop. + +--- + +## Don't Hand-Roll + +| Problem | Don't Build | Use Instead | Why | +|---------|-------------|-------------|-----| +| Cloud backend instantiation | Custom credential decrypt + backend init | `get_storage_backend_for_document(doc, user, session)` | Already handles HKDF decrypt, lazy imports, 503 on inactive connection | +| HKDF key derivation | Custom Fernet wrapper | `storage.cloud_utils.decrypt_credentials()` | One implementation, already tested | +| Streaming CSV response | Custom write loop | `io.StringIO` + `csv.DictWriter` + `StreamingResponse(iter([output.getvalue()]))` | Exact pattern already in `api/audit.py:export_audit_log` | +| Blob download trigger | Custom download UI | `Blob` + `URL.createObjectURL` + `` click | Exact pattern used by `fetchDocumentContent` in `client.js` | +| MinIO audit-logs GET | Custom HTTP call | `MinIOBackend._client.get_object("audit-logs", key)` in `asyncio.to_thread` | Already used for doc content; same `response.read()` + `response.close()` pattern | + +--- + +## Common Pitfalls + +### Pitfall 1: delete_document() quota decrement for cloud docs +**What goes wrong:** The existing `delete_document()` in `services/storage.py` always runs `UPDATE quotas SET used_bytes = ...`. Cloud documents were never counted against quota at upload time (`api/documents.py:upload_cloud()` does not call the confirm endpoint that updates quota). Decrementing quota on cloud doc delete would underflow the counter. +**Why it happens:** The service was written for MinIO only. The quota decrement was correct when there were no cloud docs. +**How to avoid:** Add `skip_quota: bool = False` parameter. Set to `True` when `doc.storage_backend != "minio"`. Alternatively, perform the DB delete separately for cloud docs without calling `delete_document()` at all — use the admin.py pattern. +**Warning signs:** `quotas.used_bytes` becomes negative in testing after cloud doc delete. + +### Pitfall 2: Cloud delete success even though the file remains +**What goes wrong:** MinIO's `remove_object()` is a no-op for non-existent keys. If cloud backend routing is broken and the call falls through to the MinIO singleton, it will return without error while leaving the cloud file intact. +**Why it happens:** MinIO SDK does not raise on missing keys by default. +**How to avoid:** Log the `doc.storage_backend` value before the delete call during testing. Assert in tests that cloud backend's `delete_object` mock was called, not MinIO's. + +### Pitfall 3: PATCH route ordering / path conflict +**What goes wrong:** If a PATCH endpoint at `/{share_id}` is somehow confused with another route. +**Why it happens:** FastAPI path routing is order-dependent for same-method same-prefix routes. +**How to avoid:** For `PATCH /{share_id}`, no conflict exists with the current `GET /received` special case (different method). Simply add PATCH before DELETE as a matter of style. + +### Pitfall 4: AuditLog COUNT query broken after JOIN +**What goes wrong:** The count query `select(func.count()).select_from(base_q.subquery())` works when `base_q` selects a single entity. After adding a multi-column JOIN, the subquery shape changes and the count may become a count of tuples rather than a count of AuditLog rows. +**Why it happens:** SQLAlchemy 2.0 multi-column select wrapped in a subquery; `func.count()` on an ambiguous subquery. +**How to avoid:** Use a separate count query that does NOT join User: `select(func.count(AuditLog.id)).where()`. Keep the count query and the data query separate. + +### Pitfall 5: `res.text()` vs `res.blob()` for CSV in Vue +**What goes wrong:** Calling `res.json()` on a CSV response raises a JSON parse error. +**Why it happens:** The shared `request()` wrapper always calls `res.json()`. +**How to avoid:** The CSV download function must be a standalone `fetch()` call (not via `request()`) that calls `res.text()`. This mirrors the existing `fetchDocumentContent` pattern exactly. + +### Pitfall 6: Date path parameter traversal in daily-export download +**What goes wrong:** `GET /api/admin/audit-log/daily-exports/../../etc/passwd` could construct a malicious MinIO key. +**Why it happens:** Path parameters are URL-decoded before reaching the handler. +**How to avoid:** Validate `date` against `r"\d{4}-\d{2}-\d{2}"` regex before constructing `f"audit-logs/{date}.csv"`. FastAPI path parameters with slashes are unusual, but hyphens can still be used in injection attempts. + +### Pitfall 7: `_audit_to_dict()` called from two places (viewer + export) +**What goes wrong:** If only the viewer endpoint is updated to use the new handle-enriched function, the CSV export still emits raw UUIDs. +**Why it happens:** The existing `_audit_to_dict()` is shared by both endpoints. Both must be updated to the new function signature. +**How to avoid:** Update `_audit_to_dict_with_handles` is used in BOTH `list_audit_log` and `export_audit_log`. Preserve the original `_audit_to_dict` as a fallback only if needed. + +--- + +## Code Examples + +### SQLAlchemy 2.0 async aliased double-JOIN (verified pattern) + +```python +# Source: [VERIFIED: codebase] — mirrors Share→User join at shares.py:155-161 +from sqlalchemy.orm import aliased +from sqlalchemy import select + +UserSubject = aliased(User) +UserActor = aliased(User) + +stmt = ( + select(AuditLog, UserSubject.handle.label("user_handle"), UserActor.handle.label("actor_handle")) + .outerjoin(UserSubject, UserSubject.id == AuditLog.user_id) + .outerjoin(UserActor, UserActor.id == AuditLog.actor_id) + .order_by(AuditLog.created_at.desc()) +) +result = await session.execute(stmt) +rows = result.all() +for row in rows: + entry, user_handle, actor_handle = row +``` + +### Existing cloud-backend pattern (admin.py reference) + +```python +# Source: [VERIFIED: codebase] — admin.py lines 527-539 +for doc in cloud_docs: + try: + backend = await get_storage_backend_for_document(doc, user, session) + await backend.delete_object(doc.object_key) + except Exception: + pass # best-effort; deletion proceeds regardless +``` + +### Existing fetch+Blob pattern (client.js reference) + +```javascript +// Source: [VERIFIED: codebase] — client.js lines 399-428 (fetchDocumentContent) +const res = await fetch(`/api/documents/${docId}/content`, { + headers: { 'Authorization': `Bearer ${authStore.accessToken}` }, + credentials: 'include', +}) +// For CSV, use res.text() instead of res.blob() +const text = await res.text() +const blob = new Blob([text], { type: 'text/csv' }) +const url = URL.createObjectURL(blob) +// ... click + revokeObjectURL +``` + +### Existing StreamingResponse for CSV (audit.py reference) + +```python +# Source: [VERIFIED: codebase] — audit.py lines 158-162 +return StreamingResponse( + iter([output.getvalue()]), + media_type="text/csv", + headers={"Content-Disposition": "attachment; filename=audit-export.csv"}, +) +``` + +--- + +## State of the Art + +| Old Approach | Current Approach | When Changed | Impact | +|--------------|------------------|--------------|--------| +| `window.location.href` for auth downloads | `fetch()` + Blob URL | This phase | Enables Bearer token on every request | +| `user_id` UUID filter in audit log | `user_handle` string filter | This phase | Human-friendly; no UUID knowledge required | +| Audit log shows raw UUIDs | Shows `user_handle` / `actor_handle` | This phase | Admin can read who did what without cross-referencing user list | +| Cloud delete only removes DB record | Cloud delete propagates to provider | This phase | CLOUD provider storage reclaimed; no orphaned files | + +--- + +## Assumptions Log + +| # | Claim | Section | Risk if Wrong | +|---|-------|---------|---------------| +| A1 | SQLAlchemy 2.0 multi-column aliased JOIN returns `Row` tuples accessible as `row[0]`, `row[1]`, `row[2]` | Architecture Patterns §3 | Query rows parsed incorrectly; `_audit_to_dict_with_handles` receives wrong arguments | +| A2 | `Minio.list_objects()` iterator is safe to consume inside `asyncio.to_thread` without additional threading concerns | Architecture Patterns §6 | If the SDK has internal async state, `to_thread` could cause issues — unlikely given SDK is sync-only | +| A3 | `is_dir` objects from `list_objects` can be filtered by checking `obj.object_name.endswith(".csv")` | Architecture Patterns §6 | Directory marker objects (is_dir=True) have object_name ending in "/"; `.endswith(".csv")` check correctly excludes them | + +**If this table is empty:** All claims in this research were verified or cited — no user confirmation needed. + +--- + +## Open Questions + +1. **Cloud delete partial failure: HTTP status code choice** + - What we know: D-03 says return a structured error the frontend can distinguish from hard failure. + - What's unclear: Should the endpoint return HTTP 200 with `{"cloud_delete_failed": true}` or a specific non-error HTTP status like 207 (Multi-Status)? + - Recommendation: Return HTTP 200 with `{"success": false, "cloud_delete_failed": true, "detail": "..."}`. The frontend checks `data.cloud_delete_failed` to show the warning modal. HTTP 207 is unusual for REST APIs and harder for `client.js` to handle given it calls `res.ok` check. 200 with a distinguishing body flag is simpler and consistent with the existing pattern where `ok = False` from `storage.delete_document` returns 404 — this case is distinct (partial success). + +2. **Audit export query param rename: `user_id` → `user_handle`** + - What we know: The current endpoint accepts `user_id: Optional[uuid.UUID]` via Query. Changing this to `Optional[str]` changes the public API. + - What's unclear: Existing tests pass a UUID string as `user_id`; if the param is renamed to `user_handle`, tests need updating. + - Recommendation: Keep `user_id` as the query param name but change its type to `Optional[str]`. If the value parses as a valid UUID, treat it as a direct UUID filter (backward compat). Otherwise treat as a handle. This avoids breaking any existing automation. Or, add `user_handle` as a new param alongside `user_id` and deprecate `user_id`. The decision affects the existing `test_audit.py` tests. + +--- + +## Environment Availability + +No external tools or services beyond what the project already uses. All dependencies are present. + +| Dependency | Required By | Available | Version | Fallback | +|------------|------------|-----------|---------|----------| +| Minio Python SDK | Daily export listing/streaming | Yes | current | — | +| FastAPI | PATCH endpoint, query params | Yes | 0.136+ | — | +| SQLAlchemy async | Handle JOIN | Yes | 2.x | — | +| Vue 3 / Pinia | Frontend changes | Yes | 3.x | — | + +--- + +## Validation Architecture + +### Test Framework + +| Property | Value | +|----------|-------| +| Framework | pytest + pytest-asyncio (backend); Vitest 4.1.7 (frontend) | +| Config file | `backend/pytest.ini` or `backend/pyproject.toml` | +| Quick run command | `cd backend && pytest tests/test_shares.py tests/test_audit.py -x -q` | +| Full suite command | `cd backend && pytest -v` | + +Current baseline: **310 passed, 1 pre-existing failure** (`test_extractor.py::test_extract_docx` — unrelated ModuleNotFoundError), 5 skipped, 10 xfailed. + +### Phase Requirements to Test Map + +| Req ID | Behavior | Test Type | Automated Command | File Exists? | +|--------|----------|-----------|-------------------|-------------| +| SHARE-03 | POST /api/shares respects `permission` field | integration | `pytest tests/test_shares.py::test_share_create_with_permission -x` | Wave 0 gap | +| SHARE-03 | PATCH /api/shares/{id} changes permission | integration | `pytest tests/test_shares.py::test_share_patch_permission -x` | Wave 0 gap | +| SHARE-03 | PATCH /api/shares/{id} wrong owner → 404 | integration | `pytest tests/test_shares.py::test_share_patch_idor -x` | Wave 0 gap | +| SHARE-05 | `is_shared` field (not `share_count`) drives badge | unit/integration | `pytest tests/test_shares.py::test_share_indicator_in_owner_list -x` | Exists (passes) | +| CLOUD-del | delete_document routes to cloud backend | integration (mock) | `pytest tests/test_documents.py::test_delete_cloud_document_propagates -x` | Wave 0 gap | +| CLOUD-del | Cloud delete failure returns structured error | integration (mock) | `pytest tests/test_documents.py::test_delete_cloud_document_failure -x` | Wave 0 gap | +| CLOUD-del | remove_only=true skips cloud, removes DB | integration (mock) | `pytest tests/test_documents.py::test_delete_cloud_remove_only -x` | Wave 0 gap | +| ADMIN-06 | Audit log response includes user_handle | integration | `pytest tests/test_audit.py::test_audit_log_includes_user_handle -x` | Wave 0 gap | +| ADMIN-06 | user_handle filter resolves to correct entries | integration | `pytest tests/test_audit.py::test_audit_log_filter_by_handle -x` | Wave 0 gap | +| ADMIN-06 | unknown handle filter returns empty | integration | `pytest tests/test_audit.py::test_audit_log_filter_unknown_handle -x` | Wave 0 gap | +| ADMIN-06 | Daily exports list endpoint returns keys | integration (mock) | `pytest tests/test_audit.py::test_daily_exports_list -x` | Wave 0 gap | +| ADMIN-06 | Daily export download returns CSV bytes | integration (mock) | `pytest tests/test_audit.py::test_daily_export_download -x` | Wave 0 gap | + +### Sampling Rate + +- **Per task commit:** `cd backend && pytest tests/test_shares.py tests/test_audit.py tests/test_documents.py -x -q` +- **Per wave merge:** `cd backend && pytest -v` +- **Phase gate:** Full suite green (excluding pre-existing `test_extract_docx` failure) before `/gsd:verify-work` + +### Wave 0 Gaps + +- [ ] `tests/test_shares.py::test_share_create_with_permission` — covers SHARE-03 POST permission field +- [ ] `tests/test_shares.py::test_share_patch_permission` — covers SHARE-03 PATCH endpoint +- [ ] `tests/test_shares.py::test_share_patch_idor` — covers IDOR security invariant on PATCH +- [ ] `tests/test_documents.py::test_delete_cloud_document_propagates` — covers cloud delete routing +- [ ] `tests/test_documents.py::test_delete_cloud_document_failure` — covers D-03 structured error response +- [ ] `tests/test_documents.py::test_delete_cloud_remove_only` — covers D-02 remove_only path +- [ ] `tests/test_audit.py::test_audit_log_includes_user_handle` — covers D-11 handle enrichment +- [ ] `tests/test_audit.py::test_audit_log_filter_by_handle` — covers D-12 handle filter +- [ ] `tests/test_audit.py::test_audit_log_filter_unknown_handle` — covers D-12 empty result +- [ ] `tests/test_audit.py::test_daily_exports_list` — covers D-15 listing endpoint +- [ ] `tests/test_audit.py::test_daily_export_download` — covers D-16 streaming endpoint + +--- + +## Security Domain + +### Applicable ASVS Categories + +| ASVS Category | Applies | Standard Control | +|---------------|---------|-----------------| +| V4 Access Control | Yes | `share.owner_id == current_user.id` on PATCH (IDOR, 404 on mismatch) | +| V4 Access Control | Yes | `get_current_admin` on all audit-log endpoints including new daily-export ones | +| V5 Input Validation | Yes | `permission` enum validated in Pydantic; `date` regex-validated before MinIO key construction | +| V13 API | Yes | No new public endpoints; all new endpoints gated by existing auth deps | + +### Known Threat Patterns + +| Pattern | STRIDE | Standard Mitigation | +|---------|--------|---------------------| +| IDOR on PATCH /shares/{id} | Elevation of Privilege | `share.owner_id == current_user.id` → 404 on mismatch (T-04-04-02 pattern) | +| Path traversal in daily export date param | Tampering | `re.fullmatch(r"\d{4}-\d{2}-\d{2}", date)` before key construction | +| Token bypass via window.location.href | Info Disclosure | Replace with `fetch()` + `Authorization` header (D-13) | +| Cloud credential exposure in error response | Info Disclosure | Cloud delete exception caught generically; `str(exc)` must NOT include credential data — catch as `Exception`, log internally, return generic message | +| Quota underflow on cloud delete | Tampering | `skip_quota=True` for cloud documents; cloud docs never had quota charged | +| Mass assignment on SharePermissionPatch | Tampering | Minimal Pydantic model with explicit `permission` field only — no `**kwargs` passthrough | + +--- + +## Sources + +### Primary (HIGH confidence) + +- `[VERIFIED: codebase]` — `backend/services/storage.py` lines 143-179 — `delete_document()` implementation; quota decrement pattern; MinIO-only delete +- `[VERIFIED: codebase]` — `backend/storage/__init__.py` — `get_storage_backend_for_document()` factory; cloud credential routing +- `[VERIFIED: codebase]` — `backend/api/admin.py` lines 527-539 — canonical cloud delete pattern (per-doc) to follow +- `[VERIFIED: codebase]` — `backend/api/shares.py` — existing IDOR pattern; `permission="view"` hardcode at line 97; route ordering constraint +- `[VERIFIED: codebase]` — `backend/api/audit.py` — `_audit_to_dict()` pure function; `_build_filtered_query()`; both endpoints +- `[VERIFIED: codebase]` — `backend/tasks/audit_tasks.py` — key pattern `audit-logs/{date}.csv`; `put_object_raw` usage +- `[VERIFIED: codebase]` — `frontend/src/api/client.js` lines 399-428 — `fetchDocumentContent` as fetch+Blob precedent +- `[VERIFIED: codebase]` — `frontend/src/components/admin/AuditLogTab.vue` lines 185-191 — broken `window.location.href` export +- `[VERIFIED: codebase]` — `frontend/src/components/documents/DocumentCard.vue` line 31 — `share_count > 0` bug +- `[VERIFIED: npm registry]` — Minio Python SDK `list_objects` signature confirmed at runtime; `Object.object_name`, `Object.is_dir` attributes verified + +### Secondary (MEDIUM confidence) + +- `[ASSUMED]` — SQLAlchemy 2.0 aliased double-JOIN returns `Row` tuples with positional access — documented in SA2.0 changelog but not verified via Context7 in this session + +--- + +## Metadata + +**Confidence breakdown:** + +- Standard stack: HIGH — all packages already in use; no new dependencies +- Architecture: HIGH — all patterns drawn from existing codebase code; no external documentation required +- Pitfalls: HIGH — each pitfall is backed by specific code paths read directly from the source files +- Test gaps: HIGH — based on reading existing test files and counting what is not yet covered + +**Research date:** 2026-05-31 +**Valid until:** 2026-07-01 (stable brownfield — no fast-moving dependencies)