Files
kite/.planning/phases/06.2-close-v1-sharing-cloud-delete-csv-export-gaps/06.2-RESEARCH.md
T
2026-05-31 11:10:59 +02:00

44 KiB

Phase 6.2: Close v1 sharing + cloud-delete + CSV export gaps — Research

Researched: 2026-05-31 Domain: FastAPI PATCH endpoints, cloud storage backend routing, Fetch API blob download, MinIO list_objects, SQLAlchemy async JOIN Confidence: HIGH (all findings verified against the live codebase)


<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

  • D-00: create_user 500 fix already committed — await session.flush() before write_audit_log in backend/api/admin.py.
  • D-01: Default delete propagates to cloud provider. delete_document() in services/storage.py must route to get_storage_backend_for_document(doc, user, session) for non-MinIO docs.
  • D-02: "Remove from app only" is a separate, distinct action that deletes the DB record while leaving the cloud file intact.
  • D-03: Cloud provider delete failure: warning modal "Cloud delete failed. Remove from app anyway?" — user chooses. Backend returns structured error distinguishable from hard 4xx/5xx.
  • D-04: Cloud documents do NOT touch MinIO quota on delete (unchanged from existing design).
  • D-05: Persistent local Celery cache for cloud docs — DEFERRED. Not this phase.
  • D-06: DocumentCard.vue:31 — change v-if="doc.share_count > 0" to v-if="doc.is_shared".
  • D-07: Permission levels: view and edit. Share.permission column already exists. No migration.
  • D-08: Permission set at share creation: add view/edit dropdown to ShareModal.vue before submitting. Existing POST endpoint already accepts permission field.
  • D-09: Permission changeable after creation: add View/Edit toggle per share row in ShareModal.vue. Calls new PATCH /api/shares/{id} with { permission: "view" | "edit" }. Owner-only, 404 for wrong owner.
  • D-10: Fix shares.py POST hardcoded permission="view" (line 97) — read from request body instead.
  • D-11: _audit_to_dict() — extend to return user_handle and actor_handle via JOIN on User table.
  • D-12: user_id filter in GET /api/admin/audit-log — accept handle string, resolve to UUID via User.handle == handle, return empty if not found.
  • D-13: Replace window.location.href in AuditLogTab.vue:exportCsv() with fetch() + Blob URL pattern.
  • D-14: Add adminExportAuditLogCsv(params) to client.js — must call res.text() (or res.blob()), NOT res.json(). Create object URL, trigger <a> click download.
  • D-15: Add GET /api/admin/audit-log/daily-exports — lists MinIO audit-logs bucket contents. Returns [{ date, key }] sorted descending.
  • D-16: Add GET /api/admin/audit-log/daily-exports/{date} — streams specific daily export from MinIO as text/csv. Auth: get_current_admin.
  • D-17: Frontend AuditLogTab.vue: date dropdown from adminListDailyExports(), "Download" button uses fetch() + Blob URL.

Claude's Discretion

  • Exact API shape for "remove from app only" vs "delete from provider" (?cloud_only=false query param vs separate endpoint).
  • Whether PATCH /api/shares/{id} accepts full body or just { permission: "view"|"edit" } — minimal body preferred.
  • Exact error response shape for cloud delete failure — must be distinguishable from hard 4xx/5xx.
  • MinIO list_objects pagination — handle if audit-logs bucket has >1000 files.

Deferred Ideas (OUT OF SCOPE)

  • Persistent local Celery cache for cloud docs with quota tracking.
  • Celery local cache "Remove download" button.
  • SHARE-03 permission levels beyond view/edit.

</user_constraints>

<phase_requirements>

Phase Requirements

ID Description Research Support
SHARE-03 Shared access is view-only by default; owner controls permission level D-07..D-10: Share.permission column exists; POST hardcodes view; PATCH endpoint needed; frontend dropdown needed
SHARE-05 Documents shared with others display "shared" indicator D-06: backend sends is_shared; frontend checks wrong field share_count
CLOUD (del) Cloud document deletion propagates to provider D-01..D-04: delete_document() must route to get_storage_backend_for_document(); structured error response needed
ADMIN-06 Admin audit log with filters and export D-11..D-17: user handles in responses, handle filter, CSV fetch download, daily export list + stream

</phase_requirements>


Summary

Phase 6.2 is a brownfield gap-closure sprint with four independent feature areas. All four areas have the backend plumbing already present — the gaps are surgical additions and fixes, not architectural changes.

Area 1 — Cloud-delete propagation: services/storage.py:delete_document() unconditionally calls _backend().delete_object() (MinIO singleton). For cloud documents (doc.storage_backend != "minio") it must instead call get_storage_backend_for_document(doc, user, session). This function is already in backend/storage/__init__.py and used by both the content proxy and admin.py:delete_user(). The wrinkle is that delete_document() currently does not receive the user ORM object or session — the service layer only receives session and doc_id. The cleanest fix is to extend the function signature or to perform the cloud routing in api/documents.py:delete_document before calling storage.delete_document(). The admin delete_user pattern (lines 527-539) is the canonical reference: call get_storage_backend_for_document, then backend.delete_object, wrapped in try/except. A separate "remove from app only" action (D-02) needs an endpoint that skips cloud deletion entirely — a ?remove_only=true query parameter on DELETE /api/documents/{id} is the cleanest route since it shares the auth/ownership gate without duplicating a whole endpoint.

Area 2 — SHARE-03/05 fixes: Three micro-changes. (1) DocumentCard.vue line 31: share_count > 0 to is_shared. (2) shares.py POST: read permission from request body (add to ShareCreate model). (3) New PATCH /api/shares/{id} endpoint with minimal body { permission: "view" | "edit" }, owner-enforced IDOR (404 on mismatch, mirroring the DELETE pattern). (4) Frontend ShareModal.vue: dropdown before submit + toggle per row.

Area 3 — Audit log user handles: _audit_to_dict() is a pure function operating on a loaded AuditLog ORM object — it does not have access to a session. Two options: (a) change it to accept optional pre-fetched handle dicts alongside the entry, or (b) make the list/export endpoints JOIN User as an alias twice (once for user_id, once for actor_id) and pass the handles as extra arguments. Option (b) is cleaner and avoids N+1 queries. The _build_filtered_query() helper returns a select(AuditLog) — it needs to be extended to JOIN User twice (with SQLAlchemy aliases) and yield (AuditLog, user_handle, actor_handle) tuples. The user_id filter must change from accepting Optional[uuid.UUID] to accepting Optional[str] and resolving the handle in a preliminary query.

Area 4 — CSV export + daily exports: The export endpoint is already a StreamingResponse — the only change is on the frontend where window.location.href must become fetch() + Blob URL. The daily export listing uses Minio.list_objects(bucket_name="audit-logs", prefix="audit-logs/") which returns a synchronous iterator — it must be consumed in asyncio.to_thread(). The key pattern audit-logs/{date}.csv is guaranteed by audit_tasks.py:79.

Primary recommendation: Implement in four vertical slices, each deployable independently. Start with SHARE-05 (one-line fix), then SHARE-03, then cloud-delete, then audit log cluster (user handles + CSV + daily exports).


Architectural Responsibility Map

Capability Primary Tier Secondary Tier Rationale
Cloud-delete routing API / Backend Must happen server-side where credentials are decrypted; cannot be client-driven
"Remove from app" vs "delete from cloud" API / Backend Browser / Client Query param distinction at API layer; UX confirmation modal at client layer
Share permission PATCH API / Backend Browser / Client IDOR enforcement is a backend concern; toggle UI is client
"Shared" badge display Browser / Client Trivially reads doc.is_shared field from list response
Audit log handle enrichment API / Backend JOIN happens in DB query layer; frontend only receives enriched response
CSV export download Browser / Client API / Backend Authentication requires fetch() with Bearer header — browser nav cannot send it
Daily export listing API / Backend Browser / Client MinIO query is server-side; dropdown is client
Daily export stream API / Backend Browser / Client Streaming response with auth gate; download trigger at client

Standard Stack

This phase adds no new packages. All functionality uses the existing stack.

Core (already installed)

Library Version Purpose Relevance to Phase
FastAPI 0.136+ API framework PATCH endpoint, query params, StreamingResponse
SQLAlchemy 2.0 async 2.x ORM JOIN with aliased User twice for handle enrichment
Minio (Python SDK) current MinIO S3 client list_objects, get_object for daily exports
Pydantic v2 2.x Request validation SharePermissionPatch minimal body model
Vue 3 (Options API/Composition) 3.x Frontend framework Dropdown, toggle, fetch+Blob download

No new packages required

All operations — cloud backend routing, streaming responses, fetch+Blob download — use code already present in the project.

Package Legitimacy Audit

Not applicable — no new packages installed in this phase.


Architecture Patterns

System Architecture Diagram

DELETE /api/documents/{id}?remove_only=false (default)
  └── api/documents.py:delete_document
        ├── ownership assert (doc.user_id == current_user.id)
        ├── if doc.storage_backend == "minio":
        │     └── services/storage.py:delete_document()  ← (unchanged path)
        │           ├── _backend().delete_object(key)
        │           └── quota decrement (STORE-06)
        └── else (cloud backend):
              ├── get_storage_backend_for_document(doc, user, session)
              │     └── decrypts HKDF credentials, returns cloud backend
              ├── try: cloud_backend.delete_object(doc.object_key)
              │   except: return {"cloud_delete_failed": true, "detail": "..."}  ← 207/200 structured
              └── if cloud delete succeeded (or user confirmed remove_only):
                    └── services/storage.py:_db_only_delete()  ← quota skipped for cloud docs

PATCH /api/shares/{id}
  └── api/shares.py:update_share_permission
        ├── UUID parse (404 on invalid)
        ├── session.get(Share, sid)
        ├── assert share.owner_id == current_user.id → 404 on mismatch (IDOR)
        ├── validate body.permission in {"view", "edit"}
        └── share.permission = body.permission; commit

GET /api/admin/audit-log (with handle enrichment)
  └── api/audit.py:list_audit_log
        ├── if user_handle param: resolve handle → UUID, or empty result if not found
        ├── _build_filtered_query_with_handles(start, end, user_uuid, event_type)
        │     └── select(AuditLog, user_alias.handle, actor_alias.handle)
        │           .outerjoin(user_alias, user_alias.id == AuditLog.user_id)
        │           .outerjoin(actor_alias, actor_alias.id == AuditLog.actor_id)
        └── _audit_to_dict(entry, user_handle, actor_handle) → dict with both handles

GET /api/admin/audit-log/export (CSV with auth)
  └── StreamingResponse (unchanged backend)
  ← client: fetch() + res.text() + Blob URL + <a> click download

GET /api/admin/audit-log/daily-exports
  └── asyncio.to_thread(minio_client.list_objects, "audit-logs", prefix="audit-logs/")
        └── returns [{date: "YYYY-MM-DD", key: "audit-logs/YYYY-MM-DD.csv"}, ...]

GET /api/admin/audit-log/daily-exports/{date}
  └── key = f"audit-logs/{date}.csv"
        ├── asyncio.to_thread(minio_client.get_object, "audit-logs", key)
        └── StreamingResponse(iter([csv_bytes]), media_type="text/csv")
backend/
├── api/
│   ├── shares.py          # + PATCH /{id} endpoint; fix ShareCreate.permission
│   └── audit.py           # + handle JOIN; + 2 daily-export endpoints; fix user filter
├── services/
│   └── storage.py         # + cloud routing in delete_document; + _db_only_delete()
frontend/src/
├── api/
│   └── client.js          # + adminExportAuditLogCsv(); + adminListDailyExports(); + adminDownloadDailyExport()
├── components/
│   ├── admin/
│   │   └── AuditLogTab.vue  # fix exportCsv(); + daily exports date dropdown
│   ├── sharing/
│   │   └── ShareModal.vue  # + permission dropdown on share; + toggle per row
│   └── documents/
│       └── DocumentCard.vue  # line 31: share_count → is_shared

Pattern 1: Cloud-delete routing in services/storage.py

The cleanest approach to the signature problem: delete_document() gets two new optional parameters user and session_for_cloud. When user is provided and doc.storage_backend != "minio", cloud routing fires. The caller in api/documents.py already has both the current_user and session in scope.

Alternative (preferred for simplicity): Move the cloud routing entirely into api/documents.py:delete_document, before the call to storage.delete_document(). The service layer function handles only DB + MinIO quota. The API layer decides routing. This keeps services/storage.py free of cloud-backend imports and mirrors how admin.py:delete_user() works — the API layer calls get_storage_backend_for_document() and the service layer is unaware of cloud backends.

# api/documents.py — delete_document (modified)
@router.delete("/{doc_id}")
async def delete_document(
    doc_id: str,
    remove_only: bool = Query(default=False),  # D-02: skip cloud delete
    request: Request,
    session: AsyncSession = Depends(get_db),
    current_user: User = Depends(get_regular_user),
):
    uid = uuid.UUID(doc_id)
    doc = await session.get(Document, uid)
    if doc is None or doc.user_id != current_user.id:
        raise HTTPException(404, "Document not found")

    cloud_delete_failed = False
    if doc.storage_backend != "minio" and not remove_only:
        try:
            cloud_backend = await get_storage_backend_for_document(doc, current_user, session)
            await cloud_backend.delete_object(doc.object_key)
        except Exception as exc:
            # D-03: structured error; frontend shows warning modal
            cloud_delete_failed = True
            # Caller (frontend) decides whether to retry with remove_only=true

    if cloud_delete_failed and not remove_only:
        return JSONResponse(
            status_code=200,
            content={
                "success": False,
                "cloud_delete_failed": True,
                "detail": "Cloud provider delete failed. You can remove from app only.",
            }
        )

    # DB delete + quota decrement (cloud docs skip quota — D-04)
    ok = await storage.delete_document(session, doc_id, skip_quota=doc.storage_backend != "minio")
    # ... audit log + commit

Key insight on quota: Cloud docs already skip quota at upload time (no UPDATE quotas in cloud upload path). The delete_document() in services/storage.py currently always decrements quota. It needs a skip_quota: bool = False guard for cloud documents.

Source: [VERIFIED: codebase] — confirmed by reading services/storage.py:163-175 and api/documents.py:269-291.

Pattern 2: PATCH /api/shares/{id} — minimal IDOR-safe pattern

# api/shares.py — new endpoint
class SharePermissionPatch(BaseModel):
    permission: str  # validated: "view" | "edit"

    @field_validator("permission")
    @classmethod
    def validate_permission(cls, v: str) -> str:
        if v not in {"view", "edit"}:
            raise ValueError("permission must be 'view' or 'edit'")
        return v

# CRITICAL: must be defined BEFORE DELETE /{share_id} to avoid path conflict
@router.patch("/{share_id}", status_code=status.HTTP_200_OK)
async def update_share_permission(
    share_id: str,
    body: SharePermissionPatch,
    session: AsyncSession = Depends(get_db),
    current_user: User = Depends(get_regular_user),
) -> dict:
    try:
        sid = uuid.UUID(share_id)
    except ValueError:
        raise HTTPException(status_code=404, detail="Share not found")

    share = await session.get(Share, sid)
    # IDOR: 404 on mismatch — mirrors DELETE pattern (T-04-04-02)
    if share is None or share.owner_id != current_user.id:
        raise HTTPException(status_code=404, detail="Share not found")

    share.permission = body.permission
    await session.commit()
    return {"id": str(share.id), "permission": share.permission}

Source: [VERIFIED: codebase] — mirrors revoke_share IDOR pattern at lines 240-265 in backend/api/shares.py.

Pattern 3: Handle enrichment via SQLAlchemy aliased JOIN

# api/audit.py — handle enrichment
from sqlalchemy.orm import aliased

UserAsSubject = aliased(User)   # for user_id FK
UserAsActor   = aliased(User)   # for actor_id FK

def _build_filtered_query_with_handles(start, end, user_uuid, event_type):
    q = (
        select(AuditLog, UserAsSubject.handle, UserAsActor.handle)
        .outerjoin(UserAsSubject, UserAsSubject.id == AuditLog.user_id)
        .outerjoin(UserAsActor, UserAsActor.id == AuditLog.actor_id)
        .order_by(AuditLog.created_at.desc())
    )
    if start:   q = q.where(AuditLog.created_at >= start)
    if end:     q = q.where(AuditLog.created_at <= end)
    if user_uuid: q = q.where(AuditLog.user_id == user_uuid)
    if event_type: q = q.where(AuditLog.event_type == event_type)
    return q

def _audit_to_dict_with_handles(entry, user_handle, actor_handle) -> dict:
    d = {
        "id": entry.id,
        "event_type": entry.event_type,
        "user_id":   str(entry.user_id)  if entry.user_id  else None,
        "actor_id":  str(entry.actor_id) if entry.actor_id else None,
        "user_handle":  user_handle  or None,
        "actor_handle": actor_handle or None,
        "resource_id": str(entry.resource_id) if entry.resource_id else None,
        "ip_address": str(entry.ip_address) if entry.ip_address else None,
        "metadata_": entry.metadata_,
        "created_at": entry.created_at.isoformat(),
    }
    return d

Result rows from a multi-column select in SQLAlchemy 2.0 async are Row tuples — access as row[0] (AuditLog), row[1] (user_handle), row[2] (actor_handle). result.all() returns list[Row].

Source: [ASSUMED] — SQLAlchemy 2.0 aliased JOIN pattern is well-documented but not verified via Context7 in this session.

Pattern 4: Handle-to-UUID resolution for user filter

# In list_audit_log / export_audit_log endpoint handlers:
user_uuid: Optional[uuid.UUID] = None
if user_handle_param:   # new str param replacing Optional[uuid.UUID]
    handle_result = await session.execute(
        select(User.id).where(User.handle == user_handle_param)
    )
    uid = handle_result.scalar_one_or_none()
    if uid is None:
        # No user with that handle → return empty results (D-12)
        return {"items": [], "total": 0, "page": page, "per_page": per_page}
    user_uuid = uid

Source: [VERIFIED: codebase]User.handle is a unique indexed column (models.py line 52).

Pattern 5: Fetch + Blob URL download in Vue 3 (CSV export)

// frontend/src/api/client.js — new function
export async function adminExportAuditLogCsv(params = {}) {
  const { useAuthStore } = await import('../stores/auth.js')
  const authStore = useAuthStore()

  const searchParams = new URLSearchParams({ format: 'csv' })
  if (params.start)      searchParams.set('start',      params.start)
  if (params.end)        searchParams.set('end',        params.end)
  if (params.user_handle) searchParams.set('user_handle', params.user_handle)
  if (params.event_type) searchParams.set('event_type', params.event_type)

  const headers = {}
  if (authStore.accessToken) {
    headers['Authorization'] = `Bearer ${authStore.accessToken}`
  }

  const res = await fetch(`/api/admin/audit-log/export?${searchParams}`, {
    headers,
    credentials: 'include',
  })
  if (!res.ok) throw new Error(`Export failed: ${res.status}`)

  const text = await res.text()                   // CSV is text, not JSON
  const blob = new Blob([text], { type: 'text/csv' })
  const url  = URL.createObjectURL(blob)
  const a    = document.createElement('a')
  a.href     = url
  a.download = 'audit-export.csv'
  a.click()
  URL.revokeObjectURL(url)
}

This is the same pattern as fetchDocumentContent() already in client.js — raw fetch() with Authorization header, NOT using the shared request() wrapper (which always calls res.json()).

Source: [VERIFIED: codebase]fetchDocumentContent at lines 399-428 of client.js is the exact precedent.

Pattern 6: MinIO daily export listing (asyncio.to_thread)

# api/audit.py — new endpoint
@router.get("/audit-log/daily-exports")
async def list_daily_exports(
    session: AsyncSession = Depends(get_db),
    _admin: User = Depends(get_current_admin),
) -> dict:
    """List available daily audit export files from MinIO audit-logs bucket."""
    from storage import get_storage_backend
    from storage.minio_backend import MinIOBackend

    backend = get_storage_backend()
    if not isinstance(backend, MinIOBackend):
        return {"items": []}

    def _list() -> list[dict]:
        objects = backend._client.list_objects(
            "audit-logs", prefix="audit-logs/", recursive=False
        )
        items = []
        for obj in objects:
            if obj.object_name and obj.object_name.endswith(".csv"):
                # key: "audit-logs/2026-05-30.csv" → date: "2026-05-30"
                filename = obj.object_name.removeprefix("audit-logs/").removesuffix(".csv")
                items.append({"date": filename, "key": obj.object_name})
        items.sort(key=lambda x: x["date"], reverse=True)
        return items

    items = await asyncio.to_thread(_list)
    return {"items": items}

Minio.list_objects() returns a synchronous iterator; consuming it inside asyncio.to_thread blocks the thread (not the event loop). The iterator is lazy — it pages automatically. For fewer than ~1000 objects (years of daily exports) it completes in a single S3 ListObjectsV2 call. [VERIFIED: codebase] — confirmed by inspecting Minio SDK list_objects signature at runtime; obj.object_name and obj.is_dir are the key attributes.

Pattern 7: MinIO daily export streaming download

@router.get("/audit-log/daily-exports/{date}")
async def download_daily_export(
    date: str,
    session: AsyncSession = Depends(get_db),
    _admin: User = Depends(get_current_admin),
) -> StreamingResponse:
    import re, asyncio
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        raise HTTPException(status_code=404, detail="Invalid date format")

    from storage import get_storage_backend
    backend = get_storage_backend()
    key = f"audit-logs/{date}.csv"

    def _get() -> bytes:
        response = backend._client.get_object("audit-logs", key)
        try:
            return response.read()
        finally:
            response.close()
            response.release_conn()

    try:
        csv_bytes = await asyncio.to_thread(_get)
    except Exception:
        raise HTTPException(status_code=404, detail="Export not found")

    return StreamingResponse(
        iter([csv_bytes]),
        media_type="text/csv",
        headers={"Content-Disposition": f'attachment; filename="audit-{date}.csv"'},
    )

Note: The get_object call uses "audit-logs" bucket (separate from documents bucket), mirroring audit_tasks.py. Date path parameter must be validated against YYYY-MM-DD regex to prevent path traversal.

Source: [VERIFIED: codebase]MinIOBackend.get_object() pattern at lines 110-121 of minio_backend.py.

Anti-Patterns to Avoid

  • Calling _backend() singleton for cloud deletes: _backend() in services/storage.py always returns MinIOBackend. Cloud docs must use get_storage_backend_for_document(). Using the singleton for cloud docs would silently succeed (MinIO would return no error for a key that doesn't exist there) while leaving the cloud file orphaned.
  • Defining PATCH /{share_id} AFTER DELETE /{share_id}: FastAPI routes earlier-defined paths first. A PATCH /{share_id} defined after DELETE /{share_id} on the same router will work correctly for PATCH (different HTTP method). The ordering concern only affects routes with overlapping path patterns (e.g., /received vs /{share_id} GET routes). For PATCH vs DELETE on /{share_id}, method discrimination is unambiguous. The existing comment in shares.py confirms /received must come before /{share_id} for GET routes only.
  • Calling res.json() for CSV download: The existing request() function in client.js always calls res.json() and will throw on text/csv responses. Any CSV or binary endpoint MUST use a separate function that calls res.text() or res.blob().
  • Quota decrement on cloud document delete: Cloud docs never touched quota at upload time — decrementing quota on delete would corrupt the counter. The skip_quota guard must be explicit.
  • Passing user_id as a UUID query param for handle-based filter: The current endpoint signature accepts Optional[uuid.UUID] via Query, which causes FastAPI to 422 on non-UUID strings. Changing to Optional[str] and doing handle resolution in the handler body is required.
  • Blocking the event loop with Minio iterator: list_objects() is synchronous. Consuming it directly in an async handler without asyncio.to_thread would block the FastAPI event loop.

Don't Hand-Roll

Problem Don't Build Use Instead Why
Cloud backend instantiation Custom credential decrypt + backend init get_storage_backend_for_document(doc, user, session) Already handles HKDF decrypt, lazy imports, 503 on inactive connection
HKDF key derivation Custom Fernet wrapper storage.cloud_utils.decrypt_credentials() One implementation, already tested
Streaming CSV response Custom write loop io.StringIO + csv.DictWriter + StreamingResponse(iter([output.getvalue()])) Exact pattern already in api/audit.py:export_audit_log
Blob download trigger Custom download UI Blob + URL.createObjectURL + <a> click Exact pattern used by fetchDocumentContent in client.js
MinIO audit-logs GET Custom HTTP call MinIOBackend._client.get_object("audit-logs", key) in asyncio.to_thread Already used for doc content; same response.read() + response.close() pattern

Common Pitfalls

Pitfall 1: delete_document() quota decrement for cloud docs

What goes wrong: The existing delete_document() in services/storage.py always runs UPDATE quotas SET used_bytes = .... Cloud documents were never counted against quota at upload time (api/documents.py:upload_cloud() does not call the confirm endpoint that updates quota). Decrementing quota on cloud doc delete would underflow the counter. Why it happens: The service was written for MinIO only. The quota decrement was correct when there were no cloud docs. How to avoid: Add skip_quota: bool = False parameter. Set to True when doc.storage_backend != "minio". Alternatively, perform the DB delete separately for cloud docs without calling delete_document() at all — use the admin.py pattern. Warning signs: quotas.used_bytes becomes negative in testing after cloud doc delete.

Pitfall 2: Cloud delete success even though the file remains

What goes wrong: MinIO's remove_object() is a no-op for non-existent keys. If cloud backend routing is broken and the call falls through to the MinIO singleton, it will return without error while leaving the cloud file intact. Why it happens: MinIO SDK does not raise on missing keys by default. How to avoid: Log the doc.storage_backend value before the delete call during testing. Assert in tests that cloud backend's delete_object mock was called, not MinIO's.

Pitfall 3: PATCH route ordering / path conflict

What goes wrong: If a PATCH endpoint at /{share_id} is somehow confused with another route. Why it happens: FastAPI path routing is order-dependent for same-method same-prefix routes. How to avoid: For PATCH /{share_id}, no conflict exists with the current GET /received special case (different method). Simply add PATCH before DELETE as a matter of style.

Pitfall 4: AuditLog COUNT query broken after JOIN

What goes wrong: The count query select(func.count()).select_from(base_q.subquery()) works when base_q selects a single entity. After adding a multi-column JOIN, the subquery shape changes and the count may become a count of tuples rather than a count of AuditLog rows. Why it happens: SQLAlchemy 2.0 multi-column select wrapped in a subquery; func.count() on an ambiguous subquery. How to avoid: Use a separate count query that does NOT join User: select(func.count(AuditLog.id)).where(<same filters>). Keep the count query and the data query separate.

Pitfall 5: res.text() vs res.blob() for CSV in Vue

What goes wrong: Calling res.json() on a CSV response raises a JSON parse error. Why it happens: The shared request() wrapper always calls res.json(). How to avoid: The CSV download function must be a standalone fetch() call (not via request()) that calls res.text(). This mirrors the existing fetchDocumentContent pattern exactly.

Pitfall 6: Date path parameter traversal in daily-export download

What goes wrong: GET /api/admin/audit-log/daily-exports/../../etc/passwd could construct a malicious MinIO key. Why it happens: Path parameters are URL-decoded before reaching the handler. How to avoid: Validate date against r"\d{4}-\d{2}-\d{2}" regex before constructing f"audit-logs/{date}.csv". FastAPI path parameters with slashes are unusual, but hyphens can still be used in injection attempts.

Pitfall 7: _audit_to_dict() called from two places (viewer + export)

What goes wrong: If only the viewer endpoint is updated to use the new handle-enriched function, the CSV export still emits raw UUIDs. Why it happens: The existing _audit_to_dict() is shared by both endpoints. Both must be updated to the new function signature. How to avoid: Update _audit_to_dict_with_handles is used in BOTH list_audit_log and export_audit_log. Preserve the original _audit_to_dict as a fallback only if needed.


Code Examples

SQLAlchemy 2.0 async aliased double-JOIN (verified pattern)

# Source: [VERIFIED: codebase] — mirrors Share→User join at shares.py:155-161
from sqlalchemy.orm import aliased
from sqlalchemy import select

UserSubject = aliased(User)
UserActor   = aliased(User)

stmt = (
    select(AuditLog, UserSubject.handle.label("user_handle"), UserActor.handle.label("actor_handle"))
    .outerjoin(UserSubject, UserSubject.id == AuditLog.user_id)
    .outerjoin(UserActor,   UserActor.id   == AuditLog.actor_id)
    .order_by(AuditLog.created_at.desc())
)
result = await session.execute(stmt)
rows = result.all()
for row in rows:
    entry, user_handle, actor_handle = row

Existing cloud-backend pattern (admin.py reference)

# Source: [VERIFIED: codebase] — admin.py lines 527-539
for doc in cloud_docs:
    try:
        backend = await get_storage_backend_for_document(doc, user, session)
        await backend.delete_object(doc.object_key)
    except Exception:
        pass  # best-effort; deletion proceeds regardless

Existing fetch+Blob pattern (client.js reference)

// Source: [VERIFIED: codebase] — client.js lines 399-428 (fetchDocumentContent)
const res = await fetch(`/api/documents/${docId}/content`, {
  headers: { 'Authorization': `Bearer ${authStore.accessToken}` },
  credentials: 'include',
})
// For CSV, use res.text() instead of res.blob()
const text = await res.text()
const blob = new Blob([text], { type: 'text/csv' })
const url  = URL.createObjectURL(blob)
// ... <a> click + revokeObjectURL

Existing StreamingResponse for CSV (audit.py reference)

# Source: [VERIFIED: codebase] — audit.py lines 158-162
return StreamingResponse(
    iter([output.getvalue()]),
    media_type="text/csv",
    headers={"Content-Disposition": "attachment; filename=audit-export.csv"},
)

State of the Art

Old Approach Current Approach When Changed Impact
window.location.href for auth downloads fetch() + Blob URL This phase Enables Bearer token on every request
user_id UUID filter in audit log user_handle string filter This phase Human-friendly; no UUID knowledge required
Audit log shows raw UUIDs Shows user_handle / actor_handle This phase Admin can read who did what without cross-referencing user list
Cloud delete only removes DB record Cloud delete propagates to provider This phase CLOUD provider storage reclaimed; no orphaned files

Assumptions Log

# Claim Section Risk if Wrong
A1 SQLAlchemy 2.0 multi-column aliased JOIN returns Row tuples accessible as row[0], row[1], row[2] Architecture Patterns §3 Query rows parsed incorrectly; _audit_to_dict_with_handles receives wrong arguments
A2 Minio.list_objects() iterator is safe to consume inside asyncio.to_thread without additional threading concerns Architecture Patterns §6 If the SDK has internal async state, to_thread could cause issues — unlikely given SDK is sync-only
A3 is_dir objects from list_objects can be filtered by checking obj.object_name.endswith(".csv") Architecture Patterns §6 Directory marker objects (is_dir=True) have object_name ending in "/"; .endswith(".csv") check correctly excludes them

If this table is empty: All claims in this research were verified or cited — no user confirmation needed.


Open Questions

  1. Cloud delete partial failure: HTTP status code choice

    • What we know: D-03 says return a structured error the frontend can distinguish from hard failure.
    • What's unclear: Should the endpoint return HTTP 200 with {"cloud_delete_failed": true} or a specific non-error HTTP status like 207 (Multi-Status)?
    • Recommendation: Return HTTP 200 with {"success": false, "cloud_delete_failed": true, "detail": "..."}. The frontend checks data.cloud_delete_failed to show the warning modal. HTTP 207 is unusual for REST APIs and harder for client.js to handle given it calls res.ok check. 200 with a distinguishing body flag is simpler and consistent with the existing pattern where ok = False from storage.delete_document returns 404 — this case is distinct (partial success).
  2. Audit export query param rename: user_iduser_handle

    • What we know: The current endpoint accepts user_id: Optional[uuid.UUID] via Query. Changing this to Optional[str] changes the public API.
    • What's unclear: Existing tests pass a UUID string as user_id; if the param is renamed to user_handle, tests need updating.
    • Recommendation: Keep user_id as the query param name but change its type to Optional[str]. If the value parses as a valid UUID, treat it as a direct UUID filter (backward compat). Otherwise treat as a handle. This avoids breaking any existing automation. Or, add user_handle as a new param alongside user_id and deprecate user_id. The decision affects the existing test_audit.py tests.

Environment Availability

No external tools or services beyond what the project already uses. All dependencies are present.

Dependency Required By Available Version Fallback
Minio Python SDK Daily export listing/streaming Yes current
FastAPI PATCH endpoint, query params Yes 0.136+
SQLAlchemy async Handle JOIN Yes 2.x
Vue 3 / Pinia Frontend changes Yes 3.x

Validation Architecture

Test Framework

Property Value
Framework pytest + pytest-asyncio (backend); Vitest 4.1.7 (frontend)
Config file backend/pytest.ini or backend/pyproject.toml
Quick run command cd backend && pytest tests/test_shares.py tests/test_audit.py -x -q
Full suite command cd backend && pytest -v

Current baseline: 310 passed, 1 pre-existing failure (test_extractor.py::test_extract_docx — unrelated ModuleNotFoundError), 5 skipped, 10 xfailed.

Phase Requirements to Test Map

Req ID Behavior Test Type Automated Command File Exists?
SHARE-03 POST /api/shares respects permission field integration pytest tests/test_shares.py::test_share_create_with_permission -x Wave 0 gap
SHARE-03 PATCH /api/shares/{id} changes permission integration pytest tests/test_shares.py::test_share_patch_permission -x Wave 0 gap
SHARE-03 PATCH /api/shares/{id} wrong owner → 404 integration pytest tests/test_shares.py::test_share_patch_idor -x Wave 0 gap
SHARE-05 is_shared field (not share_count) drives badge unit/integration pytest tests/test_shares.py::test_share_indicator_in_owner_list -x Exists (passes)
CLOUD-del delete_document routes to cloud backend integration (mock) pytest tests/test_documents.py::test_delete_cloud_document_propagates -x Wave 0 gap
CLOUD-del Cloud delete failure returns structured error integration (mock) pytest tests/test_documents.py::test_delete_cloud_document_failure -x Wave 0 gap
CLOUD-del remove_only=true skips cloud, removes DB integration (mock) pytest tests/test_documents.py::test_delete_cloud_remove_only -x Wave 0 gap
ADMIN-06 Audit log response includes user_handle integration pytest tests/test_audit.py::test_audit_log_includes_user_handle -x Wave 0 gap
ADMIN-06 user_handle filter resolves to correct entries integration pytest tests/test_audit.py::test_audit_log_filter_by_handle -x Wave 0 gap
ADMIN-06 unknown handle filter returns empty integration pytest tests/test_audit.py::test_audit_log_filter_unknown_handle -x Wave 0 gap
ADMIN-06 Daily exports list endpoint returns keys integration (mock) pytest tests/test_audit.py::test_daily_exports_list -x Wave 0 gap
ADMIN-06 Daily export download returns CSV bytes integration (mock) pytest tests/test_audit.py::test_daily_export_download -x Wave 0 gap

Sampling Rate

  • Per task commit: cd backend && pytest tests/test_shares.py tests/test_audit.py tests/test_documents.py -x -q
  • Per wave merge: cd backend && pytest -v
  • Phase gate: Full suite green (excluding pre-existing test_extract_docx failure) before /gsd:verify-work

Wave 0 Gaps

  • tests/test_shares.py::test_share_create_with_permission — covers SHARE-03 POST permission field
  • tests/test_shares.py::test_share_patch_permission — covers SHARE-03 PATCH endpoint
  • tests/test_shares.py::test_share_patch_idor — covers IDOR security invariant on PATCH
  • tests/test_documents.py::test_delete_cloud_document_propagates — covers cloud delete routing
  • tests/test_documents.py::test_delete_cloud_document_failure — covers D-03 structured error response
  • tests/test_documents.py::test_delete_cloud_remove_only — covers D-02 remove_only path
  • tests/test_audit.py::test_audit_log_includes_user_handle — covers D-11 handle enrichment
  • tests/test_audit.py::test_audit_log_filter_by_handle — covers D-12 handle filter
  • tests/test_audit.py::test_audit_log_filter_unknown_handle — covers D-12 empty result
  • tests/test_audit.py::test_daily_exports_list — covers D-15 listing endpoint
  • tests/test_audit.py::test_daily_export_download — covers D-16 streaming endpoint

Security Domain

Applicable ASVS Categories

ASVS Category Applies Standard Control
V4 Access Control Yes share.owner_id == current_user.id on PATCH (IDOR, 404 on mismatch)
V4 Access Control Yes get_current_admin on all audit-log endpoints including new daily-export ones
V5 Input Validation Yes permission enum validated in Pydantic; date regex-validated before MinIO key construction
V13 API Yes No new public endpoints; all new endpoints gated by existing auth deps

Known Threat Patterns

Pattern STRIDE Standard Mitigation
IDOR on PATCH /shares/{id} Elevation of Privilege share.owner_id == current_user.id → 404 on mismatch (T-04-04-02 pattern)
Path traversal in daily export date param Tampering re.fullmatch(r"\d{4}-\d{2}-\d{2}", date) before key construction
Token bypass via window.location.href Info Disclosure Replace with fetch() + Authorization header (D-13)
Cloud credential exposure in error response Info Disclosure Cloud delete exception caught generically; str(exc) must NOT include credential data — catch as Exception, log internally, return generic message
Quota underflow on cloud delete Tampering skip_quota=True for cloud documents; cloud docs never had quota charged
Mass assignment on SharePermissionPatch Tampering Minimal Pydantic model with explicit permission field only — no **kwargs passthrough

Sources

Primary (HIGH confidence)

  • [VERIFIED: codebase]backend/services/storage.py lines 143-179 — delete_document() implementation; quota decrement pattern; MinIO-only delete
  • [VERIFIED: codebase]backend/storage/__init__.pyget_storage_backend_for_document() factory; cloud credential routing
  • [VERIFIED: codebase]backend/api/admin.py lines 527-539 — canonical cloud delete pattern (per-doc) to follow
  • [VERIFIED: codebase]backend/api/shares.py — existing IDOR pattern; permission="view" hardcode at line 97; route ordering constraint
  • [VERIFIED: codebase]backend/api/audit.py_audit_to_dict() pure function; _build_filtered_query(); both endpoints
  • [VERIFIED: codebase]backend/tasks/audit_tasks.py — key pattern audit-logs/{date}.csv; put_object_raw usage
  • [VERIFIED: codebase]frontend/src/api/client.js lines 399-428 — fetchDocumentContent as fetch+Blob precedent
  • [VERIFIED: codebase]frontend/src/components/admin/AuditLogTab.vue lines 185-191 — broken window.location.href export
  • [VERIFIED: codebase]frontend/src/components/documents/DocumentCard.vue line 31 — share_count > 0 bug
  • [VERIFIED: npm registry] — Minio Python SDK list_objects signature confirmed at runtime; Object.object_name, Object.is_dir attributes verified

Secondary (MEDIUM confidence)

  • [ASSUMED] — SQLAlchemy 2.0 aliased double-JOIN returns Row tuples with positional access — documented in SA2.0 changelog but not verified via Context7 in this session

Metadata

Confidence breakdown:

  • Standard stack: HIGH — all packages already in use; no new dependencies
  • Architecture: HIGH — all patterns drawn from existing codebase code; no external documentation required
  • Pitfalls: HIGH — each pitfall is backed by specific code paths read directly from the source files
  • Test gaps: HIGH — based on reading existing test files and counting what is not yet covered

Research date: 2026-05-31 Valid until: 2026-07-01 (stable brownfield — no fast-moving dependencies)