Folders, Sharing, Quotas & Document UX — plans verified (0 blockers, 2 non-blocking warnings). Covers FOLD-01..05, SHARE-01..05, SEC-08/09, ADMIN-06, DOC-01/02. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
59 KiB
Phase 4: Folders, Sharing, Quotas & Document UX — Research
Researched: 2026-05-25 Domain: FastAPI folder/share CRUD, PostgreSQL tsvector, MinIO streaming proxy, Celery beat, Vue 3 folder navigation Confidence: HIGH
<user_constraints>
User Constraints (from CONTEXT.md)
Locked Decisions
D-01 Hybrid layout — AppSidebar shows top-level folders only. Sub-folders + breadcrumb in main content area. Top-level folders clickable directly in sidebar.
D-02 Unlimited nesting depth (no API or UI cap). Folder.parent_id self-referential FK is authoritative. Breadcrumbs truncate at depth > 4 (show first + "..." + last 2).
D-03 Non-empty folder delete: warning modal with document count. Confirm → cascade-delete all documents (MinIO + DB + quota). Cancel → no action. Documents are NOT moved to root — they are destroyed.
D-04 Exact handle input — no autocomplete. API returns 404 if handle not found; UI shows "User not found" error.
D-05 Share button on DocumentCard (inline icon button). Modal: (a) handle input, (b) Share button, (c) current recipients list with Revoke per row.
D-06 "Shared with me" is a fixed virtual folder entry in AppSidebar, rendered above the user's own folder list. Filtered by shares.recipient_id = current_user.id. Zero quota charged to recipient.
D-07 Share permission is view only for Phase 4. edit deferred.
D-08 Streaming proxy endpoint: GET /api/documents/{id}/content. Returns bytes via FastAPI StreamingResponse. Supports Range headers. Content-Disposition: inline. No presigned URL ever generated or exposed. Uses get_regular_user dep.
D-09 Native browser PDF rendering — no PDF.js. Zero frontend dependencies added. Content-Type header drives browser rendering.
D-10 users.pdf_open_mode column (String, default 'in_app'). Exposed via PATCH /api/me/preferences. in_app = modal with <iframe>; new_tab = window.open.
D-11 PostgreSQL tsvector GIN index on documents.extracted_text. plainto_tsquery('english', :q). Search scope: user's own docs only. GET /api/documents?q=<query>.
D-12 Inline search bar in document list view. Debounced, minimum 2 chars. Folder-context aware.
D-13 All 4 audit event categories logged: auth events, document operations, folder/share operations, admin actions.
D-14 Audit log written inline in handlers (no middleware, no Celery task). Shared helper write_audit_log(session, event_type, user_id, actor_id, resource_id, ip_address, metadata_dict).
D-15 Admin audit log viewer (/admin → AuditLog tab): paginated table, filters (date range, user, action type). No document content, filenames, or extracted text in any log entry.
D-16 Manual CSV and JSON export button in admin audit log view. Exports current filtered result set.
D-17 Daily automated CSV backup via Celery beat. Exports previous day's audit rows to MinIO audit-logs bucket (audit-logs/YYYY-MM-DD.csv). Bucket is private. Admins can download from admin panel. Rows NOT deleted from DB.
D-18 credentials_enc excluded from all serializers (SEC-08). Explicit Pydantic response model enforcement on all user-facing endpoints that touch CloudConnection.
D-19 Account deletion (admin-triggered) runs delete_user_files() before removing DB records (SEC-09). Cloud connection cleanup deferred to Phase 5.
Claude's Discretion
None specified — all significant decisions are locked.
Deferred Ideas (OUT OF SCOPE)
- Per-day document scan quota tiers (future billing milestone)
- Share permission levels beyond
view(edit,commentdeferred) - Presigned GET URLs for non-PDF document download
- Audit log row archival/deletion policy (DB rows are not auto-deleted) </user_constraints>
<phase_requirements>
Phase Requirements
| ID | Description | Research Support |
|---|---|---|
| FOLD-01 | User can create, rename, and delete folders (delete confirms content count before proceeding) | Folder model complete in db/models.py; cascade-delete pattern via SQLAlchemy + atomic quota decrement established in Phase 3 |
| FOLD-02 | User can move documents between folders | Document.folder_id FK exists and is nullable; PATCH endpoint updates folder_id with ownership assertion on both doc and target folder |
| FOLD-03 | Breadcrumb navigation renders current folder path; each segment clickable | Recursive path query via WITH RECURSIVE CTE or iterative parent-chain walk; Vue computed property for breadcrumb array |
| FOLD-04 | Document list supports sort by name, date uploaded, and file size | SQLAlchemy order_by() with enum sort param; frontend sort query param |
| FOLD-05 | Full-text search via PostgreSQL tsvector index on extracted text | Expression-index approach to_tsvector('english', extracted_text) + GIN; plainto_tsquery in WHERE clause; Alembic migration adds index manually (not as Computed column to avoid autogenerate noise) |
| SHARE-01 | User can share a document with another user by their unique handle | Share model complete; User.handle unique lookup; exact match, no autocomplete |
| SHARE-02 | Shared documents appear in "Shared with me" virtual folder; zero quota charged to recipient | Filter shares.recipient_id = current_user.id; no quota row touched; virtual folder is a UI/API filter, not a DB folder |
| SHARE-03 | Shared access view-only by default; owner controls permission level | Share.permission defaults to 'view'; Phase 4 only implements view; permission field returned in list responses |
| SHARE-04 | Owner can revoke share; revocation is immediate | DELETE /api/shares/{share_id} with owner assertion; no async cleanup needed |
| SHARE-05 | Documents shared with others display a "shared" indicator | Document list response includes is_shared: bool derived from EXISTS subquery on shares.owner_id |
| SEC-08 | credentials_enc excluded from all API serializers |
Explicit CloudConnectionOut Pydantic response model excluding credentials_enc |
| SEC-09 | Account deletion triggers delete_user_files() per cloud connection before DB removal |
Implemented in admin delete-user endpoint; iterates documents, deletes MinIO objects, decrements quota |
| ADMIN-06 | Admin can view audit log filtered by date range, user, and action type (metadata only) | AuditLog model complete; paginated query with filters; admin endpoint uses get_current_admin |
| DOC-01 | User can view document metadata and extracted text | Existing GET /api/documents/{id} already returns extracted_text; Phase 4 ensures it is included in response |
| DOC-02 | In-browser PDF preview via PDF.js proxy; no presigned URLs exposed | GET /api/documents/{id}/content — StreamingResponse with Range header support, bytes from MinIO via asyncio.to_thread |
| </phase_requirements> |
Summary
Phase 4 builds entirely on the existing ORM schema (all required models — Folder, Share, AuditLog, Document.folder_id, Share.recipient_id — are already in db/models.py). The work is predominantly plumbing: creating new API router modules for folders, shares, and audit log; adding a single Alembic migration for the users.pdf_open_mode column and the tsvector GIN expression index; extending the Celery beat schedule with a daily audit export task; and extending the Vue 3 frontend with folder navigation, sharing modals, and the settings preference toggle.
The highest-risk areas are: (1) the PDF streaming proxy with Range header support (needs careful byte-range parsing to handle PDF viewer seeking), (2) the tsvector GIN index and corresponding Alembic migration (Alembic autogenerate has known quirks with to_tsvector() expression indexes — the migration must be written manually), and (3) the folder cascade-delete with atomic quota decrement for multiple documents (each document's size_bytes must be summed and decremented in a single atomic UPDATE or via a pre-computed sum).
No new Python or npm packages are required for Phase 4. The decision to use native browser PDF rendering (D-09) eliminates the PDF.js dependency. All needed libraries — FastAPI, SQLAlchemy, MinIO SDK, Celery, Pydantic — are already pinned in requirements.txt. The frontend needs only Tailwind utility classes; no new npm packages.
Primary recommendation: Build in 6 plans: Wave 0 test scaffolds + migration; folders backend; shares backend + audit service; streaming proxy + search; audit log admin API + Celery export task; frontend folder navigation + sharing + settings.
Architectural Responsibility Map
| Capability | Primary Tier | Secondary Tier | Rationale |
|---|---|---|---|
| Folder CRUD + breadcrumb path | API / Backend | Database / Storage | Ownership assertion and cascade-delete logic belongs in the API layer; path reconstruction is a DB query |
| Document move between folders | API / Backend | Database / Storage | Ownership assertions on both document and target folder must be co-located in the API handler |
| Full-text search (tsvector) | Database / Storage | API / Backend | GIN index and plainto_tsquery execute in PostgreSQL; API adds scope filter (user_id) |
| Share grant / revoke | API / Backend | Database / Storage | IDOR prevention requires ownership assertion before DB write; share table is the single source of truth |
| "Shared with me" virtual folder | API / Backend | — | Pure query filter on shares.recipient_id; no DB folder row; quota enforcement is in the API layer |
| PDF streaming proxy | API / Backend | Database / Storage | Bytes flow from MinIO through FastAPI to the browser; presigned URLs never reach the browser |
| Per-user PDF open preference | API / Backend | Database / Storage | User preference column (users.pdf_open_mode) read/written via /api/me/preferences |
| Audit log write | API / Backend | — | write_audit_log() called inline in each handler after successful operation; no middleware |
| Audit log viewer + export | API / Backend | Database / Storage | Admin-only paginated query with filters; CSV export is a StreamingResponse from a filtered DB query |
| Daily audit backup (Celery beat) | API / Backend (worker) | Database / Storage | Celery beat task runs in the worker; queries DB and uploads CSV to MinIO audit-logs bucket |
| Folder navigation UI state | Frontend Server (SSR) → Browser / Client | — | Folder ID and breadcrumb path held in Vue ref() in HomeView.vue; no server-side state |
| Share modal + indicator | Browser / Client | — | Share button on DocumentCard; modal state is local component ref(); share list in documents store |
| PDF open preference toggle | Browser / Client | API / Backend | Toggle in SettingsView.vue reads/writes preference via PATCH /api/me/preferences |
credentials_enc exclusion (SEC-08) |
API / Backend | — | Pydantic response model CloudConnectionOut excludes the field; never reaches serializer |
Standard Stack
Core (all already in requirements.txt / package.json)
| Library | Version | Purpose | Why Standard |
|---|---|---|---|
| FastAPI | 0.136+ | API router for folders, shares, audit, proxy, preferences | Already in use; APIRouter + StreamingResponse needed for proxy |
| SQLAlchemy 2.0 async | 2.0+ | ORM queries for all new endpoints | Established pattern in project; async session already wired |
| Alembic | 1.18+ | Migration 0004 — users.pdf_open_mode column + tsvector GIN index |
Established migration toolchain |
| MinIO Python SDK | 7.2+ | get_object() for streaming proxy; put_object() for audit CSV upload |
Already in use; asyncio.to_thread() pattern established |
| Celery + Redis | 5.5+ | Daily audit export beat task | Already in use; beat_schedule in celery_app.py |
| Pydantic v2 | 2.0+ | CloudConnectionOut response model excluding credentials_enc (SEC-08) |
Already in use for all request/response models |
| Vue 3 (Options API) | 3.4+ | Folder nav, share modal, settings toggle | Already in use; ref() + watch() patterns established |
[VERIFIED: codebase — requirements.txt and package.json]
No New Packages Required
Phase 4 introduces zero new dependencies. Key rationale:
- PDF viewer: D-09 chose native browser rendering — no PDF.js
- Debounce: hand-rolled
setTimeout/clearTimeoutin Vuewatchcallback (< 10 lines, no lodash needed) - CSV export: Python stdlib
csv.DictWriterwithio.StringIO— no pandas - Search: PostgreSQL built-in
plainto_tsquery— no Elasticsearch or external search engine
[VERIFIED: codebase — no new deps required]
Package Legitimacy Audit
No new packages are installed in Phase 4. All runtime dependencies are carried forward from Phases 1–3 with pinned versions in requirements.txt.
Packages removed due to slopcheck [SLOP] verdict: none Packages flagged as suspicious [SUS]: none New packages to install: none
Architecture Patterns
System Architecture Diagram
Browser
│
├── GET /api/documents?q=search&folder_id=X&sort=name
│ FastAPI → SQLAlchemy → PostgreSQL (tsvector @@ plainto_tsquery, scoped to user)
│ ← JSON: [{id, filename, size_bytes, is_shared, folder_id, ...}]
│
├── POST /api/folders / PATCH /api/folders/{id} / DELETE /api/folders/{id}
│ FastAPI → ownership assertion → SQLAlchemy → PostgreSQL
│ DELETE: sum(doc.size_bytes) → atomic quota decrement → MinIO delete_object per doc
│
├── POST /api/shares / DELETE /api/shares/{id}
│ FastAPI → ownership assertion (owner_id = current_user.id) → SQLAlchemy → PostgreSQL
│ GET /api/shares/received → filter shares.recipient_id = current_user.id
│
├── GET /api/documents/{id}/content [Range: bytes=X-Y]
│ FastAPI (get_regular_user) → ownership OR share-access check → SQLAlchemy (object_key)
│ → MinIO get_object() via asyncio.to_thread() → StreamingResponse (206 or 200)
│ ← bytes with Content-Type, Content-Disposition: inline, Accept-Ranges: bytes
│ NOTE: no presigned URL generated; bytes flow through FastAPI
│
├── PATCH /api/me/preferences
│ FastAPI → SQLAlchemy UPDATE users.pdf_open_mode → ← {pdf_open_mode}
│
└── GET /api/admin/audit-log [?start=&end=&user_id=&event_type=&page=]
FastAPI (get_current_admin) → SQLAlchemy → PostgreSQL
← JSON: {items: [{id, event_type, user_id, actor_id, ip_address, metadata_, created_at}], total}
GET /api/admin/audit-log/export?format=csv|json
FastAPI (get_current_admin) → query → StreamingResponse (text/csv or application/json)
Celery Beat Worker (daily at midnight UTC)
│
└── audit_log_daily_export task
→ AsyncSessionLocal → SELECT audit_log WHERE date = yesterday
→ csv.DictWriter → io.BytesIO
→ MinIO put_object("audit-logs", "YYYY-MM-DD.csv", bytes)
Recommended Project Structure (new files only)
backend/
├── api/
│ ├── folders.py # FOLD-01, FOLD-02, FOLD-03, FOLD-04, FOLD-05
│ ├── shares.py # SHARE-01..05
│ └── audit.py # ADMIN-06 (admin viewer + export)
├── services/
│ └── audit.py # write_audit_log() helper
├── tasks/
│ └── audit_tasks.py # audit_log_daily_export Celery beat task
└── migrations/versions/
└── 0004_phase4_pdf_open_mode_tsvector.py
frontend/src/
├── components/
│ ├── documents/
│ │ └── ShareModal.vue # D-05
│ ├── layout/
│ │ └── BreadcrumbNav.vue # FOLD-03
│ └── admin/
│ └── AdminAuditLogTab.vue # ADMIN-06 (D-15, D-16)
├── stores/
│ └── folders.js # folder state + actions
└── views/
└── SettingsView.vue # extend with PDF preference toggle (D-10)
Pattern 1: Ownership Assertion (established in Phase 3 — D-16)
What: Every resource endpoint asserts resource.user_id == current_user.id. Cross-user access returns 404 (not 403) to prevent attacker enumeration of resource IDs.
When to use: All folder CRUD, document move, share grant/revoke (owner side), streaming proxy.
# Source: backend/api/documents.py (established pattern)
doc = await session.get(Document, uid)
if doc is None or doc.user_id != current_user.id:
raise HTTPException(status_code=404, detail="Document not found")
Folder variant:
folder = await session.get(Folder, folder_id)
if folder is None or folder.user_id != current_user.id:
raise HTTPException(status_code=404, detail="Folder not found")
[VERIFIED: codebase — backend/api/documents.py]
Pattern 2: Atomic Quota Decrement for Folder Cascade-Delete
What: When deleting a folder, collect all document size_bytes first, then issue a single atomic quota UPDATE. Never read-then-write in Python.
When to use: DELETE /api/folders/{id} when folder is non-empty.
# Source: established atomic pattern (CLAUDE.md + Phase 3 D-07)
# Step 1: collect document IDs and total size within the folder subtree
result = await session.execute(
select(Document.id, Document.size_bytes, Document.object_key)
.where(Document.folder_id == folder_id, Document.user_id == current_user.id)
)
docs = result.all()
total_bytes = sum(row.size_bytes for row in docs)
# Step 2: atomic quota decrement (CASE WHEN pattern — SQLite compatible)
await session.execute(
text(
"UPDATE quotas SET used_bytes = "
"CASE WHEN used_bytes > :delta THEN used_bytes - :delta ELSE 0 END "
"WHERE user_id = :uid"
),
{"delta": total_bytes, "uid": str(current_user.id)},
)
# Step 3: delete MinIO objects (best-effort)
for row in docs:
try:
await get_storage_backend().delete_object(row.object_key)
except Exception:
pass # MinIO cleanup is best-effort
# Step 4: delete documents (cascade via FK) then folder
await session.execute(
delete(Document).where(Document.folder_id == folder_id)
)
await session.delete(folder)
await session.commit()
[ASSUMED: the CASE WHEN pattern for SQLite compat is established in STATE.md; verified in existing delete_document service function]
Pattern 3: PDF Streaming Proxy with Range Headers
What: GET /api/documents/{id}/content fetches bytes from MinIO and streams them with Range header support. Status 206 for range requests; 200 for full fetches.
When to use: DOC-02. Required for PDF viewer to seek pages without re-downloading the entire file.
# Source: FastAPI discussion #7718 (CITED: github.com/fastapi/fastapi/discussions/7718)
from fastapi import Request
from fastapi.responses import StreamingResponse
def _parse_range(range_header: str, file_size: int) -> tuple[int, int]:
"""Parse 'bytes=start-end' Range header. Raises 416 on invalid range."""
try:
h = range_header.replace("bytes=", "").split("-")
start = int(h[0]) if h[0] != "" else 0
end = int(h[1]) if h[1] != "" else file_size - 1
except (ValueError, IndexError):
raise HTTPException(status.HTTP_416_REQUESTED_RANGE_NOT_SATISFIABLE)
if start > end or start < 0 or end >= file_size:
raise HTTPException(status.HTTP_416_REQUESTED_RANGE_NOT_SATISFIABLE)
return start, end
@router.get("/{doc_id}/content")
async def stream_document_content(
doc_id: str,
request: Request,
session: AsyncSession = Depends(get_db),
current_user: User = Depends(get_regular_user),
):
# Ownership assertion (DOC-02 also allows shared-with-me access — see Pattern 4)
doc = await session.get(Document, uid)
if doc is None or doc.user_id != current_user.id:
# Also allow if document is shared with current_user
share = await _get_share_for_user(session, uid, current_user.id)
if share is None:
raise HTTPException(404, "Document not found")
# Use the document from the share's owner context
# (doc is already loaded; ownership check skipped for shared access)
# Fetch all bytes from MinIO (asyncio.to_thread wraps sync SDK)
file_bytes = await get_storage_backend().get_object(doc.object_key)
file_size = len(file_bytes)
range_header = request.headers.get("range")
headers = {
"content-type": doc.content_type,
"content-disposition": f"inline; filename=\"{doc.filename}\"",
"accept-ranges": "bytes",
"content-length": str(file_size),
}
if range_header:
start, end = _parse_range(range_header, file_size)
chunk = file_bytes[start:end + 1]
headers["content-range"] = f"bytes {start}-{end}/{file_size}"
headers["content-length"] = str(len(chunk))
return StreamingResponse(
iter([chunk]),
status_code=206,
headers=headers,
)
return StreamingResponse(
iter([file_bytes]),
status_code=200,
headers=headers,
)
Critical: get_regular_user dep ensures admins cannot access document content (DOC-02 / CLAUDE.md). No presigned URL is generated at any point.
[CITED: https://github.com/fastapi/fastapi/discussions/7718] [CITED: https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/206]
Pattern 4: Shared-Document Access for Streaming Proxy
What: The streaming proxy must allow both document owners AND recipients to access content.
When to use: Only in GET /api/documents/{id}/content — sharing grants read access to the content stream.
# Source: CONTEXT.md D-08 — recipient can preview shared document
async def _can_access_document(
session: AsyncSession, doc: Document, current_user_id: uuid.UUID
) -> bool:
"""True if user owns the document OR has an active share as recipient."""
if doc.user_id == current_user_id:
return True
result = await session.execute(
select(Share).where(
Share.document_id == doc.id,
Share.recipient_id == current_user_id,
)
)
return result.scalar_one_or_none() is not None
[ASSUMED: the share-based access for the streaming proxy is implied by D-08 and SHARE-02; not explicitly called out in a prior phase decision]
Pattern 5: PostgreSQL tsvector Full-Text Search
What: GIN expression index on documents.extracted_text for full-text search. Query uses plainto_tsquery (accepts natural language; no operator syntax required from user).
When to use: FOLD-05 — GET /api/documents?q=<query>.
Migration (manual — do NOT use Computed() to avoid Alembic autogenerate noise):
# Source: CITED: https://www.postgresql.org/docs/current/textsearch-tables.html
# In migration 0004:
op.execute(
"CREATE INDEX ix_documents_fts ON documents "
"USING GIN (to_tsvector('english', coalesce(extracted_text, '')))"
)
Query pattern:
# Source: CITED: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html
from sqlalchemy import func, text
stmt = (
select(Document)
.where(
Document.user_id == current_user.id,
func.to_tsvector("english", func.coalesce(Document.extracted_text, "")).op("@@")(
func.plainto_tsquery("english", q)
),
)
.order_by(Document.created_at.desc())
)
Why plainto_tsquery not to_tsquery: plainto_tsquery accepts unstructured natural language ("invoice march 2024") without requiring the user to supply & and | operators. Simpler and safer for a search bar input.
Why expression index not generated/stored column: PostgreSQL generated columns with TSVECTOR type require PostgreSQL 12+ and add schema complexity. The expression index approach is simpler and Alembic-compatible (written as raw SQL in the migration; autogenerate will not re-detect it if managed manually).
Important Alembic caveat: include_schemas=True autogenerate WILL repeatedly flag the expression index for recreation (known Alembic issue: github.com/sqlalchemy/alembic/issues/1390). The migration MUST be written manually and committed once. Do not run alembic revision --autogenerate after adding the GIN index without reviewing the output.
[CITED: https://www.postgresql.org/docs/current/textsearch-tables.html] [CITED: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html] [CITED: https://github.com/sqlalchemy/alembic/issues/1390]
Pattern 6: write_audit_log() Helper
What: Shared service function called inline in each handler after a successful operation. Does NOT raise on failure (audit log writing is best-effort — do not fail the primary operation).
When to use: Every handler in Phase 4 that performs state-changing operations; plus back-filling into auth.py and admin.py.
# Source: CONTEXT.md D-14
# backend/services/audit.py
from __future__ import annotations
import uuid
from typing import Optional
from sqlalchemy.ext.asyncio import AsyncSession
from db.models import AuditLog
import logging
logger = logging.getLogger(__name__)
async def write_audit_log(
session: AsyncSession,
event_type: str,
user_id: Optional[uuid.UUID],
actor_id: Optional[uuid.UUID],
resource_id: Optional[uuid.UUID],
ip_address: Optional[str],
metadata_: Optional[dict] = None,
) -> None:
"""Write an audit log entry. Never raises — audit failure is non-fatal."""
try:
entry = AuditLog(
event_type=event_type,
user_id=user_id,
actor_id=actor_id,
resource_id=resource_id,
ip_address=ip_address,
metadata_=metadata_,
)
session.add(entry)
await session.flush() # flush within the handler's existing transaction
except Exception as exc:
logger.warning("audit log write failed: %s", exc)
# Do not re-raise — audit failure must never abort the primary operation
Critical: session.flush() not session.commit() — audit log writes within the same transaction as the primary operation, so the primary operation's session.commit() commits the audit entry too. This avoids partial writes where the operation succeeds but the audit entry is orphaned.
[ASSUMED: flush-not-commit pattern is the safest approach given transactional semantics; not explicitly documented in a prior phase decision]
Pattern 7: CSV Export (admin endpoint)
What: GET /api/admin/audit-log/export?format=csv returns a StreamingResponse with text/csv content type and Content-Disposition: attachment.
# Source: Python stdlib csv module
import csv
import io
from fastapi.responses import StreamingResponse
async def export_audit_log(
session: AsyncSession,
start: Optional[datetime],
end: Optional[datetime],
user_id: Optional[uuid.UUID],
event_type: Optional[str],
format: str = "csv",
):
rows = await _query_audit_log(session, start, end, user_id, event_type)
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=[
"id", "event_type", "user_id", "actor_id",
"resource_id", "ip_address", "metadata_", "created_at"
])
writer.writeheader()
for row in rows:
writer.writerow({
"id": row.id,
"event_type": row.event_type,
"user_id": str(row.user_id) if row.user_id else "",
"actor_id": str(row.actor_id) if row.actor_id else "",
"resource_id": str(row.resource_id) if row.resource_id else "",
"ip_address": str(row.ip_address) if row.ip_address else "",
"metadata_": str(row.metadata_) if row.metadata_ else "",
"created_at": row.created_at.isoformat(),
})
output.seek(0)
return StreamingResponse(
iter([output.getvalue()]),
media_type="text/csv",
headers={"Content-Disposition": f"attachment; filename=audit-export.csv"},
)
[CITED: Python stdlib csv documentation — https://docs.python.org/3/library/csv.html]
Pattern 8: Celery Beat — Daily Audit Export Task
What: A new task audit_log_daily_export added to beat_schedule. Uses crontab(hour=0, minute=0) for midnight UTC.
# Source: CITED: https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html
from celery.schedules import crontab
# In celery_app.py beat_schedule dict — ADD to existing schedule:
"audit-log-daily-export": {
"task": "tasks.audit_tasks.audit_log_daily_export",
"schedule": crontab(hour=0, minute=0), # midnight UTC
},
Task implementation pattern (mirrors cleanup_abandoned_uploads):
@celery_app.task(name="tasks.audit_tasks.audit_log_daily_export")
def audit_log_daily_export() -> dict:
return asyncio.run(_run_daily_export())
async def _run_daily_export() -> dict:
from datetime import date, timedelta
import csv, io
from db.session import AsyncSessionLocal
from db.models import AuditLog
from storage import get_storage_backend
from sqlalchemy import select
yesterday = date.today() - timedelta(days=1)
start = datetime(yesterday.year, yesterday.month, yesterday.day, tzinfo=timezone.utc)
end = start + timedelta(days=1)
async with AsyncSessionLocal() as session:
result = await session.execute(
select(AuditLog)
.where(AuditLog.created_at >= start, AuditLog.created_at < end)
.order_by(AuditLog.created_at)
)
rows = result.scalars().all()
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=[...])
writer.writeheader()
for row in rows:
writer.writerow({...})
csv_bytes = output.getvalue().encode("utf-8")
key = f"audit-logs/{yesterday.isoformat()}.csv"
backend = get_storage_backend()
await backend.put_object_raw(
bucket="audit-logs",
key=key,
data=io.BytesIO(csv_bytes),
length=len(csv_bytes),
content_type="text/csv",
)
return {"exported": len(rows), "key": key}
Note: MinIOBackend may need a put_object_raw(bucket, key, data, length, content_type) method (vs the existing put_object which constructs the key schema). The audit-logs bucket uses a different key scheme. Add this method in the migration plan.
[CITED: https://docs.celeryq.dev/en/stable/userguide/periodic-tasks.html]
Pattern 9: Pydantic Response Model Exclusion (SEC-08)
What: Explicit response_model=CloudConnectionOut where CloudConnectionOut omits credentials_enc. This is the safe-by-default Pydantic v2 approach.
# Source: CITED: https://fastapi.tiangolo.com/tutorial/response-model/
from pydantic import BaseModel
from datetime import datetime
class CloudConnectionOut(BaseModel):
id: str
provider: str
display_name: str
status: str
connected_at: datetime
# credentials_enc is deliberately absent — never serialized (SEC-08)
model_config = {"from_attributes": True}
Phase 4 scope: No cloud connection endpoints exist yet (Phase 5). Phase 4's obligation is to ensure the CloudConnectionOut model is defined and used as response_model on any admin endpoint that touches cloud_connections. Since no cloud connection endpoints exist yet, Phase 4 creates the model in schemas/cloud.py (or inline in api/admin.py) so Phase 5 cannot accidentally expose the field.
[CITED: https://fastapi.tiangolo.com/tutorial/response-model/]
Pattern 10: Vue 3 Debounced Search (no external dependency)
What: Watch the search query ref with a manual debounce using setTimeout/clearTimeout. Does not require lodash or VueUse.
// Source: Vue 3 Composition API watch pattern (Options API compatible via watch option)
// frontend/src/stores/documents.js
import { ref, watch } from 'vue'
const searchQuery = ref('')
let _searchTimer = null
watch(searchQuery, (newVal) => {
clearTimeout(_searchTimer)
if (newVal.length < 2) {
// Clear results or show all
return
}
_searchTimer = setTimeout(() => {
fetchDocuments({ q: newVal, folderId: currentFolderId.value })
}, 300)
})
[ASSUMED: 300ms is conventional for search debounce; no official Vue doc specifies this value]
Anti-Patterns to Avoid
- Using
session.commit()insidewrite_audit_log(): Must useflush()to stay within the handler's transaction. A separate commit risks committing the audit entry without the primary operation. - Generating presigned URLs in the streaming proxy: D-08 explicitly prohibits this. Never call
presigned_get_url()from the proxy endpoint. The bytes must flow through FastAPI. to_tsqueryinstead ofplainto_tsquery:to_tsqueryrequires the user to supply&,|, and!operators. Search bar input usesplainto_tsquerywhich accepts natural language phrases.- Using
response_model_excludeon CloudConnection: Prefer defining an explicitCloudConnectionOutmodel that never includescredentials_enc.response_model_excludeis fragile (requires remembering to pass the set on every endpoint). Whitelist is safer. - Storing breadcrumb state in the URL query string: Breadcrumb is derived from the current
folder_idparameter — only the current folder ID needs to be in the URL (or in Vue state). The full path is reconstructed from the DB on navigation. - Recursive SQL for breadcrumb without depth limit:
WITH RECURSIVECTE is correct but should include a depth guard (WHERE depth < 50) to prevent infinite recursion on a corrupted self-referential FK.
Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---|---|---|---|
| Range header parsing | Custom byte-range logic from scratch | Pattern 3 above (from FastAPI discussion #7718) | Browser PDFs use RFC 7233 range requests; edge cases (open-ended ranges bytes=500-) must be handled |
| Full-text search ranking | BM25 in Python | PostgreSQL ts_rank() + plainto_tsquery |
Runs in the DB; GIN index makes it fast; no external service needed |
| CSV serialization | String concatenation with escaping | Python stdlib csv.DictWriter |
Handles quoting, escaping, newlines in field values automatically |
| Audit log middleware | Celery task or FastAPI middleware for audit writes | Inline write_audit_log() after successful operation |
Middleware fires on request start — cannot know if operation succeeded; Celery task adds latency and potential missed writes on worker failure |
| Tsvector maintenance | Application-layer extracted_text → tsvector sync |
PostgreSQL GIN expression index to_tsvector('english', extracted_text) |
DB recalculates automatically on UPDATE; no trigger or application sync needed |
| Debounce utility | npm package (lodash, VueUse) | setTimeout/clearTimeout in watch |
< 10 lines; no new dependency justified |
Key insight: PostgreSQL handles full-text search natively with GIN indexes at query time — no external search infrastructure (Elasticsearch, Typesense, Meilisearch) is needed or justified at this scale.
Common Pitfalls
Pitfall 1: Folder Cascade-Delete Misses Sub-folder Documents
What goes wrong: DELETE /api/folders/{id} deletes direct child documents but leaves orphaned documents in sub-folders. Quota becomes incorrect.
Why it happens: A naive WHERE folder_id = :id only catches immediate children. Self-referential FK with unlimited nesting requires a recursive query to collect the full subtree.
How to avoid: Use WITH RECURSIVE CTE to collect all folder IDs in the subtree before deleting:
WITH RECURSIVE subtree AS (
SELECT id FROM folders WHERE id = :root_id AND user_id = :uid
UNION ALL
SELECT f.id FROM folders f
JOIN subtree s ON f.parent_id = s.id
WHERE f.user_id = :uid AND depth < 50
)
SELECT d.id, d.size_bytes, d.object_key
FROM documents d
JOIN subtree s ON d.folder_id = s.id
WHERE d.user_id = :uid
Warning signs: After deleting a folder tree, quotas.used_bytes does not match SUM(documents.size_bytes).
[ASSUMED: SQLite does not support WITH RECURSIVE for test runs — integration tests requiring this pattern must use PostgreSQL (INTEGRATION=1)]
Pitfall 2: Alembic Autogenerate Repeatedly Detects tsvector GIN Index
What goes wrong: After adding the GIN expression index in migration 0004, running alembic revision --autogenerate again generates a migration that drops and recreates the same index.
Why it happens: Alembic's autogenerate does not understand functional/expression indexes on PostgreSQL when the expression involves function calls like to_tsvector(). Known issue in Alembic.
How to avoid: Write the GIN index migration manually (as raw op.execute(CREATE INDEX...)) and do NOT add the index to the ORM model's __table_args__. Keep it only in the migration. Document with a comment # managed manually — do not autogenerate.
Warning signs: Repeated alembic revision --autogenerate output shows DROP INDEX / CREATE INDEX for ix_documents_fts on every run.
[CITED: https://github.com/sqlalchemy/alembic/issues/1390]
Pitfall 3: PDF Streaming Proxy — Admin Access
What goes wrong: Admin user calls GET /api/documents/{id}/content and receives document bytes.
Why it happens: Using get_current_user instead of get_regular_user on the proxy endpoint.
How to avoid: The proxy endpoint MUST use get_regular_user dep, which raises HTTP 403 for user.role == 'admin'. This is established in Phase 3 (D-16) and must be applied to the proxy endpoint.
Warning signs: Security test test_admin_cannot_access_document_content fails or is absent.
[VERIFIED: codebase — backend/deps/auth.py get_regular_user raises 403 for admin role]
Pitfall 4: Share IDOR — Recipient Accesses Another Recipient's Share
What goes wrong: User A shares document with User B. User C (also a recipient of a different share on the same document, or just any user) can enumerate share IDs and revoke or view shares they don't own.
Why it happens: DELETE /api/shares/{share_id} checks only that the share exists, not that share.owner_id == current_user.id.
How to avoid: Ownership assertion on all share mutation endpoints:
share = await session.get(Share, share_id)
if share is None or share.owner_id != current_user.id:
raise HTTPException(404, "Share not found")
Warning signs: Security test test_share_revoke_wrong_owner_returns_404 is absent.
[ASSUMED: standard IDOR prevention pattern per CLAUDE.md; test must be written]
Pitfall 5: write_audit_log Does Not Include IP Address
What goes wrong: ip_address column is left NULL for all events because the handler does not extract it from the request.
Why it happens: The request.client.host extraction is not wired into write_audit_log() calls.
How to avoid: Inject request: Request into each handler and pass request.client.host as ip_address to write_audit_log(). For handlers behind a reverse proxy, use request.headers.get("X-Forwarded-For", request.client.host).
[ASSUMED: IP extraction from FastAPI Request is standard; X-Forwarded-For handling depends on deployment]
Pitfall 6: Folder Name Uniqueness Constraint Violation
What goes wrong: Creating two folders with the same name under the same parent raises an IntegrityError from the uq_folders_user_parent_name constraint, which surfaces as HTTP 500.
Why it happens: The Folder model has UniqueConstraint("user_id", "parent_id", "name"). The API handler does not check for duplicates before inserting.
How to avoid: Catch IntegrityError from the ORM insert and return HTTP 409 Conflict:
from sqlalchemy.exc import IntegrityError
try:
session.add(folder)
await session.commit()
except IntegrityError:
await session.rollback()
raise HTTPException(409, "A folder with that name already exists here")
Warning signs: POST /api/folders returns 500 on duplicate name.
[VERIFIED: codebase — db/models.py Folder UniqueConstraint("user_id", "parent_id", "name")]
Pitfall 7: "Shared with Me" Leaks Document Content
What goes wrong: The GET /api/documents list endpoint accidentally returns documents shared with the current user (not just owned by them), exposing filenames and extracted text to recipients.
Why it happens: A join on the shares table is added to the list query without scoping by ownership.
How to avoid: The standard list endpoint (GET /api/documents) MUST filter WHERE documents.user_id = current_user.id only. Shared documents are a separate endpoint GET /api/shares/received. Extracted text is not returned in the shared documents list view — only metadata.
[VERIFIED: codebase — services/storage.py list_metadata already scopes by user_id]
Pitfall 8: MinIO Audit-Logs Bucket Does Not Exist
What goes wrong: The daily Celery beat task fails with NoSuchBucket error on first run.
Why it happens: The audit-logs bucket is not created in the migration or application lifespan.
How to avoid: Create the audit-logs bucket in migration 0004's upgrade() function (same pattern as the main documents bucket, gated on MINIO_ENDPOINT env var to allow SQLite test runs):
# In migration 0004 post-DDL step
minio_endpoint = os.environ.get("MINIO_ENDPOINT")
if minio_endpoint:
client = Minio(minio_endpoint, ...)
if not client.bucket_exists("audit-logs"):
client.make_bucket("audit-logs")
[VERIFIED: codebase — migration 0003 shows the MINIO_ENDPOINT-gated pattern; audit-logs bucket creation follows same approach]
Runtime State Inventory
Phase 4 is a feature addition (not a rename/refactor). No runtime state migration is required.
| Category | Items Found | Action Required |
|---|---|---|
| Stored data | No existing pdf_open_mode values; column is NEW (default 'in_app') |
Alembic migration 0004 adds column with server_default — no data migration |
| Stored data | Existing audit_log table is EMPTY (no events written in prior phases) |
No migration; Phase 4 backfills audit writes into handlers going forward |
| Live service config | audit-logs MinIO bucket does not exist |
Created in migration 0004 post-DDL step (gated on MINIO_ENDPOINT) |
| OS-registered state | None — verified by codebase scan | None |
| Secrets/env vars | No new env vars required for Phase 4 features | None |
| Build artifacts | No stale egg-info or compiled artifacts from prior phases | None |
Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|---|---|---|---|---|
| PostgreSQL | tsvector GIN index, CTE folder cascade | [runtime — Docker Compose] | 15+ | SQLite for unit tests (no tsvector; integration tests require INTEGRATION=1) |
| MinIO | Streaming proxy, audit CSV upload | [runtime — Docker Compose] | latest | Mock in tests (established pattern) |
| Redis | Celery beat daily export task | [runtime — Docker Compose] | latest | Not mockable for beat tasks; integration tests skip beat scheduling |
| Python csv (stdlib) | CSV export endpoint + Celery export task | ✓ | stdlib | — |
Missing dependencies with no fallback:
- PostgreSQL tsvector: cannot be tested without a real PostgreSQL instance. Tests that exercise FTS must be marked with
pytest.mark.skipif(not live_services_available, ...)or use theINTEGRATION=1pattern.
Missing dependencies with fallback:
- SQLite in unit tests: most folder/share CRUD tests run fine on SQLite. Only FTS and
WITH RECURSIVECTE tests require PostgreSQL.
Validation Architecture
Test Framework
| Property | Value |
|---|---|
| Framework | pytest + pytest-asyncio (already configured) |
| Config file | pytest.ini or pyproject.toml (check existing) |
| Quick run command | pytest backend/tests/test_folders.py backend/tests/test_shares.py -x |
| Full suite command | cd backend && pytest -v |
Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|---|---|---|---|---|
| FOLD-01 | Create/rename/delete folder | Integration | pytest backend/tests/test_folders.py::test_create_folder -x |
❌ Wave 0 |
| FOLD-01 | Delete non-empty folder → warning count | Integration | pytest backend/tests/test_folders.py::test_delete_folder_content_count -x |
❌ Wave 0 |
| FOLD-02 | Move document to folder | Integration | pytest backend/tests/test_folders.py::test_move_document -x |
❌ Wave 0 |
| FOLD-02 | Move document — ownership assertion | Integration (negative) | pytest backend/tests/test_folders.py::test_move_wrong_owner_404 -x |
❌ Wave 0 |
| FOLD-03 | Breadcrumb path returned in folder response | Unit | pytest backend/tests/test_folders.py::test_breadcrumb_path -x |
❌ Wave 0 |
| FOLD-04 | Sort by name/date/size | Integration | pytest backend/tests/test_folders.py::test_document_sort -x |
❌ Wave 0 |
| FOLD-05 | tsvector search returns matching docs | Integration (PostgreSQL required) | pytest backend/tests/test_folders.py::test_fts_search -x -m integration |
❌ Wave 0 |
| SHARE-01 | Share by handle — handle not found → 404 | Integration | pytest backend/tests/test_shares.py::test_share_handle_not_found -x |
❌ Wave 0 |
| SHARE-01 | Share by handle — success | Integration | pytest backend/tests/test_shares.py::test_share_success -x |
❌ Wave 0 |
| SHARE-02 | Shared doc in recipient virtual folder | Integration | pytest backend/tests/test_shares.py::test_shared_with_me -x |
❌ Wave 0 |
| SHARE-02 | Shared doc — zero quota charged to recipient | Integration | pytest backend/tests/test_shares.py::test_share_no_quota_impact -x |
❌ Wave 0 |
| SHARE-04 | Revoke share — immediate for recipient | Integration | pytest backend/tests/test_shares.py::test_revoke_share -x |
❌ Wave 0 |
| SHARE-01..04 | Share IDOR — wrong owner cannot revoke | Security (negative) | pytest backend/tests/test_shares.py::test_share_revoke_wrong_owner_404 -x |
❌ Wave 0 |
| SEC-08 | credentials_enc absent from all responses | Security (negative) | pytest backend/tests/test_security.py::test_credentials_enc_not_in_response -x |
❌ Wave 0 |
| SEC-09 | Account deletion deletes user files | Integration | pytest backend/tests/test_admin_api.py::test_delete_user_cleans_files -x |
❌ Wave 0 |
| DOC-02 | PDF proxy returns bytes, 200 or 206 | Integration | pytest backend/tests/test_documents.py::test_content_stream_200 -x |
❌ Wave 0 |
| DOC-02 | PDF proxy — no presigned URL in response | Security (negative) | pytest backend/tests/test_documents.py::test_content_stream_no_presigned_url -x |
❌ Wave 0 |
| DOC-02 | PDF proxy — Range header → 206 | Integration | pytest backend/tests/test_documents.py::test_content_stream_206_range -x |
❌ Wave 0 |
| DOC-02 | PDF proxy — admin blocked (403) | Security (negative) | pytest backend/tests/test_documents.py::test_content_stream_admin_403 -x |
❌ Wave 0 |
| ADMIN-06 | Audit log viewer — no document content in entries | Security (negative) | pytest backend/tests/test_audit.py::test_audit_log_no_doc_content -x |
❌ Wave 0 |
| ADMIN-06 | Audit log viewer — admin only | Security (negative) | pytest backend/tests/test_audit.py::test_audit_log_regular_user_403 -x |
❌ Wave 0 |
Sampling Rate
- Per task commit:
pytest backend/tests/test_folders.py backend/tests/test_shares.py backend/tests/test_audit.py backend/tests/test_documents.py -x - Per wave merge:
cd backend && pytest -v - Phase gate: Full suite green before
/gsd:verify-work
Wave 0 Gaps
backend/tests/test_folders.py— covers FOLD-01..05backend/tests/test_shares.py— covers SHARE-01..05 + IDOR security testsbackend/tests/test_audit.py— covers ADMIN-06 + no-doc-content security testsbackend/tests/test_documents.py— add proxy tests (test_content_stream_*) to existing filebackend/tests/test_security.py— add SEC-08, SEC-09 tests (may be in test_admin_api.py)
Security Domain
Applicable ASVS Categories
| ASVS Category | Applies | Standard Control |
|---|---|---|
| V2 Authentication | yes | get_regular_user / get_current_admin deps on all new endpoints |
| V3 Session Management | no | No new session mechanisms introduced |
| V4 Access Control | yes | Ownership assertion on all folder/share endpoints; get_regular_user rejects admin on proxy |
| V5 Input Validation | yes | Pydantic models on all request bodies; doc_id parsed via uuid.UUID() |
| V6 Cryptography | no | No new cryptographic operations; credentials still encrypted via Phase 2 HKDF pattern |
Known Threat Patterns for this Phase
| Pattern | STRIDE | Standard Mitigation |
|---|---|---|
| Share IDOR — revoke another user's share | Elevation of privilege | share.owner_id == current_user.id → 404 on mismatch |
| Share IDOR — access shared doc content without share | Information disclosure | _can_access_document() check in proxy endpoint: owner OR active share |
| PDF proxy leaking presigned URL | Information disclosure | get_object() fetches bytes directly; presigned URL never generated in proxy handler |
| Admin accessing document content via proxy | Broken access control | get_regular_user dep raises 403 for admin role |
| Folder IDOR — delete another user's folder | Tampering | folder.user_id == current_user.id → 404 on mismatch |
| Audit log containing document content | Sensitive data exposure | write_audit_log() metadata_ MUST NOT include filename, extracted_text, or file bytes |
| Audit log admin access by regular user | Broken access control | GET /api/admin/audit-log uses get_current_admin dep |
Path traversal via folder_id parameter |
Information disclosure | All folder/document lookups via DB primary key; uuid.UUID() parse validates format |
| Tsvector search scope leak — user sees others' docs | Information disclosure | FTS query MUST include Document.user_id == current_user.id scope filter |
| credentials_enc in serialized admin response | Sensitive data exposure | CloudConnectionOut excludes field; never in any response |
Security Gate Checklist (Phase 4 specific)
bandit -r backend/— zero HIGH findings in new files (api/folders.py,api/shares.py,api/audit.py,services/audit.py,tasks/audit_tasks.py)pip audit— zero new critical/high CVEs (no new packages)npm audit --audit-level=high— zero (no new npm packages)test_share_revoke_wrong_owner_404passestest_content_stream_admin_403passestest_content_stream_no_presigned_urlpassestest_audit_log_no_doc_contentpasses — verifiesfilename,extracted_textabsent from all audit entriestest_credentials_enc_not_in_responsepassestest_fts_search_scoped_to_ownerpasses — other user's docs not in search results
Code Examples
Folder Breadcrumb Path Query (PostgreSQL CTE)
-- Source: CITED: https://www.postgresql.org/docs/current/queries-with.html
WITH RECURSIVE breadcrumb AS (
SELECT id, parent_id, name, 1 AS depth
FROM folders
WHERE id = :folder_id AND user_id = :uid
UNION ALL
SELECT f.id, f.parent_id, f.name, b.depth + 1
FROM folders f
JOIN breadcrumb b ON f.id = b.parent_id
WHERE f.user_id = :uid AND b.depth < 50
)
SELECT id, name, depth FROM breadcrumb ORDER BY depth DESC;
Returns path from root to current folder (root = highest depth value after ORDER BY depth DESC). Each segment is {id, name} for clickable navigation.
Full-text Search Query (SQLAlchemy)
# Source: CITED: https://docs.sqlalchemy.org/en/20/dialects/postgresql.html
from sqlalchemy import func
from sqlalchemy.dialects.postgresql import TSVECTOR
stmt = (
select(Document)
.where(
Document.user_id == current_user.id,
func.to_tsvector("english", func.coalesce(Document.extracted_text, "")).op("@@")(
func.plainto_tsquery("english", q)
),
)
.order_by(Document.created_at.desc())
.limit(per_page)
.offset((page - 1) * per_page)
)
"Shared with me" Virtual Folder Query
# Source: CONTEXT.md D-06
from sqlalchemy import select
from db.models import Share, Document
stmt = (
select(Document)
.join(Share, Share.document_id == Document.id)
.where(Share.recipient_id == current_user.id)
.order_by(Document.created_at.desc())
)
result = await session.execute(stmt)
shared_docs = result.scalars().all()
is_shared indicator on document list
# Source: CONTEXT.md SHARE-05 — owner's list shows "shared" badge
from sqlalchemy import exists, select
# Add to list_documents query:
shared_subq = (
select(Share.document_id)
.where(Share.owner_id == current_user.id)
.scalar_subquery()
)
# Add to Document select:
stmt = select(Document, Document.id.in_(shared_subq).label("is_shared"))
State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|---|---|---|---|
PostgreSQL to_tsquery (operator syntax required) |
plainto_tsquery (natural language input) |
PostgreSQL 9.3+ | Search bars should use plainto_tsquery; to_tsquery for programmatic use only |
| PDF.js for in-browser PDF viewing | Native browser PDF viewer via Content-Type: application/pdf + Content-Disposition: inline |
Browser support ~2018+ | All modern browsers support native PDF rendering; PDF.js only needed for custom annotation features |
| Alembic autogenerate for GIN expression indexes | Manual migration SQL | Alembic 1.13+ (known bug) | Must write op.execute("CREATE INDEX ... USING GIN") manually |
asyncio.run() in Celery tasks |
asyncio.run(_async_func()) is the standard bridge |
Established in Phase 3 document_tasks.py | Celery sync task calls asyncio.run() to enter async context |
Deprecated/outdated:
to_tsvectorwith stored Computed column: adds schema complexity; expression index is simpler and equally performant for this scale- Separate audit log middleware: D-14 decision is inline writes; middleware approach not used
Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|---|---|---|
| A1 | The streaming proxy's shared-document access check (Pattern 4) is implied by D-08 and SHARE-02 | Pattern 4, Security Domain | If recipients cannot access the proxy, "Shared with me" is useless; planner must confirm access rule |
| A2 | session.flush() (not commit()) inside write_audit_log() is the correct transactional pattern |
Pattern 6 | Using commit() would create two separate transactions; if primary operation fails after audit commit, audit entry is orphaned with no corresponding operation |
| A3 | 300ms debounce interval for search | Pattern 10 | If too short, excessive API calls; if too long, UX feels sluggish. Standard convention, low risk |
| A4 | WITH RECURSIVE CTE for subtree cascade-delete is the correct PostgreSQL approach; SQLite tests must use INTEGRATION=1 |
Pitfall 1, Common Pitfalls | SQLite does not support WITH RECURSIVE — all subtree cascade tests require PostgreSQL |
| A5 | MinIOBackend.put_object_raw() method needs to be added for audit-logs bucket (different key scheme from documents) |
Pattern 8 | If not added, Celery export task cannot upload to audit-logs bucket using the correct key |
| A6 | The audit-logs MinIO bucket creation is safe to gate on MINIO_ENDPOINT env var (same as migration 0003 pattern) |
Pitfall 8 | If MinIO is present but bucket creation is skipped, first export task fails |
If this table is empty: Not applicable — some assumptions remain in this research.
Open Questions (RESOLVED)
-
Subtree folder delete — CTE vs. iterative in Python
- What we know:
WITH RECURSIVECTE works in PostgreSQL; SQLite does not support it - What's unclear: Should the planner use the CTE (PostgreSQL-only, integration test) or iterative Python (works in SQLite, slower for deep trees)?
- Recommendation: Use CTE for the implementation (real database is PostgreSQL); mark the cascade-delete tests as
pytest.mark.skipif(not live_services_available, ...)or useINTEGRATION=1
- What we know:
-
Shared document access in proxy —
_can_access_document()adds a DB query per request- What we know: Every
GET /api/documents/{id}/contentcall currently does one DB lookup; adding share check adds a second - What's unclear: Is the extra query acceptable given Phase 4 scale (single-user → small multi-user)?
- Recommendation: Add the share check unconditionally; at Phase 4 scale a second indexed query is negligible
- What we know: Every
-
PATCH /api/me/preferencesendpoint path- What we know: D-10 specifies this endpoint for pdf_open_mode
- What's unclear: Should it be on
/api/auth/me/preferences(alongside/api/auth/me) or/api/me/preferences(new router)? - Recommendation: Place on
/api/auth/me/preferences(same router prefix as/api/auth/me) to keep auth-related user settings in the same module
Sources
Primary (HIGH confidence)
backend/db/models.py— all ORM models verified:Folder,Share,AuditLog,Document.folder_id,User(missingpdf_open_mode),CloudConnection(credentials_enccolumn confirmed)backend/api/documents.py— ownership assertion pattern,get_regular_userdep, quota UPDATE patternbackend/deps/auth.py—get_regular_userraises 403 for admin;get_current_adminpatternbackend/celery_app.py— existingbeat_schedulestructure;cleanup_abandoned_uploadsas templatebackend/storage/minio_backend.py—get_object()confirmed;asyncio.to_thread()pattern confirmedbackend/services/storage.py—list_metadatascoping byuser_idconfirmed
Secondary (MEDIUM confidence)
- FastAPI discussions #7718 — Range header StreamingResponse pattern verified by multiple community confirmations
- PostgreSQL docs — Full-text search tables — GIN expression index SQL verified from official docs
- SQLAlchemy PostgreSQL dialect docs —
func.to_tsvector,plainto_tsquerysupport confirmed - Celery periodic tasks docs —
crontab(hour=0, minute=0)pattern - FastAPI response model docs —
response_model=CloudConnectionOutexclusion pattern
Tertiary (LOW confidence)
- Alembic issue #1390 — tsvector autogenerate bug confirmed by issue thread; marked for validation
Metadata
Confidence breakdown:
- Standard stack: HIGH — all packages already in requirements.txt; codebase verified
- Architecture: HIGH — all ORM models exist and were verified in codebase
- Pitfalls: MEDIUM — Pitfalls 1, 4 are confirmed from codebase + security requirements; Pitfall 2 confirmed from Alembic issue tracker
- Code patterns: MEDIUM — Range header pattern from FastAPI community discussion; PostgreSQL FTS from official docs
Research date: 2026-05-25 Valid until: 2026-06-25 (stable FastAPI/SQLAlchemy stack; no fast-moving dependencies)