# Phase 3: Document Migration & Multi-User Isolation - Context

Gathered: 2026-05-23
Status: Ready for planning
## Phase Boundary

Enforce per-user ownership on all documents: make `documents.user_id` NOT NULL (Phase 1 D-03 deferred to here), add `get_current_user` guards to all `/api/documents/*` endpoints (Phase 2 D-07 deferred to here), implement presigned PUT URL upload flow, enforce atomic quota on upload and delete, wire per-user AI classification config from DB, and retire the flat-file settings system. The existing document UI continues to work and is updated to use the new two-step upload flow.
This phase does NOT include folder navigation, sharing, or PDF preview (Phase 4). It does NOT include cloud storage backends (Phase 5). The quota bar frontend component is included (STORE-04 is scoped here per REQUIREMENTS.md traceability).
STORE-08 (Celery+Redis) was completed in Phase 1 — no work needed.
## Implementation Decisions

### Null-User Record Cleanup

- D-01: All documents with `user_id=NULL` are deleted (both DB rows and their MinIO objects) before the NOT NULL constraint is added. These are dev/test data only, consistent with Phase 1 D-04, which deleted flat-file test data with the same reasoning. Zero production data loss.
- D-02: Cleanup is baked into the Alembic migration's `upgrade()` function: the migration first deletes all null-user Document rows (and calls the storage backend to delete corresponding MinIO objects), then adds the `NOT NULL` constraint to `documents.user_id`. One command, atomic flow.
- D-03: After null-user cleanup, reconcile quota `used_bytes` from actual document data: `UPDATE quotas SET used_bytes = (SELECT COALESCE(SUM(size_bytes), 0) FROM documents WHERE documents.user_id = quotas.user_id)`. Phase 3 starts with accurate quota state for all users.
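The ordering in D-01 through D-03 (delete null-user rows, then reconcile quotas) can be sketched against an in-memory SQLite database. This is an illustrative sketch only: the `storage_key` column and the function names are assumptions, not the project's schema, and the real migration would run inside Alembic's `upgrade()` and then apply the NOT NULL constraint (for example with `op.alter_column("documents", "user_id", nullable=False)`).

```python
import sqlite3


def cleanup_and_reconcile(conn: sqlite3.Connection) -> list[str]:
    """Delete null-user documents, then recompute quotas.used_bytes.

    Returns the storage keys of the deleted rows so the caller can remove
    the matching objects from object storage (hypothetical column name).
    """
    orphan_keys = [
        row[0]
        for row in conn.execute(
            "SELECT storage_key FROM documents WHERE user_id IS NULL"
        )
    ]
    conn.execute("DELETE FROM documents WHERE user_id IS NULL")
    # D-03: rebuild used_bytes from the documents that remain.
    conn.execute(
        """
        UPDATE quotas
        SET used_bytes = (
            SELECT COALESCE(SUM(size_bytes), 0)
            FROM documents
            WHERE documents.user_id = quotas.user_id
        )
        """
    )
    conn.commit()
    return orphan_keys


def demo() -> tuple[list[str], list[tuple[int, int]]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (
            id INTEGER PRIMARY KEY, user_id INTEGER,
            storage_key TEXT, size_bytes INTEGER
        );
        CREATE TABLE quotas (
            user_id INTEGER PRIMARY KEY, used_bytes INTEGER, limit_bytes INTEGER
        );
        INSERT INTO documents VALUES
            (1, NULL, 'orphan/a.pdf', 500),
            (2, 7, '7/2/x.pdf', 300),
            (3, 7, '7/3/y.pdf', 200);
        INSERT INTO quotas VALUES (7, 9999, 1000), (8, 42, 1000);
        """
    )
    keys = cleanup_and_reconcile(conn)
    quotas = conn.execute(
        "SELECT user_id, used_bytes FROM quotas ORDER BY user_id"
    ).fetchall()
    return keys, quotas
```

The sketch returns the orphaned keys instead of deleting objects inline because object-store deletes cannot be rolled back together with the database transaction; whether the real migration orders things this way is not stated above.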
### Presigned Upload Flow

- D-04: Phase 3 implements direct-to-MinIO presigned PUT uploads per CLAUDE.md architectural rule ("bytes never pass through the API layer"). The existing multipart POST-to-FastAPI upload endpoint is replaced.
- D-05: Two-step upload flow:
  - Step 1, `POST /api/documents/upload-url`: FastAPI creates a `Document` row (`status='pending'`), generates a presigned PUT URL (15-min TTL), returns `{upload_url, document_id}`. Quota is NOT reserved at this step.
  - Step 2: Frontend PUTs bytes directly to MinIO using the presigned URL.
  - Step 3, `POST /api/documents/{id}/confirm`: FastAPI retrieves file size from MinIO stat (authoritative), runs atomic quota UPDATE, updates Document row (`status='uploaded'`), and enqueues `extract_and_classify.delay(document_id)`.
- D-06: Abandoned uploads (presigned URL fetched but `/confirm` never called): Celery beat periodic task deletes `Document` rows older than 1 hour with `status='pending'` and their MinIO objects. Quota is never reserved for pending rows, so no quota cleanup is needed.
- D-07: Quota is enforced atomically at the `/confirm` step using the file size retrieved from MinIO stat (not client-supplied). The atomic SQL pattern (from CLAUDE.md) applies: `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`. A 413 response is returned if the UPDATE returns no rows (quota exceeded). Document delete atomically decrements: `UPDATE quotas SET used_bytes = GREATEST(0, used_bytes - $delta)`.
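The confirm step in D-05 and D-07 can be sketched in plain Python with in-memory stand-ins. Everything named here (`FakeStorage`, `FakeQuota`, `confirm_upload`, `QuotaExceeded`) is hypothetical scaffolding for illustration; in the real service the size would come from the MinIO stat call and the quota check would be the single atomic SQL UPDATE. The guard against confirming a non-pending document is an added assumption, not something the decisions above specify.

```python
from dataclasses import dataclass, field
from typing import Callable


class QuotaExceeded(Exception):
    """Would map to the HTTP 413 response described in D-07."""


@dataclass
class Document:
    id: int
    user_id: int
    storage_key: str
    status: str = "pending"
    size_bytes: int = 0


@dataclass
class FakeStorage:
    """Stand-in for the MinIO backend: object key -> stored size in bytes."""
    sizes: dict[str, int] = field(default_factory=dict)

    def stat_size(self, key: str) -> int:
        return self.sizes[key]


@dataclass
class FakeQuota:
    """Stand-in for one quotas row; the real check is one atomic UPDATE."""
    used_bytes: int
    limit_bytes: int

    def try_reserve(self, delta: int) -> bool:
        if self.used_bytes + delta > self.limit_bytes:
            return False
        self.used_bytes += delta
        return True


def confirm_upload(
    doc: Document,
    storage: FakeStorage,
    quota: FakeQuota,
    enqueue: Callable[[int], None],
) -> Document:
    # Assumption: confirming twice must not charge quota twice.
    if doc.status != "pending":
        raise ValueError("document is not awaiting confirmation")
    # The authoritative size comes from storage, never from the client.
    size = storage.stat_size(doc.storage_key)
    if not quota.try_reserve(size):
        raise QuotaExceeded(f"upload of {size} bytes exceeds quota")
    doc.size_bytes = size
    doc.status = "uploaded"
    enqueue(doc.id)  # e.g. extract_and_classify.delay(document_id)
    return doc
```

Passing `enqueue` as a callable keeps the sketch free of Celery; a handler could pass the task's `.delay` method in its place.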
### Topics Isolation Model

- D-08: Layered topic namespace: system topics (`user_id=NULL`) are visible to all users as defaults; per-user topics (`user_id=current_user.id`) are visible only to that user. A user's topic list is the union of system topics + their own topics.
- D-09: Only admin can create, edit, and delete system topics via a new `POST /api/admin/topics` endpoint. Regular users can only CRUD their own per-user topics via `/api/topics/*` (now auth-gated with `get_current_user`).
- D-10: All existing topics in the DB (currently `user_id=NULL` from Phase 1/2 test sessions) are deleted in the Phase 3 migration, consistent with null-user document cleanup. Admin seeds system topics fresh post-Phase 3.
- D-11: AI classification receives system topics + user's own topics as the existing-topics input. New AI-suggested topics are created in the user's namespace (`user_id=current_user.id`), not as system topics.
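The union described in D-08 can be sketched with SQLite. The table layout and function names are assumptions for illustration. The demo also runs a naive `IN (1, NULL)` filter to show why system topics need an explicit `IS NULL` test: in SQL, a comparison against NULL is never true, so the `IN` form silently drops them.

```python
import sqlite3

# Own topics plus system topics (user_id IS NULL), per D-08.
VISIBLE_TOPICS_SQL = """
    SELECT name FROM topics
    WHERE user_id = :uid OR user_id IS NULL
    ORDER BY name
"""


def visible_topics(conn: sqlite3.Connection, user_id: int) -> list[str]:
    return [row[0] for row in conn.execute(VISIBLE_TOPICS_SQL, {"uid": user_id})]


def demo() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE topics (id INTEGER PRIMARY KEY, user_id INTEGER, name TEXT);
        INSERT INTO topics (user_id, name) VALUES
            (NULL, 'Invoices'), (1, 'Taxes'), (2, 'Recipes');
        """
    )
    # Naive form: NULL never compares equal, so the system topic is missed.
    naive = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM topics WHERE user_id IN (1, NULL) ORDER BY name"
        )
    ]
    return visible_topics(conn, 1), naive
```

For user 1 the explicit form yields the system topic and the user's own topic, while the naive form yields only the user's own topic.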
### Settings Flat-File Retirement

- D-12: `/api/settings` endpoint is removed entirely in Phase 3. `services/storage.py` `load_settings()` / `save_settings()` flat-file functions are deleted. `settings.json` is deleted. All AI config comes from DB (`users.ai_provider` / `users.ai_model` set by admin).
- D-13: System prompt moves to a `SYSTEM_PROMPT` env var in `config.py` (optional). If not set, `services/classifier.py` uses a hardcoded default prompt string. No DB table needed.
- D-14: Celery `extract_and_classify` task resolves AI config via `doc.user_id → users.ai_provider + users.ai_model` (a second DB lookup within the same task session). No `user_id` parameter added to the task signature.
- D-15: If `user.ai_provider` is `None` (user has no admin-assigned AI config), classifier falls back to `DEFAULT_AI_PROVIDER` + `DEFAULT_AI_MODEL` env vars (both optional in `config.py`; code default: `"ollama"` / `"llama3.2"`).
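The fallback chain in D-15 can be sketched as a small pure function. The function name and the `env` parameter are hypothetical. The case where a provider is set but the model is empty is not covered by D-15, so this sketch simply passes the stored values through unchanged.

```python
import os
from typing import Mapping, Optional

# Code defaults named in D-15.
CODE_DEFAULT_PROVIDER = "ollama"
CODE_DEFAULT_MODEL = "llama3.2"


def resolve_ai_config(
    user_provider: Optional[str],
    user_model: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> tuple[str, Optional[str]]:
    """Return (provider, model) for a classification run.

    Admin-assigned values win; otherwise fall back to the optional env
    vars, and finally to the code defaults.
    """
    if user_provider is not None:
        return user_provider, user_model
    return (
        env.get("DEFAULT_AI_PROVIDER", CODE_DEFAULT_PROVIDER),
        env.get("DEFAULT_AI_MODEL", CODE_DEFAULT_MODEL),
    )
```

Taking `env` as a parameter keeps the function testable without touching the process environment, which fits the "explicit parameters rather than global config" boundary for `services/classifier.py`.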
### Auth Guards

- D-16: All `/api/documents/*` endpoints gain the `get_current_user` dependency (Phase 2 D-07 fulfilled). Every handler asserts `document.user_id == current_user.id` before returning, and responds with 404 (not 403) on cross-user access to avoid information leakage. Requests from the admin role receive 403 on all document endpoints per Phase 3 SC4 (completing Phase 2 SC5 via D-07).
- D-17: `/api/topics/*` gains `get_current_user`. Topic queries return rows where `user_id = current_user.id` or `user_id IS NULL`, so a user sees their own topics + system topics. (A literal `user_id IN (current_user.id, NULL)` would not work, because SQL never matches NULL inside an `IN` list.)
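The ownership rule in D-16 can be sketched as a framework-free check. All names here (`HttpError`, `authorize_document_access`, the `is_admin` flag) are hypothetical; in a FastAPI handler the same logic would presumably raise `HTTPException` with the matching status code.

```python
from dataclasses import dataclass
from typing import Optional


class HttpError(Exception):
    """Minimal stand-in for a web framework's HTTP exception."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class User:
    id: int
    is_admin: bool = False  # assumed representation of the admin role


@dataclass
class Document:
    id: int
    user_id: int


def authorize_document_access(doc: Optional[Document], user: User) -> Document:
    # Admins are refused on every document endpoint (D-16).
    if user.is_admin:
        raise HttpError(403, "Admins cannot access documents")
    # Missing and foreign documents look identical to the caller, so the
    # response does not reveal whether another user's document exists.
    if doc is None or doc.user_id != user.id:
        raise HttpError(404, "Document not found")
    return doc
```

Folding "not found" and "not yours" into one branch is what keeps the 404 from leaking existence information.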
<canonical_refs>
## Canonical References

Downstream agents MUST read these before planning or implementing.

### Requirements

- `.planning/REQUIREMENTS.md`: STORE-03 (atomic quota enforce), STORE-04 (quota bar UI), STORE-05 (upload rejection error), STORE-06 (atomic quota decrement on delete), STORE-08 (Celery+Redis, done in Phase 1), SEC-04 (DB-lookup file access), DOC-03 (per-user AI provider), DOC-04 (system topics + per-user overrides), DOC-05 (classification uses user's assigned provider)

### Roadmap & Success Criteria

- `.planning/ROADMAP.md`: Phase 3 goal and all 5 success criteria (especially SC2: concurrent quota race, SC4: 403 on cross-user access + admin 403, SC5: per-user AI classification)

### Architecture Constraints

- `CLAUDE.md` (Key Architectural Rules): presigned MinIO URL flow (bytes never through API), MinIO key schema, atomic quota UPDATE pattern, SEC-04 enforcement, admin endpoints never return document content

### Prior Phase Decisions

- `.planning/phases/01-infrastructure-foundation/01-CONTEXT.md`: D-03 (documents.user_id nullable in Phase 1), D-05 (storage service replaced), D-06 (MinIO key schema), D-08/D-09 (Celery+Redis wired)
- `.planning/phases/02-users-authentication/02-CONTEXT.md`: D-07 (documents endpoints stay public in Phase 2, gain guards in Phase 3), D-08/D-09 (admin endpoints, CORS)

### Project Decisions

- `.planning/PROJECT.md`: Core Value (per-user isolation); Key Decisions (PostgreSQL+MinIO rationale, atomic quota UPDATE, privacy-first admin model)
</canonical_refs>
<code_context>
## Existing Code Insights

### Reusable Assets

- `backend/deps/auth.py`: `get_current_user` and `get_current_admin` FastAPI dependencies ready to inject into document/topic endpoints
- `backend/db/models.py`: `Document`, `Quota`, `Topic`, `DocumentTopic` ORM models complete; `documents.user_id` is nullable (change to NOT NULL in Phase 3 migration); `quotas.used_bytes` and `limit_bytes` are in place
- `backend/storage/minio_backend.py`: `MinIOBackend.put_object()` and `delete_object()`; extend with `generate_presigned_put_url()` for Phase 3 upload flow; add `stat_object()` to retrieve file size after upload
- `backend/storage/base.py`: `StorageBackend` ABC; add `generate_presigned_put_url(...)` abstract method
- `backend/tasks/document_tasks.py`: `extract_and_classify` task; update `_run()` to look up `doc.user_id → user.ai_provider/ai_model` and pass user config to classifier
- `backend/services/classifier.py`: update to accept `ai_provider` and `ai_model` parameters instead of reading from `load_settings()`
- `backend/celery_app.py`: Celery beat schedule; add periodic task for abandoned upload cleanup
### Established Patterns

- Atomic quota UPDATE: `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`; use `session.execute(text(...))` with bound params; check `result.rowcount` to detect quota exceeded
- Service layer boundary: `services/classifier.py` is pure Python, no FastAPI coupling; call with explicit parameters rather than reading global config
- `get_current_user` injection (Phase 2 pattern): `current_user: User = Depends(get_current_user)` in each handler; `current_user: User = Depends(get_current_admin)` for admin-only routes
- `asyncio.to_thread()` for MinIO sync SDK calls (established in Phase 1 `storage/minio_backend.py`)
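The atomic quota pattern can be exercised end to end with SQLite as a stand-in for PostgreSQL. This is a sketch under stated assumptions: the `WHERE user_id = ...` filter is added here because the pattern as quoted does not scope the update to one user, SQLite's two-argument `MAX` replaces PostgreSQL's `GREATEST`, and the function names are hypothetical.

```python
import sqlite3


def reserve_quota(conn: sqlite3.Connection, user_id: int, delta: int) -> bool:
    """Add delta to used_bytes only if the result stays within the limit.

    The check and the increment happen in one UPDATE statement, so two
    concurrent uploads cannot both pass a stale read of used_bytes.
    """
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + :delta "
        "WHERE user_id = :uid AND used_bytes + :delta <= limit_bytes",
        {"delta": delta, "uid": user_id},
    )
    conn.commit()
    # Zero affected rows means the quota would have been exceeded.
    return cur.rowcount == 1


def release_quota(conn: sqlite3.Connection, user_id: int, delta: int) -> None:
    """Decrement on delete, clamped at zero."""
    conn.execute(
        "UPDATE quotas SET used_bytes = MAX(0, used_bytes - :delta) "
        "WHERE user_id = :uid",
        {"delta": delta, "uid": user_id},
    )
    conn.commit()


def demo() -> tuple[list[bool], int]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotas ("
        "user_id INTEGER PRIMARY KEY, used_bytes INTEGER, limit_bytes INTEGER)"
    )
    conn.execute("INSERT INTO quotas VALUES (7, 0, 1000)")
    results = [reserve_quota(conn, 7, 600), reserve_quota(conn, 7, 600)]
    release_quota(conn, 7, 5000)  # larger than used_bytes: clamps to 0
    (used,) = conn.execute(
        "SELECT used_bytes FROM quotas WHERE user_id = 7"
    ).fetchone()
    return results, used
```

In the demo the first 600-byte reservation succeeds, the second is refused because it would exceed the 1000-byte limit, and the oversized release leaves `used_bytes` at zero instead of going negative.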
### Integration Points

- `backend/api/documents.py`: replace existing upload handler with upload-url + confirm endpoints; add `get_current_user` to all handlers; add `document.user_id == current_user.id` ownership assertion
- `backend/api/topics.py`: add `get_current_user`; filter all topic queries by `user_id = current_user.id OR user_id IS NULL` (an `IN (current_user.id, NULL)` list would never match the NULL rows)
- `backend/services/storage.py`: remove `load_settings()` / `save_settings()`; update `save_upload()` to accept `user_id` parameter; update `delete_document()` to decrement quota
- `backend/config.py`: add `SYSTEM_PROMPT`, `DEFAULT_AI_PROVIDER`, `DEFAULT_AI_MODEL` optional env vars
- `frontend/src/stores/documents.js` (or equivalent): update upload flow from single multipart POST to two-step: get upload URL, PUT to MinIO, call confirm
- `frontend/src/components/layout/AppSidebar.vue`: add quota bar (current/limit in MB, amber at 80%, red at 95%), per STORE-04
### Constraints from Prior Phases

- MinIO key schema `{user_id}/{document_id}/{uuid4()}{ext}` is locked (Phase 1 D-06) and enforced in `MinIOBackend.put_object()`
- `documents.user_id` is currently nullable; the Phase 3 Alembic migration makes it NOT NULL after cleanup
- Celery+Redis already wired and operational; no infrastructure changes needed
- `BackupCode` model and `backup_codes` table exist from Phase 2; no changes needed
</code_context>
## Specific Ideas

- Phase 3 Alembic migration is `0003_multi_user_isolation.py`: cleanup + NOT NULL + topic cleanup + quota reconciliation in one migration
- Presigned PUT URL TTL: 15 minutes (matches typical upload timeout for large documents)
- Abandoned upload cleanup: Celery beat task running every 30 minutes, deletes `pending` Document rows older than 1 hour
- `stat_object()` for MinIO: use MinIO SDK `stat_object(bucket, key)` → `.size` attribute to get authoritative file size at confirm time
- Quota exceeded response: HTTP 413 with body `{"detail": {"used_bytes": N, "limit_bytes": M, "rejected_bytes": K}}`
- Per-user topic query: `WHERE (topics.user_id = :uid OR topics.user_id IS NULL)` with an index on `topics.user_id`
- Frontend quota bar: fetch from new `GET /api/me/quota` endpoint returning `{used_bytes, limit_bytes}`; add this endpoint to the auth API
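Two of the ideas above, the abandoned-upload cutoff and the 413 response body, are small enough to sketch as pure functions. The function names, the task path in the schedule dict, and the schedule key are hypothetical; the dict only mirrors the general shape of a Celery `beat_schedule` entry with the interval given in seconds.

```python
from datetime import datetime, timedelta

PENDING_MAX_AGE = timedelta(hours=1)
CLEANUP_INTERVAL_SECONDS = 30 * 60

# Hypothetical beat schedule entry; the task path is an assumption.
BEAT_SCHEDULE = {
    "cleanup-abandoned-uploads": {
        "task": "tasks.document_tasks.cleanup_abandoned_uploads",
        "schedule": CLEANUP_INTERVAL_SECONDS,
    }
}


def is_abandoned(status: str, created_at: datetime, now: datetime) -> bool:
    """True for pending rows older than the one-hour cutoff."""
    return status == "pending" and now - created_at > PENDING_MAX_AGE


def quota_exceeded_detail(used: int, limit: int, rejected: int) -> dict:
    """Body for the HTTP 413 quota-exceeded response."""
    return {
        "detail": {
            "used_bytes": used,
            "limit_bytes": limit,
            "rejected_bytes": rejected,
        }
    }
```

With a 30-minute interval and a 1-hour cutoff, an abandoned row would be removed at most roughly 90 minutes after creation, well after the 15-minute presigned URL has expired.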
## Deferred Ideas

- Presigned GET URLs for document downloads: Phase 4 (DOC-02: PDF preview proxied through app). Phase 3 does not expose presigned GET URLs to the browser.
- Per-user system prompt overrides — out of scope for v1; system prompt is global via env var
- Quota reservation at upload-url initiation with client-supplied size — decided against in favor of confirm-time enforcement
- MinIO event notification webhook approach — deferred; two-step confirm is sufficient for Phase 3
Phase: 3-Document Migration & Multi-User Isolation
Context gathered: 2026-05-23