Phase 3: Document Migration & Multi-User Isolation - Discussion Log
Audit trail only. Do not use as input to planning, research, or execution agents. Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
Date: 2026-05-23
Phase: 3 - Document Migration & Multi-User Isolation
Areas discussed: Null-user record cleanup, Upload bytes flow, Topics isolation model, Settings flat-file retirement
Null-user record cleanup
Q1: How to handle existing documents with user_id=NULL before NOT NULL migration?
| Option | Description | Selected |
|---|---|---|
| Delete them | Dev/test data — Phase 1 D-04 deleted flat-file data with same reasoning. Pre-migration script deletes all NULL-user documents + MinIO objects. | ✓ |
| Assign to first admin | Script sets user_id = first admin's id for all NULL rows. Data preserved but attributed to admin. | |
| Fail migration if any exist | Guard in Alembic migration; manual cleanup required before re-run. | |
User's choice: Delete them
Q2: Delete MinIO objects along with DB rows?
| Option | Description | Selected |
|---|---|---|
| Delete both DB rows and MinIO objects | Script calls storage_backend.delete_object() before deleting each row. No orphaned objects. | ✓ |
| DB rows only | Simpler, but leaves orphaned null-user/... objects in MinIO. | |
User's choice: Delete both DB rows and MinIO objects
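Here is a minimal sketch of the chosen order: delete the object first, then the row. It assumes a `documents` table with hypothetical `id`, `user_id`, and `storage_key` columns. It also assumes a storage backend whose `delete_object(key)` ignores missing keys; that behavior is assumed, not confirmed. SQLite stands in for the real database so the sketch runs on its own.

```python
import sqlite3


class FakeStorage:
    """Hypothetical stand-in for storage_backend: a set of object keys."""

    def __init__(self, keys):
        self.objects = set(keys)

    def delete_object(self, key: str) -> None:
        # Assumed behavior: deleting a missing key is a no-op, so re-runs are safe.
        self.objects.discard(key)


def delete_null_user_documents(conn: sqlite3.Connection, storage) -> int:
    """Delete every NULL-user document: the stored object first, then its row."""
    rows = conn.execute(
        "SELECT id, storage_key FROM documents WHERE user_id IS NULL"
    ).fetchall()
    for doc_id, key in rows:
        # Object first: if this raises, the row survives and a re-run retries it.
        storage.delete_object(key)
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()
    return len(rows)
```

With this order, a failure can only leave a row pointing at an already-deleted object. The next run cleans that row up, and no unreferenced object is ever left in MinIO.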
Q3: Cleanup as standalone script or baked into Alembic migration?
| Option | Description | Selected |
|---|---|---|
| Baked into Alembic migration | upgrade() runs cleanup first, then adds NOT NULL. One command, one atomic flow. | ✓ |
| Standalone script | Separate scripts/cleanup_null_user_docs.py that must be run manually before the migration. | |
User's choice: Baked into Alembic migration
Q4: Reconcile quota used_bytes during migration?
| Option | Description | Selected |
|---|---|---|
| Yes, reconcile quotas | UPDATE quotas SET used_bytes = actual sum of document sizes per user. Accurate starting state. | ✓ |
| No, quotas start at zero | All quotas reset to 0; accurate from first upload via new enforcement. | |
User's choice: Yes, reconcile quotas
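The reconciliation could be a single correlated UPDATE. This sketch assumes a hypothetical `documents.size_bytes` column and one `quotas` row per user. It uses SQLite to stay self-contained, but the statement itself is also valid PostgreSQL.

```python
import sqlite3

# COALESCE resets users with no remaining documents to 0 instead of NULL.
RECONCILE_SQL = """
UPDATE quotas
SET used_bytes = COALESCE(
    (SELECT SUM(d.size_bytes) FROM documents AS d WHERE d.user_id = quotas.user_id),
    0
)
"""


def reconcile_quotas(conn: sqlite3.Connection) -> None:
    """Recompute every user's used_bytes from the documents they actually own."""
    conn.execute(RECONCILE_SQL)
    conn.commit()
```

Running this after the NULL-user cleanup, in the same `upgrade()`, means deleted documents never count toward anyone's starting usage.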
Upload bytes flow
Q1: Presigned PUT URLs vs bytes-through-FastAPI?
| Option | Description | Selected |
|---|---|---|
| Keep bytes through FastAPI | Current flow preserved. 'Presigned URL flow' refers to download presigning only. | |
| Implement presigned PUT URLs in Phase 3 | Direct-to-MinIO uploads per CLAUDE.md architectural rule. Requires frontend changes. | ✓ |
User's choice: Implement presigned PUT URLs in Phase 3
Q2: How should the presigned upload flow work end-to-end?
| Option | Description | Selected |
|---|---|---|
| Two-step: initiate + confirm | POST upload-url → PUT to MinIO → POST confirm. Clean separation. | ✓ |
| One-step with webhook/polling | MinIO event notification webhook. More complex. | |
User's choice: Two-step: initiate + confirm
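The two-step flow can be modeled in memory to show which checks belong at each step. Everything in this sketch is hypothetical: the `UploadFlow` class, the `{user_id}/{doc_id}/{filename}` key layout, and the `pending`/`ready` statuses. A dict stands in for MinIO. The real initiate endpoint would return a presigned PUT URL rather than the bare key.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class UploadRow:
    doc_id: str
    user_id: int
    object_key: str
    status: str = "pending"


class UploadFlow:
    """Hypothetical in-memory model of: POST upload-url -> PUT to MinIO -> POST confirm."""

    def __init__(self) -> None:
        self.rows: Dict[str, UploadRow] = {}
        self.bucket: Dict[str, bytes] = {}  # stands in for the MinIO bucket

    def initiate(self, user_id: int, filename: str) -> Tuple[str, str]:
        # Step 1: record a pending row under a user-scoped object key.
        doc_id = uuid.uuid4().hex
        key = f"{user_id}/{doc_id}/{filename}"
        self.rows[doc_id] = UploadRow(doc_id, user_id, key)
        # The real endpoint would also return a presigned PUT URL for `key`.
        return doc_id, key

    def client_put(self, key: str, data: bytes) -> None:
        # Step 2: the browser PUTs straight to MinIO; FastAPI never sees the bytes.
        self.bucket[key] = data

    def confirm(self, user_id: int, doc_id: str) -> int:
        # Step 3: verify ownership and presence, then mark the row ready.
        row = self.rows.get(doc_id)
        if row is None or row.user_id != user_id:
            raise PermissionError("no such pending upload for this user")
        if row.status != "pending":
            raise ValueError("upload already confirmed")
        if row.object_key not in self.bucket:
            raise FileNotFoundError("object has not been uploaded")
        size = len(self.bucket[row.object_key])  # real code: stat the object
        row.status = "ready"
        return size
```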
Q3: Handling abandoned uploads?
| Option | Description | Selected |
|---|---|---|
| Let them expire naturally (Celery cleanup) | Celery beat deletes pending rows older than 1 hour + MinIO objects. No quota reserved for pending. | ✓ |
| Quota reserved on initiate, released on timeout | Reserve at step 1, refund on timeout. More complex. | |
| No cleanup: pending rows stay | Orphaned but harmless. Not recommended for production. | |
User's choice: Let them expire naturally (Celery beat cleanup)
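This is a sketch of the cleanup task body, assuming hypothetical `status`, `created_at`, and `storage_key` columns. Timestamps are stored as ISO-8601 text only because SQLite stands in for the real database. The Celery beat registration and its run interval are left out and would be a separate choice.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

PENDING_TTL = timedelta(hours=1)  # the 1-hour window chosen above


def expire_pending_uploads(
    conn: sqlite3.Connection,
    delete_object: Callable[[str], None],
    now: Optional[datetime] = None,
) -> int:
    """Purge pending uploads older than PENDING_TTL, object first, then row.

    Quotas are not touched: pending uploads never reserved any bytes.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # ISO-8601 strings with the same UTC offset sort chronologically.
    cutoff = (now - PENDING_TTL).isoformat()
    rows = conn.execute(
        "SELECT id, storage_key FROM documents "
        "WHERE status = 'pending' AND created_at < ?",
        (cutoff,),
    ).fetchall()
    for doc_id, key in rows:
        delete_object(key)  # the client may never have PUT it; must tolerate missing keys
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    conn.commit()
    return len(rows)
```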
Q4: When to enforce quota?
| Option | Description | Selected |
|---|---|---|
| At confirm (Recommended) | Atomic quota UPDATE runs at /confirm. File size from MinIO stat (authoritative). | ✓ |
| At initiation with client-supplied size | Reserve quota at step 1. Requires trusting client-supplied size or verifying at confirm. | |
User's choice: At confirm
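Enforcing the quota at /confirm can fold the limit check and the increment into one conditional UPDATE, with the size taken from the MinIO stat. This sketch assumes hypothetical `quotas.used_bytes` and `quotas.limit_bytes` columns and a hypothetical `charge_quota` helper.

```python
import sqlite3


class QuotaExceeded(Exception):
    pass


def charge_quota(conn: sqlite3.Connection, user_id: int, size: int) -> None:
    """Add `size` bytes to the user's usage, or refuse if it would exceed the limit."""
    # Check and increment in one statement: no read-then-write gap.
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= limit_bytes",
        (size, user_id, size),
    )
    if cur.rowcount != 1:
        # Zero rows: over the limit, or no quota row for this user.
        conn.rollback()
        raise QuotaExceeded(f"user {user_id}: {size} bytes exceeds remaining quota")
    conn.commit()
```

Because the check and the write form a single statement, two concurrent confirms cannot both pass a separate read-then-write check. In PostgreSQL, the row lock taken by the UPDATE serializes them.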
Topics isolation model
Q1: How should the topics namespace work?
| Option | Description | Selected |
|---|---|---|
| System defaults (user_id=NULL) + per-user overrides | Union of system topics + user's own topics. Admin manages system topics. DOC-04 compliant. | ✓ |
| Fully isolated per user | Own complete topic namespace per user. No shared topics. | |
| Fully global (shared) | All topics shared. Simplest but violates topic privacy. | |
User's choice: System defaults + per-user overrides
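The union namespace reduces to one predicate. Here is a sketch over a hypothetical `topics(id, name, user_id)` table, with SQLite standing in for the real database.

```python
import sqlite3
from typing import List, Tuple

# `user_id = ?` never matches NULL in SQL, so system topics need an explicit IS NULL.
VISIBLE_TOPICS_SQL = (
    "SELECT id, name FROM topics "
    "WHERE user_id IS NULL OR user_id = ? "
    "ORDER BY name"
)


def visible_topics(conn: sqlite3.Connection, user_id: int) -> List[Tuple[int, str]]:
    """System topics plus the caller's own; other users' topics never match."""
    return conn.execute(VISIBLE_TOPICS_SQL, (user_id,)).fetchall()
```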
Q2: Who can create system-level topics?
| Option | Description | Selected |
|---|---|---|
| Admin only via /api/admin/topics | Only admin creates/edits/deletes system topics. Regular users CRUD own topics. | ✓ |
| Any user can promote a topic to system | Requires admin approval flow — defer to v2. | |
| Admin and users each create in their own namespace | Same result as option 1. | |
User's choice: Admin only via /api/admin/topics
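The ownership rule can be written as a single predicate. `User.is_admin` and the `can_modify_topic` helper are hypothetical. The sketch also assumes admins do not edit other users' personal topics, a point the discussion does not settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    id: int
    is_admin: bool  # hypothetical role flag


@dataclass
class Topic:
    id: int
    name: str
    user_id: Optional[int]  # None marks a system topic


def can_modify_topic(user: User, topic: Topic) -> bool:
    if topic.user_id is None:
        # System topics: admin only (served via /api/admin/topics).
        return user.is_admin
    # Personal topics: owner only; assumed to apply to admins as well.
    return topic.user_id == user.id
```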
Q3: What happens to existing topics (currently all user_id=NULL)?
| Option | Description | Selected |
|---|---|---|
| Keep as system topics | They become system defaults automatically. Admin can delete unwanted ones. | |
| Delete all existing topics | Fresh start — dev/test data. Admin seeds system topics after Phase 3. | ✓ |
User's choice: Delete all existing topics
Q4: What topics does AI classification see?
| Option | Description | Selected |
|---|---|---|
| System topics + user's own topics | Union of user_id=NULL and user_id=current_user.id. New suggestions go into user namespace. | ✓ |
| User's topics only | Classifier only sees personal topics. A new user starts with an empty list. | |
User's choice: System topics + user's own topics; new suggestions created in user's namespace
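Here is a sketch of how a classifier suggestion might be resolved against the visible topics and, if new, created in the user's namespace. Several details are assumptions: the helper name, the exact-match comparison on `name`, and the preference for the user's own topic over a same-named system topic.

```python
import sqlite3


def resolve_or_create_topic(conn: sqlite3.Connection, user_id: int, name: str) -> int:
    """Reuse a visible topic named `name`, else create it in the user's namespace."""
    row = conn.execute(
        "SELECT id FROM topics WHERE name = ? AND (user_id IS NULL OR user_id = ?) "
        "ORDER BY user_id IS NULL LIMIT 1",  # prefer the user's own topic over a system one
        (name, user_id),
    ).fetchone()
    if row is not None:
        return row[0]
    # New AI suggestions are never written as system topics (user_id NULL).
    cur = conn.execute("INSERT INTO topics (name, user_id) VALUES (?, ?)", (name, user_id))
    conn.commit()
    return cur.lastrowid
```

A unique constraint on `(user_id, name)` would likely also be needed, to stop two concurrent classifications from inserting the same personal topic.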
Settings flat-file retirement
Q1: What happens to /api/settings and settings.json?
| Option | Description | Selected |
|---|---|---|
| Keep settings.json for system prompt + provider defaults only | Remove AI API key config. Keep system_prompt and defaults. /api/settings becomes read-only. | |
| Remove /api/settings entirely in Phase 3 | Clean break. All AI config from DB. System prompt → env var. | ✓ |
| Keep /api/settings as-is but auth-gate it | Backward compat with flat-file. Technical debt preserved. | |
User's choice: Remove /api/settings entirely
Q2: Where does the system prompt live after /api/settings is removed?
| Option | Description | Selected |
|---|---|---|
| Env var SYSTEM_PROMPT with code default fallback | SYSTEM_PROMPT env var, optional. Code default in classifier.py if not set. | ✓ |
| New system_config DB table | Flexible but adds schema + endpoint complexity. | |
| Per-user system prompts from users table | Highly flexible but out of Phase 3 scope. | |
User's choice: SYSTEM_PROMPT env var with hardcoded code default
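Here is a minimal sketch of the lookup, assuming the variable is read at call time. The default prompt text shown is a hypothetical placeholder.

```python
import os
from typing import Mapping, Optional

# Hypothetical placeholder; the real default text would live in classifier.py.
DEFAULT_SYSTEM_PROMPT = "Classify the document into one of the provided topics."


def get_system_prompt(env: Optional[Mapping[str, str]] = None) -> str:
    """Return SYSTEM_PROMPT from the environment, else the code default."""
    env = os.environ if env is None else env
    # An empty SYSTEM_PROMPT is treated the same as an unset one.
    return env.get("SYSTEM_PROMPT") or DEFAULT_SYSTEM_PROMPT
```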
Q3: How does Celery task resolve AI config?
| Option | Description | Selected |
|---|---|---|
| Look up from doc.user_id → users.ai_provider/ai_model | Task has document_id → fetch doc → fetch user → AI config. Correct per-user isolation. | ✓ |
| Pass user_id as a task argument | Upload confirm endpoint passes user_id to task. Avoids an extra DB query for the doc row. | |
User's choice: Look up from doc.user_id → users.ai_provider/ai_model
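Here is a sketch of the lookup, assuming hypothetical `documents.user_id` and `users.ai_provider`/`users.ai_model` columns. It folds the two fetches into one JOIN, which is a choice made here rather than something the discussion specified.

```python
import sqlite3
from typing import Optional, Tuple


def load_ai_config(
    conn: sqlite3.Connection, document_id: int
) -> Tuple[Optional[str], Optional[str]]:
    """Resolve the owning user's AI settings from the document id alone."""
    row = conn.execute(
        "SELECT u.ai_provider, u.ai_model FROM documents AS d "
        "JOIN users AS u ON u.id = d.user_id WHERE d.id = ?",
        (document_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"document {document_id} not found")
    # May be (None, None) when the user has no provider assigned.
    return row[0], row[1]
```

A `(None, None)` result means no provider is assigned, which is the case the next question covers.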
Q4: Fallback when user has no ai_provider assigned?
| Option | Description | Selected |
|---|---|---|
| DEFAULT_AI_PROVIDER + DEFAULT_AI_MODEL env vars | Optional env vars in config.py with safe code default (ollama/llama3.2). | ✓ |
| Hardcoded fallback to ollama | Simpler, but requires a code change to switch later. | |
| Raise an error | Classification fails if no provider is assigned. Admin must assign one before uploads work. | |
User's choice: DEFAULT_AI_PROVIDER + DEFAULT_AI_MODEL env vars
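Here is one way the fallback could resolve, as it might look in config.py. Only the variable names and the ollama/llama3.2 default come from the discussion. The function and its pass-through of the user's own model are assumptions.

```python
import os
from typing import Mapping, Optional, Tuple

CODE_DEFAULT_PROVIDER = "ollama"
CODE_DEFAULT_MODEL = "llama3.2"


def effective_ai_config(
    user_provider: Optional[str],
    user_model: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Pick the user's provider/model, else env defaults, else the code default."""
    env = os.environ if env is None else env
    if user_provider:
        # The user's assignment wins; the model is passed through unchanged.
        return user_provider, user_model
    # No provider assigned: each variable falls back to the code default independently.
    provider = env.get("DEFAULT_AI_PROVIDER") or CODE_DEFAULT_PROVIDER
    model = env.get("DEFAULT_AI_MODEL") or CODE_DEFAULT_MODEL
    return provider, model
```

In this sketch the two variables are meant to be set together. Setting only DEFAULT_AI_PROVIDER would pair that provider with the llama3.2 default.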
Claude's Discretion
None. All major decisions were made by the user.
Deferred Ideas
- Presigned GET URLs for document downloads — Phase 4 (DOC-02: PDF preview proxied through app)
- Per-user system prompt overrides — out of scope for v1
- Quota reservation at upload-url initiation — decided against; confirm-time enforcement preferred
- MinIO event notification webhook approach — deferred in favor of two-step confirm