# Phase 3: Document Migration & Multi-User Isolation - Discussion Log

> **Audit trail only.** Do not use as input to planning, research, or execution agents.
> Decisions are captured in CONTEXT.md; this log preserves the alternatives considered.

**Date:** 2026-05-23
**Phase:** 3-Document Migration & Multi-User Isolation
**Areas discussed:** Null-user record cleanup, Upload bytes flow, Topics isolation model, Settings flat-file retirement

---

## Null-user record cleanup

### Q1: How to handle existing documents with user_id=NULL before NOT NULL migration?

| Option | Description | Selected |
|--------|-------------|----------|
| Delete them | Dev/test data; Phase 1 D-04 deleted flat-file data with same reasoning. Pre-migration script deletes all NULL-user documents + MinIO objects. | ✓ |
| Assign to first admin | Script sets user_id = first admin's id for all NULL rows. Data preserved but attributed to admin. | |
| Fail migration if any exist | Guard in Alembic migration; manual cleanup required before re-run. | |

**User's choice:** Delete them

---

### Q2: Delete MinIO objects along with DB rows?

| Option | Description | Selected |
|--------|-------------|----------|
| Delete both DB rows and MinIO objects | Script calls storage_backend.delete_object() before deleting row. No orphaned objects. | ✓ |
| DB rows only | Simpler, but leaves orphaned null-user/... objects in MinIO. | |

**User's choice:** Delete both DB rows and MinIO objects

---

### Q3: Cleanup as standalone script or baked into Alembic migration?

| Option | Description | Selected |
|--------|-------------|----------|
| Baked into Alembic migration | upgrade() runs cleanup first, then adds NOT NULL. One command, one atomic flow. | ✓ |
| Standalone script | Separate scripts/cleanup_null_user_docs.py, must be run manually before migration. | |

**User's choice:** Baked into Alembic migration

---

### Q4: Reconcile quota used_bytes during migration?

| Option | Description | Selected |
|--------|-------------|----------|
| Yes, reconcile quotas | UPDATE quotas SET used_bytes = actual sum of document sizes per user. Accurate starting state. | ✓ |
| No, quotas start at zero | All quotas reset to 0; accurate from first upload via new enforcement. | |

**User's choice:** Yes, reconcile quotas

---

## Upload bytes flow

### Q1: Presigned PUT URLs vs bytes-through-FastAPI?

| Option | Description | Selected |
|--------|-------------|----------|
| Keep bytes through FastAPI | Current flow preserved. 'Presigned URL flow' refers to download presigning only. | |
| Implement presigned PUT URLs in Phase 3 | Direct-to-MinIO uploads per CLAUDE.md architectural rule. Requires frontend changes. | ✓ |

**User's choice:** Implement presigned PUT URLs in Phase 3

---

### Q2: How should the presigned upload flow work end-to-end?

| Option | Description | Selected |
|--------|-------------|----------|
| Two-step: initiate + confirm | POST upload-url → PUT to MinIO → POST confirm. Clean separation. | ✓ |
| One-step with webhook/polling | MinIO event notification webhook. More complex. | |

**User's choice:** Two-step: initiate + confirm

---

### Q3: Handling abandoned uploads?

| Option | Description | Selected |
|--------|-------------|----------|
| Let them expire naturally (Celery cleanup) | Celery beat deletes pending rows older than 1 hour + MinIO objects. No quota reserved for pending. | ✓ |
| Quota reserved on initiate, released on timeout | Reserve at step 1, refund on timeout. More complex. | |
| No cleanup, pending rows stay | Orphaned but harmless. Not recommended for production. | |

**User's choice:** Let them expire naturally (Celery beat cleanup)

---

### Q4: When to enforce quota?

| Option | Description | Selected |
|--------|-------------|----------|
| At confirm (Recommended) | Atomic quota UPDATE runs at /confirm. File size from MinIO stat (authoritative). | ✓ |
| At initiation with client-supplied size | Reserve quota at step 1. Requires trusting client-supplied size or verifying at confirm. | |

**User's choice:** At confirm

---

## Topics isolation model

### Q1: How should the topics namespace work?

| Option | Description | Selected |
|--------|-------------|----------|
| System defaults (user_id=NULL) + per-user overrides | Union of system topics + user's own topics. Admin manages system topics. DOC-04 compliant. | ✓ |
| Fully isolated per user | Own complete topic namespace per user. No shared topics. | |
| Fully global (shared) | All topics shared. Simplest but violates topic privacy. | |

**User's choice:** System defaults + per-user overrides

---

### Q2: Who can create system-level topics?

| Option | Description | Selected |
|--------|-------------|----------|
| Admin only via /api/admin/topics | Only admin creates/edits/deletes system topics. Regular users CRUD own topics. | ✓ |
| Any user can promote a topic to system | Requires admin approval flow; defer to v2. | |
| Admin and users both create to own namespace | Same result as option 1. | |

**User's choice:** Admin only via /api/admin/topics

---

### Q3: What happens to existing topics (currently all user_id=NULL)?

| Option | Description | Selected |
|--------|-------------|----------|
| Keep as system topics | They become system defaults automatically. Admin can delete unwanted ones. | |
| Delete all existing topics | Fresh start (dev/test data). Admin seeds system topics after Phase 3. | ✓ |

**User's choice:** Delete all existing topics

---

### Q4: What topics does AI classification see?

| Option | Description | Selected |
|--------|-------------|----------|
| System topics + user's own topics | Union of user_id=NULL and user_id=current_user.id. New suggestions go into user namespace. | ✓ |
| User's topics only | Classifier only sees personal topics. New user starts with empty list. | |

**User's choice:** System topics + user's own topics; new suggestions created in user's namespace

---

## Settings flat-file retirement

### Q1: What happens to /api/settings and settings.json?

| Option | Description | Selected |
|--------|-------------|----------|
| Keep settings.json for system prompt + provider defaults only | Remove AI API key config. Keep system_prompt and defaults. /api/settings becomes read-only. | |
| Remove /api/settings entirely in Phase 3 | Clean break. All AI config from DB. System prompt → env var. | ✓ |
| Keep /api/settings as-is but auth-gate it | Backward compat with flat-file. Technical debt preserved. | |

**User's choice:** Remove /api/settings entirely

---

### Q2: Where does the system prompt live after /api/settings is removed?

| Option | Description | Selected |
|--------|-------------|----------|
| Env var SYSTEM_PROMPT with code default fallback | SYSTEM_PROMPT env var, optional. Code default in classifier.py if not set. | ✓ |
| New system_config DB table | Flexible but adds schema + endpoint complexity. | |
| Per-user system prompts from users table | Highly flexible but out of Phase 3 scope. | |

**User's choice:** SYSTEM_PROMPT env var with hardcoded code default

---

### Q3: How does Celery task resolve AI config?

| Option | Description | Selected |
|--------|-------------|----------|
| Look up from doc.user_id → users.ai_provider/ai_model | Task has document_id → fetch doc → fetch user → AI config. Correct per-user isolation. | ✓ |
| Pass user_id as a task argument | Upload confirm endpoint passes user_id to task. Avoids extra DB query for doc row. | |

**User's choice:** Look up from doc.user_id → users.ai_provider/ai_model

---

### Q4: Fallback when user has no ai_provider assigned?

| Option | Description | Selected |
|--------|-------------|----------|
| DEFAULT_AI_PROVIDER + DEFAULT_AI_MODEL env vars | Optional env vars in config.py with safe code default (ollama/llama3.2). | ✓ |
| Hardcoded fallback to ollama | Simpler, requires code change to switch later. | |
| Raise an error | Classification fails if no provider assigned. Admin must assign before uploads work. | |

**User's choice:** DEFAULT_AI_PROVIDER + DEFAULT_AI_MODEL env vars

---

## Claude's Discretion

None; all major decisions were made by user.

## Deferred Ideas

- Presigned GET URLs for document downloads: Phase 4 (DOC-02: PDF preview proxied through app)
- Per-user system prompt overrides: out of scope for v1
- Quota reservation at upload-url initiation: decided against; confirm-time enforcement preferred
- MinIO event notification webhook approach: deferred in favor of two-step confirm
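
## Illustrative sketch: confirm-time quota enforcement

For reference, the "At confirm" option selected under Upload bytes flow Q4 (atomic quota UPDATE, size taken from MinIO stat) might look roughly like the sketch below. This is not the project's implementation. The `limit_bytes` column, the `try_consume_quota` helper, and the use of SQLite for the demo are assumptions made for illustration; only the `quotas` table and `used_bytes` column appear in this log. The size argument is assumed to have already been read from the object store rather than supplied by the client.

```python
import sqlite3


def try_consume_quota(conn: sqlite3.Connection, user_id: int, size_bytes: int) -> bool:
    """Add size_bytes to the user's usage only if the result fits the limit.

    The limit check and the increment live in a single UPDATE statement,
    so two concurrent confirms cannot both pass a check made against
    stale usage. Returns True if the quota was consumed.
    """
    cur = conn.execute(
        """
        UPDATE quotas
           SET used_bytes = used_bytes + :size
         WHERE user_id = :uid
           AND used_bytes + :size <= limit_bytes  -- limit_bytes is a hypothetical column
        """,
        {"size": size_bytes, "uid": user_id},
    )
    conn.commit()
    # rowcount is 1 when the row matched (quota fits), 0 otherwise.
    return cur.rowcount == 1


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with one user at 900 of 1000 bytes used."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotas ("
        "user_id INTEGER PRIMARY KEY, "
        "used_bytes INTEGER NOT NULL, "
        "limit_bytes INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO quotas VALUES (1, 900, 1000)")
    conn.commit()
    return conn
```

A confirm handler built this way would reject the upload (and presumably delete the object) when the helper returns False, which fits the decision not to reserve quota for pending uploads.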
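
## Illustrative sketch: AI config fallback

The fallback order selected under Settings flat-file retirement Q3 and Q4 (per-user provider and model first, then the optional DEFAULT_AI_PROVIDER and DEFAULT_AI_MODEL env vars, then the ollama/llama3.2 code default) can be sketched as follows. The `AIConfig` type and `resolve_ai_config` function are hypothetical names, and resolving each field independently is an assumption; the log does not say how a user with a provider but no model should be handled.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Code defaults named in the log (ollama/llama3.2).
CODE_DEFAULT_PROVIDER = "ollama"
CODE_DEFAULT_MODEL = "llama3.2"


@dataclass(frozen=True)
class AIConfig:
    provider: str
    model: str


def resolve_ai_config(
    user_provider: Optional[str],
    user_model: Optional[str],
    env: Optional[Mapping[str, str]] = None,
) -> AIConfig:
    """Pick provider and model: user value, then env var, then code default.

    Assumption: each field falls back on its own, so an unset or empty
    user value is treated the same as "not assigned".
    """
    env = os.environ if env is None else env
    provider = user_provider or env.get("DEFAULT_AI_PROVIDER") or CODE_DEFAULT_PROVIDER
    model = user_model or env.get("DEFAULT_AI_MODEL") or CODE_DEFAULT_MODEL
    return AIConfig(provider=provider, model=model)
```

In the chosen design the Celery task would obtain `user_provider` and `user_model` by loading the document, then its user, before calling a resolver like this one.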