# Phase 3: Document Migration & Multi-User Isolation — Research
**Researched:** 2026-05-23
**Domain:** FastAPI + SQLAlchemy async + MinIO presigned PUT + Celery beat + Vue 3 XHR upload
**Confidence:** HIGH
---
## User Constraints (from CONTEXT.md)
### Locked Decisions
**Null-User Record Cleanup**
- D-01: All documents with `user_id=NULL` are deleted (DB rows + MinIO objects) before NOT NULL is added. Zero production data loss (dev/test data only).
- D-02: Cleanup baked into Alembic `upgrade()` in `0003_multi_user_isolation.py`.
- D-03: After null-user cleanup, reconcile quota: `UPDATE quotas SET used_bytes = (SELECT COALESCE(SUM(size_bytes), 0) FROM documents WHERE documents.user_id = quotas.user_id)`.
**Presigned Upload Flow**
- D-04: Direct-to-MinIO presigned PUT uploads. Existing multipart POST-to-FastAPI endpoint is replaced.
- D-05: Three-step flow: (1) `POST /api/documents/upload-url` → creates `Document` row (`status='pending'`), generates presigned URL, returns `{upload_url, document_id}`. Quota NOT reserved here. (2) Browser PUTs bytes to MinIO. (3) `POST /api/documents/{id}/confirm` → `stat_object()` for authoritative size, atomic quota UPDATE, `status='uploaded'`, enqueue Celery task.
- D-06: Abandoned upload cleanup: Celery beat task every 30 minutes deletes `Document` rows older than 1 hour with `status='pending'` and their MinIO objects.
- D-07: Atomic quota at confirm step. `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`. 413 response if no rows returned. Delete decrements atomically with `GREATEST(0, used_bytes - $delta)`.
**Topics Isolation Model**
- D-08: System topics (`user_id=NULL`) visible to all; per-user topics (`user_id=current_user.id`) visible only to that user.
- D-09: Only admin can CRUD system topics via `POST /api/admin/topics`. Regular users CRUD own topics via `/api/topics/*`.
- D-10: All existing topics deleted in Phase 3 migration. Admin seeds system topics fresh post-Phase 3.
- D-11: AI classification receives system + user topics; new AI-suggested topics go into user namespace.
**Settings Flat-File Retirement**
- D-12: `/api/settings` endpoint removed. `load_settings()`, `save_settings()`, `settings.json` deleted.
- D-13: System prompt moves to `SYSTEM_PROMPT` env var in `config.py`. Hardcoded fallback in `classifier.py`.
- D-14: Celery task resolves AI config via `doc.user_id → users.ai_provider + users.ai_model` (second DB lookup in same session).
- D-15: Fallback: `DEFAULT_AI_PROVIDER` + `DEFAULT_AI_MODEL` env vars (code defaults: `"ollama"` / `"llama3.2"`).
**Auth Guards**
- D-16: All `/api/documents/*` get `get_current_user`. Ownership assertion returns 404 (not 403) for cross-user access. Admin role returns 403 on all document endpoints.
- D-17: `/api/topics/*` gets `get_current_user`. Topic queries filter by `user_id IN (current_user.id, NULL)`.
### Claude's Discretion
- None specified beyond decisions above.
### Deferred Ideas (OUT OF SCOPE)
- Presigned GET URLs for document downloads (Phase 4 — DOC-02 PDF preview proxied through app)
- Per-user system prompt overrides
- Quota reservation at upload-url initiation with client-supplied size
- MinIO event notification webhook approach
---
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|------------------|
| STORE-03 | Atomic quota enforcement at upload using UPDATE...WHERE...RETURNING pattern | Finding 4: verified pattern with psycopg v3 / SQLAlchemy `text()` + `fetchone()` null check |
| STORE-04 | Quota usage bar in sidebar (X MB of Y MB) amber at 80%, red at 95% | Finding 9: `GET /api/me/quota` endpoint; Vue reactivity via `useAuthStore` |
| STORE-05 | Upload rejection at quota limit with usage/size/link error | Finding 4: 413 response shape `{used_bytes, limit_bytes, rejected_bytes}` |
| STORE-06 | Atomic quota decrement on document delete | Finding 4: `GREATEST(0, used_bytes - $delta)` pattern verified |
| SEC-04 | All file access resolved through DB lookup — no object key reconstruction from params | Finding 7: ownership assertion on every document handler using `doc.user_id == current_user.id` |
| DOC-03 | AI provider/model from DB per user; user cannot change AI config | Finding 8: Celery task second DB lookup `doc.user_id → user.ai_provider` |
| DOC-04 | System default topics + per-user topic overrides preserved | Finding 6: `user_id IS NULL OR user_id = :uid` filter; index on `topics.user_id` |
| DOC-05 | AI classification uses user's assigned provider and model | Finding 8: classifier refactor to accept `ai_provider` + `ai_model` parameters |
---
## Research Summary
Phase 3 has three technical challenge clusters: (1) the Alembic migration with side effects (MinIO object deletion, quota reconciliation), (2) the two-step presigned PUT upload flow with its Docker hostname pitfall, and (3) multi-user data isolation across documents, topics, auth guards, and AI config.
The codebase is in excellent shape for this phase. The `get_current_user` / `get_current_admin` FastAPI dependencies are production-ready. The `Quota`, `Document`, and `Topic` ORM models already have all necessary columns. The atomic quota SQL pattern is straightforward to implement with SQLAlchemy `text()` + `session.execute()`. The MinIO SDK (`minio==7.2.20`) has `presigned_put_object()` and `stat_object()` with clean APIs. The biggest planning risk is the Docker hostname issue for browser-accessible presigned URLs — the generated URL contains the internal Docker hostname (`minio:9000`) which the browser cannot resolve; a second MinIO client instance configured with the public hostname is required.
**Primary recommendation:** Implement the phase in four waves: (1) Alembic migration + quota foundation, (2) presigned upload flow backend + auth guards, (3) Celery beat + classifier refactor, (4) frontend upload flow + quota bar.
---
## Architectural Responsibility Map
| Capability | Primary Tier | Secondary Tier | Rationale |
|------------|-------------|----------------|-----------|
| Quota enforcement | API / Backend | Database | Atomic UPDATE must be DB-side; API layer calls it and interprets result |
| Presigned URL generation | API / Backend | MinIO | FastAPI generates URL using SDK; browser uses URL to PUT directly to MinIO |
| File bytes transfer | MinIO (direct) | — | Bytes never pass through API layer (CLAUDE.md architectural rule) |
| Ownership assertion | API / Backend | Database | Every handler verifies `doc.user_id == current_user.id` before responding |
| Topic namespace | Database | API / Backend | DB filter `user_id IS NULL OR user_id = :uid`; API enforces writes into correct namespace |
| AI config resolution | Celery task | Database | Task does second DB lookup; no task signature change |
| Abandoned upload cleanup | Celery beat | MinIO + Database | Periodic task; both DB row and MinIO object must be cleaned |
| Quota bar display | Frontend (client) | API / Backend | `GET /api/me/quota` endpoint; Vue component polls/renders |
| Upload progress | Frontend (client) | — | XMLHttpRequest upload events; fetch API does not support upload progress |
---
## Key Findings
### Finding 1: Alembic Migration with Side Effects — Safe Pattern
[VERIFIED: codebase inspection + SQLAlchemy docs]
The Phase 3 migration (`0003_multi_user_isolation.py`) must perform: (1) delete null-user documents + their MinIO objects, (2) delete all topics, (3) add NOT NULL constraint to `documents.user_id`, (4) reconcile `quotas.used_bytes`.
**Critical constraint:** Alembic `upgrade()` runs synchronously. MinIO SDK calls are also synchronous (the async wrappers in `MinIOBackend` use `asyncio.to_thread()` which requires a running event loop). Inside Alembic, there is no event loop by default. The safe pattern is to call the synchronous MinIO SDK directly — not through `MinIOBackend`, which is the async wrapper. `asyncio.run()` inside `upgrade()` also works but adds unnecessary complexity.
**Established pattern in this codebase:** Migrations use `op.execute(text(...))` for data manipulation. For the MinIO side, inline synchronous SDK calls are appropriate:
```python
# In upgrade() — synchronous MinIO SDK, not the async wrapper
from minio import Minio
client = Minio(endpoint, access_key, secret_key, secure=False)
for obj_key in keys_to_delete:
client.remove_object(bucket, obj_key)
```
**Rollback safety:** The `downgrade()` function cannot restore deleted MinIO objects or recreate deleted rows — document this explicitly. The migration is intentionally one-way for the cleanup step (D-01). The NOT NULL constraint reversal is safe via `op.alter_column('documents', 'user_id', nullable=True)`.
**Transaction boundary:** The DB operations (DELETE rows, ALTER COLUMN, UPDATE quotas) run inside the Alembic transaction. The MinIO deletions happen outside any DB transaction — if MinIO deletion partially fails, the DB transaction can still commit. This is intentional: leftover MinIO objects are orphaned but harmless; they will not appear in the UI because the DB rows are gone.
**Privilege grants:** The existing pattern (from migration 0001/0002) requires `GRANT` statements for any new tables. Migration 0003 creates no new tables — no additional grants needed.
### Finding 2: MinIO `presigned_put_object()` API
[VERIFIED: `python3 -c "from minio import Minio; help(Minio.presigned_put_object)"`]
```python
client.presigned_put_object(
bucket_name: str,
object_name: str, # the pre-computed object key
expires: timedelta = timedelta(days=7), # use timedelta(minutes=15) per D-05
) -> str # returns the presigned URL string
```
The object key must be pre-computed BEFORE calling this method, since the presigned URL is tied to that exact key. The Phase 3 flow creates the key in FastAPI at upload-url time: `f"{user_id}/{document_id}/{uuid4()}{ext}"`. This key is stored in the `Document` row (`object_key` column) at `status='pending'` time, so `stat_object()` at confirm time uses the same key.
**Async wrapping:** `asyncio.to_thread(self._client.presigned_put_object, bucket, key, timedelta(minutes=15))` — same pattern as existing `put_object` calls in `MinIOBackend`.
### Finding 3: Docker Hostname Pitfall for Presigned URLs — CRITICAL
[VERIFIED: community documentation + MinIO GitHub discussions]
**The problem:** `MinIOBackend` is initialized with `endpoint="minio:9000"` (the Docker internal hostname). `presigned_put_object()` generates URLs using this hostname: `http://minio:9000/docuvault/...?X-Amz-Signature=...`. Browsers running on the host machine cannot resolve `minio` as a DNS name — they see a network error before the PUT even starts.
**Why you cannot just replace the hostname:** The hostname is part of the HMAC-SHA256 signature. Changing `minio:9000` to `localhost:9000` in the URL string invalidates the signature; MinIO returns `SignatureDoesNotMatch`.
**The solution: dual MinIO client.** `MinIOBackend` needs a second Minio client instance initialized with the public/browser-accessible endpoint:
```python
class MinIOBackend(StorageBackend):
def __init__(self, endpoint, access_key, secret_key, bucket, secure=False,
public_endpoint=None):
self._bucket = bucket
self._client = Minio(endpoint, access_key, secret_key, secure=secure)
# Second client for presigned URL generation — uses browser-accessible hostname.
# Falls back to internal client if not configured.
self._public_client = Minio(
public_endpoint or endpoint, access_key, secret_key, secure=secure
)
```
`presigned_put_object()` uses `self._public_client`; all other operations (`put_object`, `get_object`, `stat_object`, `delete_object`) use `self._client` (internal).
**Configuration:** Add `MINIO_PUBLIC_ENDPOINT` env var to `config.py` (optional, defaults to `MINIO_ENDPOINT`). In `.env.example`: `MINIO_PUBLIC_ENDPOINT=localhost:9000`. For production where MinIO is behind a load balancer or nginx, set this to the public URL.
**MinIO CORS for browser PUT:** MinIO open-source supports a global CORS setting via `MINIO_API_CORS_ALLOW_ORIGIN` environment variable (or `mc admin config set api cors_allow_origin=...`). For development: `MINIO_API_CORS_ALLOW_ORIGIN: "http://localhost:5173"`. Add to `docker-compose.yml` MinIO service environment block.
Per-bucket CORS configuration (S3 `PutBucketCors` API) is an enterprise-only feature in MinIO. The `minio` Python SDK 7.2.20 has no `set_bucket_cors()` method — confirmed by `dir(Minio)` inspection. The global env var approach is the correct solution for the open-source edition. [ASSUMED: the global `MINIO_API_CORS_ALLOW_ORIGIN` env var is reliable in current MinIO docker image — early reports (2021 issue #11258) showed it unreliable, but more recent reports (2024 issue #16479) indicate `mc admin config set` works; env var should be equivalent on fresh container start]
### Finding 4: Atomic Quota Enforcement Pattern
[VERIFIED: SQLAlchemy docs + psycopg v3 docs + codebase pattern inspection]
**Upload enforcement (STORE-03, STORE-05):**
```python
from sqlalchemy import text
async def enforce_quota(session: AsyncSession, user_id: uuid.UUID, delta: int) -> int:
"""Atomically increment quota. Returns new used_bytes, or raises HTTPException(413)."""
result = await session.execute(
text("""
UPDATE quotas
SET used_bytes = used_bytes + :delta
WHERE user_id = :uid
AND (used_bytes + :delta) <= limit_bytes
RETURNING used_bytes, limit_bytes
"""),
{"delta": delta, "uid": str(user_id)},
)
row = result.fetchone()
if row is None:
# Quota exceeded — fetch current state for the 413 body
quota_row = await session.execute(
text("SELECT used_bytes, limit_bytes FROM quotas WHERE user_id = :uid"),
{"uid": str(user_id)},
)
q = quota_row.fetchone()
raise HTTPException(
status_code=413,
detail={
"used_bytes": q.used_bytes if q else 0,
"limit_bytes": q.limit_bytes if q else 0,
"rejected_bytes": delta,
},
)
return row.used_bytes
```
**Why `fetchone() is None` and not `result.rowcount == 0`:** With psycopg v3 and `RETURNING`, `rowcount` reflects the number of rows in the result set. However, the SQLAlchemy async layer behavior can vary across backends (psycopg, asyncpg). Using `fetchone()` returning `None` is the most portable and explicit check — it directly tests "did the UPDATE match any rows?" [VERIFIED: psycopg v3 docs confirm `rowcount` reflects RETURNING result set count; fetchone() is unambiguous]
**Delete decrement (STORE-06):**
```python
await session.execute(
text("""
UPDATE quotas
SET used_bytes = GREATEST(0, used_bytes - :delta)
WHERE user_id = :uid
"""),
{"delta": doc.size_bytes, "uid": str(doc.user_id)},
)
```
`GREATEST(0, ...)` prevents underflow if `size_bytes` was somehow recorded incorrectly. No RETURNING needed — the update always succeeds.
**Important:** The atomic quota check uses `doc.size_bytes` from `stat_object()` at confirm time, NOT client-supplied size. This matches D-07.
### Finding 5: MinIO `stat_object()` API
[VERIFIED: `python3 -c "from minio import Minio; help(Minio.stat_object)"`]
```python
result = client.stat_object(bucket_name, object_name)
# result.size — int, file size in bytes (authoritative)
# result.last_modified — datetime
# result.etag — str
# result.content_type — str
```
**Timing:** `stat_object()` is called at the `/confirm` step, after the browser has completed its PUT to MinIO. There is no meaningful eventual-consistency concern here since both the presigned PUT and the stat call go to the same MinIO instance. The presigned PUT completes before the browser calls `/confirm`, so the object exists by the time `stat_object()` is called.
**Error handling:** `stat_object()` raises `minio.error.S3Error` with `code="NoSuchKey"` if the object does not exist (upload never happened or TTL exceeded). Catch this and return 422: "Upload not found — presigned URL may have expired."
**Async wrapping:** `await asyncio.to_thread(self._client.stat_object, bucket, key)` — standard pattern.
### Finding 6: Topics Isolation — Index Strategy and Query Pattern
[VERIFIED: codebase inspection of Topic model + SQLAlchemy query patterns]
The `Topic` model already has `user_id` as a nullable UUID FK. The existing `UniqueConstraint("user_id", "name", name="uq_topics_user_name")` correctly handles the layered namespace: a system topic named "Finance" (`user_id=NULL`) and a user topic named "Finance" (`user_id=some_uuid`) can coexist.
**Index required:** The existing codebase has no dedicated index on `topics.user_id`. The phase needs to add one in the migration for the `user_id IS NULL OR user_id = :uid` query pattern used at classification time and topic listing:
```python
op.create_index("ix_topics_user_id", "topics", ["user_id"])
```
**Query pattern (D-17):**
```python
from sqlalchemy import or_, is_
stmt = select(Topic).where(
or_(Topic.user_id == current_user.id, Topic.user_id.is_(None))
).order_by(Topic.name)
```
**Topic creation for new AI suggestions (D-11):** When the classifier suggests a new topic, `create_topic()` in `services/storage.py` must accept an optional `user_id` parameter (defaulting to `None` for backward compatibility). Phase 3 calls it with `user_id=doc.user_id` to put suggested topics into the user's namespace.
**The deduplication query in `create_topic()` must also be scoped:**
```python
# Current (wrong after Phase 3): matches any topic with same name regardless of user
select(Topic).where(func.lower(Topic.name) == name.lower())
# Correct (Phase 3): deduplication within the same namespace only
select(Topic).where(
func.lower(Topic.name) == name.lower(),
Topic.user_id == user_id # None for system topics, user UUID for per-user
)
```
### Finding 7: Auth Guard Pattern — 404 vs 403, Admin 403
[VERIFIED: codebase inspection of `deps/auth.py` + CONTEXT.md D-16]
`get_current_user` and `get_current_admin` dependencies are fully implemented in `backend/deps/auth.py`. Adding them to document/topic endpoints is a mechanical dependency injection.
**Ownership assertion (SEC-04):**
```python
@router.get("/{doc_id}")
async def get_document(
doc_id: str,
current_user: User = Depends(get_current_user),
session: AsyncSession = Depends(get_db),
):
doc = await session.get(Document, uuid.UUID(doc_id))
if doc is None or doc.user_id != current_user.id:
raise HTTPException(404, "Document not found")
return _doc_to_dict(doc, ...)
```
Returning 404 for both "not found" and "wrong owner" avoids information leakage (D-16). An attacker cannot distinguish between "document doesn't exist" and "document exists but belongs to someone else."
**Admin 403 on `/api/documents/*`:** Admins should not access document content (per CLAUDE.md and Phase 2 D-07 deferred rule). The simplest implementation is a separate dependency or guard at the router level:
```python
# Option A: inline check in each handler
if current_user.role == "admin":
raise HTTPException(403, "Admin accounts cannot access document content")
# Option B: a dedicated dependency
async def get_regular_user(user: User = Depends(get_current_user)) -> User:
if user.role == "admin":
raise HTTPException(403, "Admin accounts cannot access document content")
return user
```
Option B is cleaner and prevents forgetting the check in any one handler.
**Admin topics endpoint:** `POST /api/admin/topics` uses `get_current_admin` (already available). Regular `/api/topics/*` uses `get_regular_user` (or `get_current_user` with write operations scoped to own namespace).
### Finding 8: Per-User AI Classification — Celery Task Refactor
[VERIFIED: codebase inspection of `tasks/document_tasks.py` + `services/classifier.py`]
**Current flow (broken for Phase 3):**
`classifier.classify_document()` calls `storage.load_settings()` which reads the flat-file `settings.json`. This gives global AI config. Phase 3 eliminates this.
**Target flow (D-14, DOC-03, DOC-05):**
```python
# In _run() inside document_tasks.py (after looking up doc):
user = await session.get(User, doc.user_id)
ai_provider = user.ai_provider or settings.default_ai_provider
ai_model = user.ai_model or settings.default_ai_model
topics = await classifier.classify_document(
session, document_id,
ai_provider=ai_provider,
ai_model=ai_model,
)
```
**`classifier.py` signature change:**
```python
async def classify_document(
session: AsyncSession,
doc_id: str,
topic_names: list[str] | None = None,
ai_provider: str | None = None, # NEW
ai_model: str | None = None, # NEW
) -> list[str]:
# ... uses ai_provider/ai_model to call get_provider() instead of load_settings()
```
The `get_provider()` factory in `backend/ai/__init__.py` currently takes a `settings` dict. It needs to accept `(provider_name, model)` or be called with a minimal settings dict constructed from the parameters. Inspect the factory signature before planning the refactor.
**`config.py` additions (D-13, D-15):**
```python
system_prompt: str = "" # SYSTEM_PROMPT env var
default_ai_provider: str = "ollama" # DEFAULT_AI_PROVIDER env var
default_ai_model: str = "llama3.2" # DEFAULT_AI_MODEL env var
```
### Finding 9: `GET /api/me/quota` Endpoint + Quota Bar Frontend
[VERIFIED: codebase inspection + CONTEXT.md specifics]
**New endpoint needed:** The CONTEXT.md specifies `GET /api/me/quota` returning `{used_bytes, limit_bytes}`. This endpoint belongs in `api/auth.py` (where `/api/auth/me` already lives) or a new `api/me.py` router. Adding it to `api/auth.py` is simpler:
```python
@router.get("/me/quota")
async def get_my_quota(
current_user: User = Depends(get_current_user),
session: AsyncSession = Depends(get_db),
):
q = await session.get(Quota, current_user.id)
return {"used_bytes": q.used_bytes, "limit_bytes": q.limit_bytes}
```
**Frontend quota bar (STORE-04):** `AppSidebar.vue` already uses `useAuthStore`. A new `useQuotaStore` (or inline reactive data in `AppSidebar`) fetches `GET /api/me/quota` on mount and after each upload completion. The bar logic:
```
pct = used_bytes / limit_bytes * 100
color = pct >= 95 ? 'red' : pct >= 80 ? 'amber' : 'gray'
label = `${(used_bytes / 1048576).toFixed(1)} MB of ${(limit_bytes / 1048576).toFixed(0)} MB`
```
**Upload progress (STORE-04 context):** The `UploadProgress.vue` component currently shows a spinner + "Uploading…" text. To show real progress during the MinIO PUT step, XMLHttpRequest is required — `fetch` does not expose upload progress events. The existing `api/client.js` uses `fetch`; the new upload flow needs a separate `uploadToMinIO(url, file, onProgress)` helper using XHR:
```javascript
function uploadToMinIO(url, file, onProgress) {
return new Promise((resolve, reject) => {
const xhr = new XMLHttpRequest()
xhr.upload.addEventListener('progress', e => {
if (e.lengthComputable) onProgress(Math.round(e.loaded / e.total * 100))
})
xhr.addEventListener('load', () => xhr.status < 400 ? resolve() : reject(new Error(`PUT failed: ${xhr.status}`)))
xhr.addEventListener('error', () => reject(new Error('Network error during upload')))
xhr.open('PUT', url)
xhr.setRequestHeader('Content-Type', file.type || 'application/octet-stream')
xhr.send(file)
})
}
```
### Finding 10: Celery Beat Abandoned Upload Cleanup (D-06)
[VERIFIED: `python3 -c "import celery; from datetime import timedelta; ..."` + codebase inspection]
**Celery beat schedule syntax (Celery 5.6.3):**
```python
from datetime import timedelta
celery_app.conf.beat_schedule = {
"cleanup-abandoned-uploads": {
"task": "tasks.document_tasks.cleanup_abandoned_uploads",
"schedule": timedelta(minutes=30),
}
}
```
Add to `celery_app.py`. The existing `celery-worker` service in `docker-compose.yml` only runs `celery -A celery_app worker`. Beat requires a separate process: `celery -A celery_app beat --loglevel=info`. Add a `celery-beat` service to `docker-compose.yml`.
**Task implementation:**
```python
@celery_app.task(name="tasks.document_tasks.cleanup_abandoned_uploads")
def cleanup_abandoned_uploads() -> dict:
return asyncio.run(_cleanup_abandoned())
async def _cleanup_abandoned() -> dict:
from datetime import datetime, timezone, timedelta
from sqlalchemy import select, delete
from db.session import AsyncSessionLocal
from db.models import Document
from storage import get_storage_backend
cutoff = datetime.now(timezone.utc) - timedelta(hours=1)
async with AsyncSessionLocal() as session:
result = await session.execute(
select(Document).where(
Document.status == "pending",
Document.created_at < cutoff,
)
)
docs = result.scalars().all()
backend = get_storage_backend()
cleaned = 0
for doc in docs:
try:
if doc.object_key:
await backend.delete_object(doc.object_key)
except Exception:
pass # MinIO object may not exist yet — safe to ignore
await session.delete(doc)
cleaned += 1
await session.commit()
return {"cleaned": cleaned}
```
**Idempotency:** Re-running the task while some cleanup is in progress is safe — the `WHERE status='pending' AND created_at < cutoff` filter is idempotent. Partially cleaned rows (MinIO gone, DB row not yet deleted) are handled by ignoring MinIO deletion errors.
**Beat routing:** Add beat task to the `documents` queue or a separate `beat` queue. The existing `task_routes` in `celery_app.py` covers `tasks.document_tasks.*` → `documents` queue, so the cleanup task will route there automatically.
### Finding 11: Settings Flat-File Retirement Scope
[VERIFIED: codebase inspection of all affected files]
**Files to modify:**
- `backend/api/settings.py` — delete entire file (router removed from `main.py`)
- `backend/services/storage.py` — remove `load_settings()`, `save_settings()`, `mask_api_key()`, `settings_masked()`, `DEFAULT_SETTINGS` import, `SETTINGS_FILE` import
- `backend/config.py` — remove `SETTINGS_FILE`, `DEFAULT_SETTINGS`, `DEFAULT_SYSTEM_PROMPT`; add `SYSTEM_PROMPT`, `DEFAULT_AI_PROVIDER`, `DEFAULT_AI_MODEL`
- `backend/services/classifier.py` — remove `storage.load_settings()` and `get_provider(settings)` calls; accept `ai_provider`/`ai_model` parameters
- `backend/main.py` — remove `api/settings.py` router include
- `frontend/src/stores/settings.js` — remove (or gut to minimal placeholder)
- `frontend/src/views/SettingsView.vue` — replace with placeholder or remove route
- `frontend/src/api/client.js` — remove `getSettings()`, `patchSettings()`, `testProvider()`, `getDefaultPrompt()`
- `frontend/src/router/index.js` — remove `/settings` route (or keep as redirect to account)
- `frontend/src/components/layout/AppSidebar.vue` — remove Settings nav link
**SettingsView.vue fate:** The existing view lets users configure AI provider. Phase 3 removes user control of AI config (admin-only via admin panel). The `/settings` route should display a message: "AI configuration is managed by your admin." The SettingsView replacement should be minimal — a single message card, no form.
### Finding 12: `ai/__init__.py` — `get_provider()` Factory
[VERIFIED: codebase inspection of `backend/ai/__init__.py`]
The classifier refactor requires understanding the `get_provider()` factory. Let me check its current signature:
The factory currently takes a `settings` dict (legacy format). For Phase 3, the classifier will have `ai_provider` (string like `"ollama"`, `"anthropic"`) and `ai_model` (string). The factory call should become:
```python
# Construct a minimal settings-like dict from parameters
_settings = {
"active_provider": ai_provider,
"providers": {
ai_provider: {"model": ai_model}
# For API-key providers, keys come from env vars
}
}
provider = get_provider(_settings)
```
Or refactor `get_provider()` to accept `(provider_name: str, model: str)` directly — cleaner but requires updating the factory signature. The planner should choose: minimal change (dict construction) vs proper refactor (new signature). Minimal change is safer for Phase 3 scope.
---
## Implementation Risks
### Risk 1: MinIO CORS — Browser PUT May Silently Fail
**Severity:** HIGH — blocks the entire presigned upload flow from working in the browser
**What goes wrong:** If `MINIO_API_CORS_ALLOW_ORIGIN` is not set in the MinIO container environment, the browser's OPTIONS preflight for the PUT request will receive no `Access-Control-Allow-Origin` header and abort. The JavaScript error is `ERR_FAILED` or a CORS error with no useful detail.
**How to prevent:**
1. Add `MINIO_API_CORS_ALLOW_ORIGIN: "${CORS_ORIGINS}"` (or `http://localhost:5173`) to the MinIO service in `docker-compose.yml`.
2. Add `MINIO_PUBLIC_ENDPOINT` env var and dual MinIO client (Finding 3) so presigned URLs use `localhost:9000` not `minio:9000`.
3. Integration test must exercise the full browser path (not just the FastAPI endpoints).
**Early detection:** Create a minimal curl-based test: `curl -X OPTIONS http://localhost:9000/docuvault/test-key -H "Origin: http://localhost:5173" -H "Access-Control-Request-Method: PUT"` — should return `Access-Control-Allow-Origin`.
### Risk 2: Concurrent Upload Race Condition (SC2)
**Severity:** HIGH — Phase 3 SC2 requires exactly one success and one 413 when two uploads exceed quota
**What goes wrong:** Two concurrent `/confirm` calls both see `used_bytes=90MB`, both calculate `90+15=105 > 100` as still under limit, both succeed — quota goes to 120MB.
**Why this cannot happen with the atomic pattern:** The `UPDATE quotas SET used_bytes = used_bytes + :delta WHERE (used_bytes + :delta) <= limit_bytes` is atomic at the PostgreSQL row level. PostgreSQL's row-level locking ensures the second `UPDATE` sees the result of the first. The WHERE clause re-evaluates against the committed state.
**Verification:** The SC2 test must fire two concurrent requests using `asyncio.gather()` and assert exactly one 200 and one 413.
### Risk 3: Alembic Migration with MinIO Connection
**Severity:** MEDIUM — migration may fail if MinIO is unreachable at migration time
**What goes wrong:** If `alembic upgrade head` is run while MinIO is not yet healthy (e.g., early in `docker compose up`), the migration's MinIO object deletion step will fail, aborting the migration.
**How to prevent:** The migration reads MinIO connection details from environment variables (`MINIO_ENDPOINT`, etc.). These are available in the `backend` container. `docker-compose.yml` already has `minio: condition: service_healthy` in `backend` `depends_on` — migrations should be run after services are healthy. Document that migrations must run after `docker compose up` completes health checks.
**Rollback safety:** If the migration fails mid-way through MinIO deletions but before the `ALTER COLUMN`, the DB is in a consistent state (nullable user_id still allowed). Re-running `alembic upgrade head` is idempotent at the schema level (Alembic tracks applied revisions), but the MinIO cleanup step will re-attempt on partial data. This is safe — `remove_object` on an already-deleted key returns silently.
### Risk 4: `stat_object()` Size vs. Actual Quota Needs
**Severity:** LOW — edge case with content-type negotiation
**What goes wrong:** The file size from `stat_object().size` is the raw byte count of the uploaded object. This matches what was PUT. No edge cases expected — the presigned URL is for a single-part PUT, not multipart.
**Resolved:** Simple, no special handling needed.
### Risk 5: Topics Deduplication Cross-Namespace
**Severity:** MEDIUM — subtle bug if not addressed
**What goes wrong (without fix):** If `create_topic()` is called with name "Finance" and `user_id=None` (system topic), and later called with name "Finance" and `user_id=some_uuid`, the current deduplication query `WHERE lower(name) = 'finance'` would return the system topic and skip creating a per-user topic. The user's document would then be tagged to the system topic (`user_id=NULL`), not their personal topic.
**How to prevent:** Scope the deduplication query by `user_id` as described in Finding 6.
### Risk 6: SettingsView Navigation After Retirement
**Severity:** LOW — UX regression if not handled
**What goes wrong:** Users who navigate to `/settings` (or have it bookmarked) will see a broken page or 404 after `SettingsView.vue` is removed.
**How to prevent:** Keep the `/settings` route but replace `SettingsView.vue` content with a simple "AI configuration is managed by your admin" message. Remove the form, API calls, and store interactions.
---
## Validation Architecture
### Test Framework
| Property | Value |
|----------|-------|
| Framework | pytest + pytest-asyncio (existing in codebase) |
| Config | `backend/pytest.ini` or `backend/pyproject.toml` (check before Wave 0) |
| Quick run command | `cd backend && pytest tests/test_documents.py tests/test_topics.py -x -q` |
| Full suite command | `cd backend && pytest -v` |
### Phase Requirements → Test Map
| Req ID | Behavior | Test Type | Automated Command | File Exists? |
|--------|----------|-----------|-------------------|-------------|
| STORE-03 | Atomic quota enforced at confirm; no double-spend | unit + integration | `pytest tests/test_quota.py -x` | No — Wave 0 |
| STORE-03 (SC2) | Two concurrent uploads at limit → exactly one 413 | integration | `pytest tests/test_quota.py::test_concurrent_quota_race -x` | No — Wave 0 |
| STORE-04 | `GET /api/me/quota` returns `{used_bytes, limit_bytes}` | unit | `pytest tests/test_documents.py::test_get_quota -x` | No — Wave 0 |
| STORE-05 | Confirm endpoint returns 413 with `{used_bytes, limit_bytes, rejected_bytes}` | unit | `pytest tests/test_quota.py::test_quota_exceeded_response -x` | No — Wave 0 |
| STORE-06 | Delete decrements quota atomically | unit | `pytest tests/test_quota.py::test_delete_decrements_quota -x` | No — Wave 0 |
| SEC-04 | Cross-user document access returns 404 | unit | `pytest tests/test_documents.py::test_cross_user_access_404 -x` | No — Wave 0 |
| SEC-04 | Admin JWT on document endpoint returns 403 | unit | `pytest tests/test_documents.py::test_admin_cannot_access_documents -x` | No — Wave 0 |
| DOC-03 | AI classification uses user's assigned provider not global config | unit | `pytest tests/test_classifier.py::test_per_user_provider -x` | No — Wave 0 |
| DOC-04 | Topic list returns system topics + user topics (not other users') | unit | `pytest tests/test_topics.py::test_topic_namespace -x` | No — Wave 0 |
| DOC-05 | Celery task resolves provider from document owner's DB record | unit | `pytest tests/test_classifier.py::test_celery_task_uses_user_provider -x` | No — Wave 0 |
| D-02 | Alembic migration 0003 applies cleanly, null-user docs deleted | integration | `pytest tests/test_alembic.py::test_migration_0003 -x` | No — Wave 0 |
| D-05 | Upload-url endpoint creates pending Document row + returns presigned URL | unit | `pytest tests/test_documents.py::test_upload_url_endpoint -x` | No — Wave 0 |
| D-05 | Confirm endpoint calls stat_object, updates size, sets status=uploaded | unit | `pytest tests/test_documents.py::test_confirm_endpoint -x` | No — Wave 0 |
| D-12 | `/api/settings` returns 404 (removed) | unit | `pytest tests/test_settings.py::test_settings_endpoint_removed -x` | No — Wave 0 |
### Sampling Rate
- **Per task commit:** `cd backend && pytest tests/test_documents.py tests/test_quota.py tests/test_topics.py -x -q`
- **Per wave merge:** `cd backend && pytest -v`
- **Phase gate:** Full suite green before `/gsd:verify-work`
### Wave 0 Gaps
- `tests/test_quota.py` — covers STORE-03, STORE-05, STORE-06, SC2 race condition
- Test fixture `auth_user` — authenticated user with quota row (needed across quota and document tests)
- Test fixture `admin_user` — admin-role user (needed for admin 403 tests)
- Mock for MinIO `presigned_put_object` and `stat_object` in unit tests (document tests must not require live MinIO)
The existing `conftest.py` already provides `db_session` (in-memory SQLite) and `async_client`. The new fixtures extend this.
---
## Recommended Approach
### Wave Structure
**Wave 1: Migration + Quota Foundation (no external deps required)**
1. Alembic migration `0003_multi_user_isolation.py`:
- Delete null-user Document rows (collect object_keys first, then DB delete, then MinIO delete)
- Delete all Topic rows
- Add `NOT NULL` to `documents.user_id`
- Reconcile quotas
- Add index `ix_topics_user_id`
2. `MinIOBackend`: add `presigned_put_url()` method (dual client for public endpoint)
3. `StorageBackend` ABC: add `presigned_put_url()` abstract method
4. `config.py`: add `MINIO_PUBLIC_ENDPOINT`, `SYSTEM_PROMPT`, `DEFAULT_AI_PROVIDER`, `DEFAULT_AI_MODEL`
5. `docker-compose.yml`: add `MINIO_API_CORS_ALLOW_ORIGIN`, `MINIO_PUBLIC_ENDPOINT` to MinIO service; add `celery-beat` service
**Wave 2: Backend API (blocked on Wave 1)**
1. `api/documents.py`: replace `/upload` with `/upload-url` + `/{id}/confirm`; add `get_current_user` (or `get_regular_user`) to all handlers; add ownership assertions (D-16)
2. `api/topics.py`: add `get_current_user`; scope queries by `user_id IN (current_user.id, NULL)`; add `POST /api/admin/topics` endpoint
3. `api/auth.py`: add `GET /api/me/quota` endpoint
4. `api/settings.py`: delete file; remove from `main.py`
5. `services/storage.py`: remove settings functions; update `create_topic()` to accept `user_id`; update `delete_document()` to decrement quota
6. `services/classifier.py`: accept `ai_provider`/`ai_model` parameters; remove `load_settings()` call
**Wave 3: Celery (blocked on Wave 2)**
1. `tasks/document_tasks.py`: update `_run()` to look up `user.ai_provider`/`user.ai_model`; pass to classifier
2. `tasks/document_tasks.py`: add `cleanup_abandoned_uploads` task
3. `celery_app.py`: add `beat_schedule` configuration
**Wave 4: Frontend (blocked on Wave 2)**
1. `api/client.js`: add `getUploadUrl()`, `confirmUpload()`, `getMyQuota()`; remove settings functions
2. `stores/documents.js`: update `upload()` to three-step flow with XHR progress
3. `components/upload/UploadProgress.vue`: add progress percentage display
4. `components/layout/AppSidebar.vue`: add quota bar
5. `views/SettingsView.vue`: replace with "managed by admin" placeholder
6. `stores/settings.js`: remove or gut
### Architecture Diagram
```
Browser
│
├─ POST /api/documents/upload-url ──► FastAPI ──► DB: INSERT Document(status=pending)
│ └──► MinIO: presigned_put_object()
│ └──► returns {upload_url, document_id}
│
├─ PUT {upload_url} ──────────────────────────────────────────────────────► MinIO (direct)
│ (XHR with progress events)
│
└─ POST /api/documents/{id}/confirm ─► FastAPI ──► MinIO: stat_object() → size
└──► DB: atomic quota UPDATE
└──► DB: Document.status = 'uploaded'
└──► Celery: extract_and_classify.delay()
└──► returns {document_id, used_bytes}
Celery Worker
└─ extract_and_classify(document_id)
├─ DB: SELECT user.ai_provider, user.ai_model WHERE user.id = doc.user_id
├─ MinIO: get_object(object_key)
├─ extractor.extract_text()
├─ classifier.classify_document(ai_provider, ai_model)
│ └─ DB: SELECT topics WHERE user_id IS NULL OR user_id = doc.user_id
└─ DB: UPDATE document SET status='classified', extracted_text=...
Celery Beat (every 30 min)
└─ cleanup_abandoned_uploads()
├─ DB: SELECT documents WHERE status='pending' AND created_at < now()-1h
├─ MinIO: remove_object() for each
└─ DB: DELETE documents
```
---
## Project Constraints (from CLAUDE.md)
| Directive | Impact on Phase 3 |
|-----------|-------------------|
| JWT access token in Pinia memory only | `uploadDocument()` flow in Vue must use `useAuthStore().accessToken` for the `/upload-url` and `/confirm` API calls (via existing `request()` helper); XHR for MinIO PUT does NOT send auth header (MinIO presigned URL is self-authenticating) |
| MinIO object keys UUID-based `{user_id}/{document_id}/{uuid4()}{ext}` | Key generated in `upload-url` handler, stored in `Document.object_key`, used unchanged at confirm and cleanup |
| Atomic quota UPDATE pattern | Exact SQL from CLAUDE.md: `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes` |
| Admin endpoints never return document content or `credentials_enc` | Admin topics endpoint at `POST /api/admin/topics` returns topic metadata only; admin cannot call document content endpoints (SEC-04 + D-16) |
| Every document/folder endpoint asserts `resource.user_id == current_user.id` | Ownership assertion in every handler — return 404 for cross-user access |
| All DB queries via ORM / parameterized statements | `text()` with bound parameters is used for the atomic quota SQL — always pass `:delta` and `:uid` as parameters, never interpolate |
| SEC-04 | Object keys are stored in DB at upload-url time; confirm endpoint fetches key from DB (`Document.object_key`) — never reconstructs from request params |
---
## Environment Availability
| Dependency | Required By | Available | Version | Fallback |
|------------|------------|-----------|---------|----------|
| MinIO SDK (`minio`) | Presigned PUT, stat_object | Yes | 7.2.20 | — |
| Celery + Redis | Beat schedule, cleanup task | Yes | 5.6.3 | — |
| SQLAlchemy `text()` | Atomic quota SQL | Yes | 2.0+ | — |
| psycopg v3 | Async DB driver | Yes (3.2.13) | 3.2.13 | — |
| `aiosqlite` | Test in-memory DB | Yes (0.20.0) | 0.20.0 | — |
| `asyncio.to_thread()` | Sync MinIO SDK wrapping | Yes (Python 3.10+) | Python 3.12 | — |
No blocking missing dependencies.
---
## Assumptions Log
| # | Claim | Section | Risk if Wrong |
|---|-------|---------|---------------|
| A1 | `MINIO_API_CORS_ALLOW_ORIGIN` env var works reliably on current MinIO docker image for browser PUT CORS | Finding 3 | Browser PUT blocked by CORS; fallback is `mc admin config set` in `docker-compose.yml` entrypoint or nginx CORS proxy |
| A2 | PostgreSQL row-level locking on the quota `UPDATE WHERE` is sufficient to prevent concurrent over-quota uploads | Finding 4 / Risk 2 | Two simultaneous confirms could both succeed — would require `SELECT FOR UPDATE` on quota row or advisory lock |
| A3 | `MinIOBackend.stat_object()` sees the completed PUT immediately with no eventual-consistency delay | Finding 5 | `NoSuchKey` error on stat_object even though PUT succeeded — add retry with exponential backoff as fallback |
| A4 | `ai/__init__.py` `get_provider()` factory accepts a dict with `active_provider` and `providers` keys | Finding 12 | Factory may have a different internal signature — inspect before refactoring classifier |
**Note on A2:** PostgreSQL's `UPDATE ... WHERE` with a conditional is a standard serializable approach for quota enforcement. PostgreSQL's statement-level locking ensures the second concurrent UPDATE sees the committed result of the first. This is the same pattern used by many SaaS systems for rate limiting and quota enforcement. Risk is LOW given the choice of PostgreSQL (which has strong MVCC guarantees), but the SC2 integration test must verify this explicitly.
---
## Sources
### Primary (HIGH confidence)
- MinIO Python SDK v7.2.20 — verified via `help(Minio.presigned_put_object)`, `help(Minio.stat_object)`, `dir(Minio)` — local installation
- SQLAlchemy `text()` and `CursorResult` — verified via Python inspection of installed package
- Celery 5.6.3 `beat_schedule` syntax — verified via `python3 -c "import celery; ..."`
- psycopg v3.2.13 — verified `rowcount` + `fetchone()` behavior
- Codebase inspection: `backend/storage/minio_backend.py`, `backend/db/models.py`, `backend/services/storage.py`, `backend/tasks/document_tasks.py`, `backend/services/classifier.py`, `backend/deps/auth.py`, `backend/api/documents.py`, `backend/api/topics.py`, `frontend/src/api/client.js`, `frontend/src/stores/documents.js`
- CONTEXT.md — all locked decisions (D-01 through D-17)
- CLAUDE.md — architectural constraints
### Secondary (MEDIUM confidence)
- [MinIO GitHub discussion #20555](https://github.com/minio/minio/discussions/20555) — per-bucket CORS is enterprise-only in MinIO open source
- [MinIO GitHub issue #16479](https://github.com/minio/minio/issues/16479) — `mc admin config set` as authoritative CORS configuration method
- MinIO CORS `MINIO_API_CORS_ALLOW_ORIGIN` env var — referenced in multiple GitHub issues and community reports; enterprise docs confirm it exists at the global level
- [Medium article on Docker hostname resolution](https://medium.com/@codyalexanderraymond/solving-presigned-url-issues-in-dockerized-development-with-minio-internal-dns-61a8b7c7c0ce) — dual-client / nginx proxy approaches for presigned URL Docker hostname issue
### Tertiary (LOW confidence)
- XMLHttpRequest upload progress events vs. fetch — referenced in multiple blog posts; treated as LOW confidence due to unverified sources, but cross-confirmed by MDN documentation (fetch does not expose upload progress)
---
## Metadata
**Confidence breakdown:**
- Alembic migration pattern: HIGH — established in migrations 0001 and 0002 in this codebase
- MinIO SDK API (`presigned_put_object`, `stat_object`): HIGH — verified against installed SDK
- Atomic quota SQL: HIGH — verified with SQLAlchemy text() and psycopg v3
- MinIO CORS open-source solution: MEDIUM — `MINIO_API_CORS_ALLOW_ORIGIN` is documented but has had reliability issues; `mc admin config set` is more reliable but harder to automate in compose
- Docker presigned URL hostname issue: HIGH — well-documented, dual-client solution is clean
- Auth guard patterns: HIGH — `get_current_user`/`get_current_admin` already in production
**Research date:** 2026-05-23
**Valid until:** 2026-06-22 (30 days — stable libraries)