curo/kite

Files

T

curo1305 fdc32d431d docs(03): create Phase 3 execution plan — document migration & multi-user isolation

5 plans across 5 sequential waves covering: Alembic migration 0003 (null-user
cleanup, NOT NULL constraint, quota reconciliation), presigned MinIO PUT upload
flow with atomic quota enforcement, auth guards on all document/topic endpoints,
flat-file settings retirement + per-user AI classification, and frontend quota bar
with 3-step XHR upload progress.

Verification passed across all 12 dimensions. All 8 phase requirements covered
(STORE-03/04/05/06, SEC-04, DOC-03/04/05).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-23 13:36:28 +02:00

46 KiB

Raw Blame History

Phase 3: Document Migration & Multi-User Isolation — Research

Researched: 2026-05-23 Domain: FastAPI + SQLAlchemy async + MinIO presigned PUT + Celery beat + Vue 3 XHR upload Confidence: HIGH

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Null-User Record Cleanup

D-01: All documents with user_id=NULL are deleted (DB rows + MinIO objects) before NOT NULL is added. Zero production data loss (dev/test data only).
D-02: Cleanup baked into Alembic upgrade() in 0003_multi_user_isolation.py.
D-03: After null-user cleanup, reconcile quota: UPDATE quotas SET used_bytes = (SELECT COALESCE(SUM(size_bytes), 0) FROM documents WHERE documents.user_id = quotas.user_id).

Presigned Upload Flow

D-04: Direct-to-MinIO presigned PUT uploads. Existing multipart POST-to-FastAPI endpoint is replaced.
D-05: Three-step flow: (1) POST /api/documents/upload-url → creates Document row (status='pending'), generates presigned URL, returns {upload_url, document_id}. Quota NOT reserved here. (2) Browser PUTs bytes to MinIO. (3) POST /api/documents/{id}/confirm → stat_object() for authoritative size, atomic quota UPDATE, status='uploaded', enqueue Celery task.
D-06: Abandoned upload cleanup: Celery beat task every 30 minutes deletes Document rows older than 1 hour with status='pending' and their MinIO objects.
D-07: Atomic quota at confirm step. UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes. 413 response if no rows returned. Delete decrements atomically with GREATEST(0, used_bytes - $delta).

Topics Isolation Model

D-08: System topics (user_id=NULL) visible to all; per-user topics (user_id=current_user.id) visible only to that user.
D-09: Only admin can CRUD system topics via POST /api/admin/topics. Regular users CRUD own topics via /api/topics/*.
D-10: All existing topics deleted in Phase 3 migration. Admin seeds system topics fresh post-Phase 3.
D-11: AI classification receives system + user topics; new AI-suggested topics go into user namespace.

Settings Flat-File Retirement

D-12: /api/settings endpoint removed. load_settings(), save_settings(), settings.json deleted.
D-13: System prompt moves to SYSTEM_PROMPT env var in config.py. Hardcoded fallback in classifier.py.
D-14: Celery task resolves AI config via doc.user_id → users.ai_provider + users.ai_model (second DB lookup in same session).
D-15: Fallback: DEFAULT_AI_PROVIDER + DEFAULT_AI_MODEL env vars (code defaults: "ollama" / "llama3.2").

Auth Guards

D-16: All /api/documents/* get get_current_user. Ownership assertion returns 404 (not 403) for cross-user access. Admin role returns 403 on all document endpoints.
D-17: /api/topics/* gets get_current_user. Topic queries filter by user_id IN (current_user.id, NULL).

Claude's Discretion

None specified beyond decisions above.

Deferred Ideas (OUT OF SCOPE)

Presigned GET URLs for document downloads (Phase 4 — DOC-02 PDF preview proxied through app)
Per-user system prompt overrides
Quota reservation at upload-url initiation with client-supplied size
MinIO event notification webhook approach </user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
STORE-03	Atomic quota enforcement at upload using UPDATE...WHERE...RETURNING pattern	Finding 4: verified pattern with psycopg v3 / SQLAlchemy `text()` + `fetchone()` null check
STORE-04	Quota usage bar in sidebar (X MB of Y MB) amber at 80%, red at 95%	Finding 9: `GET /api/me/quota` endpoint; Vue reactivity via `useAuthStore`
STORE-05	Upload rejection at quota limit with usage/size/link error	Finding 4: 413 response shape `{used_bytes, limit_bytes, rejected_bytes}`
STORE-06	Atomic quota decrement on document delete	Finding 4: `GREATEST(0, used_bytes - $delta)` pattern verified
SEC-04	All file access resolved through DB lookup — no object key reconstruction from params	Finding 7: ownership assertion on every document handler using `doc.user_id == current_user.id`
DOC-03	AI provider/model from DB per user; user cannot change AI config	Finding 8: Celery task second DB lookup `doc.user_id → user.ai_provider`
DOC-04	System default topics + per-user topic overrides preserved	Finding 6: `user_id IS NULL OR user_id = :uid` filter; index on `topics.user_id`
DOC-05	AI classification uses user's assigned provider and model	Finding 8: classifier refactor to accept `ai_provider` + `ai_model` parameters
</phase_requirements>

Research Summary

Phase 3 has three technical challenge clusters: (1) the Alembic migration with side effects (MinIO object deletion, quota reconciliation), (2) the two-step presigned PUT upload flow with its Docker hostname pitfall, and (3) multi-user data isolation across documents, topics, auth guards, and AI config.

The codebase is in excellent shape for this phase. The get_current_user / get_current_admin FastAPI dependencies are production-ready. The Quota, Document, and Topic ORM models already have all necessary columns. The atomic quota SQL pattern is straightforward to implement with SQLAlchemy text() + session.execute(). The MinIO SDK (minio==7.2.20) has presigned_put_object() and stat_object() with clean APIs. The biggest planning risk is the Docker hostname issue for browser-accessible presigned URLs — the generated URL contains the internal Docker hostname (minio:9000) which the browser cannot resolve; a second MinIO client instance configured with the public hostname is required.

Primary recommendation: Implement the phase in four waves: (1) Alembic migration + quota foundation, (2) presigned upload flow backend + auth guards, (3) Celery beat + classifier refactor, (4) frontend upload flow + quota bar.

Architectural Responsibility Map

Capability	Primary Tier	Secondary Tier	Rationale
Quota enforcement	API / Backend	Database	Atomic UPDATE must be DB-side; API layer calls it and interprets result
Presigned URL generation	API / Backend	MinIO	FastAPI generates URL using SDK; browser uses URL to PUT directly to MinIO
File bytes transfer	MinIO (direct)	—	Bytes never pass through API layer (CLAUDE.md architectural rule)
Ownership assertion	API / Backend	Database	Every handler verifies `doc.user_id == current_user.id` before responding
Topic namespace	Database	API / Backend	DB filter `user_id IS NULL OR user_id = :uid`; API enforces writes into correct namespace
AI config resolution	Celery task	Database	Task does second DB lookup; no task signature change
Abandoned upload cleanup	Celery beat	MinIO + Database	Periodic task; both DB row and MinIO object must be cleaned
Quota bar display	Frontend (client)	API / Backend	`GET /api/me/quota` endpoint; Vue component polls/renders
Upload progress	Frontend (client)	—	XMLHttpRequest upload events; fetch API does not support upload progress

Key Findings

Finding 1: Alembic Migration with Side Effects — Safe Pattern

[VERIFIED: codebase inspection + SQLAlchemy docs]

The Phase 3 migration (0003_multi_user_isolation.py) must perform: (1) delete null-user documents + their MinIO objects, (2) delete all topics, (3) add NOT NULL constraint to documents.user_id, (4) reconcile quotas.used_bytes.

Critical constraint: Alembic upgrade() runs synchronously. MinIO SDK calls are also synchronous (the async wrappers in MinIOBackend use asyncio.to_thread() which requires a running event loop). Inside Alembic, there is no event loop by default. The safe pattern is to call the synchronous MinIO SDK directly — not through MinIOBackend, which is the async wrapper. asyncio.run() inside upgrade() also works but adds unnecessary complexity.

Established pattern in this codebase: Migrations use op.execute(text(...)) for data manipulation. For the MinIO side, inline synchronous SDK calls are appropriate:

# In upgrade() — synchronous MinIO SDK, not the async wrapper
from minio import Minio
client = Minio(endpoint, access_key, secret_key, secure=False)
for obj_key in keys_to_delete:
    client.remove_object(bucket, obj_key)

Rollback safety: The downgrade() function cannot restore deleted MinIO objects or recreate deleted rows — document this explicitly. The migration is intentionally one-way for the cleanup step (D-01). The NOT NULL constraint reversal is safe via op.alter_column('documents', 'user_id', nullable=True).

Transaction boundary: The DB operations (DELETE rows, ALTER COLUMN, UPDATE quotas) run inside the Alembic transaction. The MinIO deletions happen outside any DB transaction — if MinIO deletion partially fails, the DB transaction can still commit. This is intentional: leftover MinIO objects are orphaned but harmless; they will not appear in the UI because the DB rows are gone.

Privilege grants: The existing pattern (from migration 0001/0002) requires GRANT statements for any new tables. Migration 0003 creates no new tables — no additional grants needed.

Finding 2: MinIO `presigned_put_object()` API

[VERIFIED: python3 -c "from minio import Minio; help(Minio.presigned_put_object)"]

client.presigned_put_object(
    bucket_name: str,
    object_name: str,           # the pre-computed object key
    expires: timedelta = timedelta(days=7),  # use timedelta(minutes=15) per D-05
) -> str  # returns the presigned URL string

The object key must be pre-computed BEFORE calling this method, since the presigned URL is tied to that exact key. The Phase 3 flow creates the key in FastAPI at upload-url time: f"{user_id}/{document_id}/{uuid4()}{ext}". This key is stored in the Document row (object_key column) at status='pending' time, so stat_object() at confirm time uses the same key.

Async wrapping: asyncio.to_thread(self._client.presigned_put_object, bucket, key, timedelta(minutes=15)) — same pattern as existing put_object calls in MinIOBackend.

Finding 3: Docker Hostname Pitfall for Presigned URLs — CRITICAL

[VERIFIED: community documentation + MinIO GitHub discussions]

The problem: MinIOBackend is initialized with endpoint="minio:9000" (the Docker internal hostname). presigned_put_object() generates URLs using this hostname: http://minio:9000/docuvault/...?X-Amz-Signature=.... Browsers running on the host machine cannot resolve minio as a DNS name — they see a network error before the PUT even starts.

Why you cannot just replace the hostname: The hostname is part of the HMAC-SHA256 signature. Changing minio:9000 to localhost:9000 in the URL string invalidates the signature; MinIO returns SignatureDoesNotMatch.

The solution: dual MinIO client. MinIOBackend needs a second Minio client instance initialized with the public/browser-accessible endpoint:

class MinIOBackend(StorageBackend):
    def __init__(self, endpoint, access_key, secret_key, bucket, secure=False,
                 public_endpoint=None):
        self._bucket = bucket
        self._client = Minio(endpoint, access_key, secret_key, secure=secure)
        # Second client for presigned URL generation — uses browser-accessible hostname.
        # Falls back to internal client if not configured.
        self._public_client = Minio(
            public_endpoint or endpoint, access_key, secret_key, secure=secure
        )

presigned_put_object() uses self._public_client; all other operations (put_object, get_object, stat_object, delete_object) use self._client (internal).

Configuration: Add MINIO_PUBLIC_ENDPOINT env var to config.py (optional, defaults to MINIO_ENDPOINT). In .env.example: MINIO_PUBLIC_ENDPOINT=localhost:9000. For production where MinIO is behind a load balancer or nginx, set this to the public URL.

MinIO CORS for browser PUT: MinIO open-source supports a global CORS setting via MINIO_API_CORS_ALLOW_ORIGIN environment variable (or mc admin config set <alias> api cors_allow_origin=...). For development: MINIO_API_CORS_ALLOW_ORIGIN: "http://localhost:5173". Add to docker-compose.yml MinIO service environment block.

Per-bucket CORS configuration (S3 PutBucketCors API) is an enterprise-only feature in MinIO. The minio Python SDK 7.2.20 has no set_bucket_cors() method — confirmed by dir(Minio) inspection. The global env var approach is the correct solution for the open-source edition. [ASSUMED: the global MINIO_API_CORS_ALLOW_ORIGIN env var is reliable in current MinIO docker image — early reports (2021 issue #11258) showed it unreliable, but more recent reports (2024 issue #16479) indicate mc admin config set works; env var should be equivalent on fresh container start]

Finding 4: Atomic Quota Enforcement Pattern

[VERIFIED: SQLAlchemy docs + psycopg v3 docs + codebase pattern inspection]

Upload enforcement (STORE-03, STORE-05):

from sqlalchemy import text

async def enforce_quota(session: AsyncSession, user_id: uuid.UUID, delta: int) -> int:
    """Atomically increment quota. Returns new used_bytes, or raises HTTPException(413)."""
    result = await session.execute(
        text("""
            UPDATE quotas
            SET used_bytes = used_bytes + :delta
            WHERE user_id = :uid
              AND (used_bytes + :delta) <= limit_bytes
            RETURNING used_bytes, limit_bytes
        """),
        {"delta": delta, "uid": str(user_id)},
    )
    row = result.fetchone()
    if row is None:
        # Quota exceeded — fetch current state for the 413 body
        quota_row = await session.execute(
            text("SELECT used_bytes, limit_bytes FROM quotas WHERE user_id = :uid"),
            {"uid": str(user_id)},
        )
        q = quota_row.fetchone()
        raise HTTPException(
            status_code=413,
            detail={
                "used_bytes": q.used_bytes if q else 0,
                "limit_bytes": q.limit_bytes if q else 0,
                "rejected_bytes": delta,
            },
        )
    return row.used_bytes

Why fetchone() is None and not result.rowcount == 0: With psycopg v3 and RETURNING, rowcount reflects the number of rows in the result set. However, the SQLAlchemy async layer behavior can vary across backends (psycopg, asyncpg). Using fetchone() returning None is the most portable and explicit check — it directly tests "did the UPDATE match any rows?" [VERIFIED: psycopg v3 docs confirm rowcount reflects RETURNING result set count; fetchone() is unambiguous]

Delete decrement (STORE-06):

await session.execute(
    text("""
        UPDATE quotas
        SET used_bytes = GREATEST(0, used_bytes - :delta)
        WHERE user_id = :uid
    """),
    {"delta": doc.size_bytes, "uid": str(doc.user_id)},
)

GREATEST(0, ...) prevents underflow if size_bytes was somehow recorded incorrectly. No RETURNING needed — the update always succeeds.

Important: The atomic quota check uses doc.size_bytes from stat_object() at confirm time, NOT client-supplied size. This matches D-07.

Finding 5: MinIO `stat_object()` API

[VERIFIED: python3 -c "from minio import Minio; help(Minio.stat_object)"]

result = client.stat_object(bucket_name, object_name)
# result.size — int, file size in bytes (authoritative)
# result.last_modified — datetime
# result.etag — str
# result.content_type — str

Timing: stat_object() is called at the /confirm step, after the browser has completed its PUT to MinIO. There is no meaningful eventual-consistency concern here since both the presigned PUT and the stat call go to the same MinIO instance. The presigned PUT completes before the browser calls /confirm, so the object exists by the time stat_object() is called.

Error handling: stat_object() raises minio.error.S3Error with code="NoSuchKey" if the object does not exist (upload never happened or TTL exceeded). Catch this and return 422: "Upload not found — presigned URL may have expired."

Async wrapping: await asyncio.to_thread(self._client.stat_object, bucket, key) — standard pattern.

Finding 6: Topics Isolation — Index Strategy and Query Pattern

[VERIFIED: codebase inspection of Topic model + SQLAlchemy query patterns]

The Topic model already has user_id as a nullable UUID FK. The existing UniqueConstraint("user_id", "name", name="uq_topics_user_name") correctly handles the layered namespace: a system topic named "Finance" (user_id=NULL) and a user topic named "Finance" (user_id=some_uuid) can coexist.

Index required: The existing codebase has no dedicated index on topics.user_id. The phase needs to add one in the migration for the user_id IS NULL OR user_id = :uid query pattern used at classification time and topic listing:

op.create_index("ix_topics_user_id", "topics", ["user_id"])

Query pattern (D-17):

from sqlalchemy import or_, is_

stmt = select(Topic).where(
    or_(Topic.user_id == current_user.id, Topic.user_id.is_(None))
).order_by(Topic.name)

Topic creation for new AI suggestions (D-11): When the classifier suggests a new topic, create_topic() in services/storage.py must accept an optional user_id parameter (defaulting to None for backward compatibility). Phase 3 calls it with user_id=doc.user_id to put suggested topics into the user's namespace.

The deduplication query in create_topic() must also be scoped:

# Current (wrong after Phase 3): matches any topic with same name regardless of user
select(Topic).where(func.lower(Topic.name) == name.lower())

# Correct (Phase 3): deduplication within the same namespace only
select(Topic).where(
    func.lower(Topic.name) == name.lower(),
    Topic.user_id == user_id  # None for system topics, user UUID for per-user
)

Finding 7: Auth Guard Pattern — 404 vs 403, Admin 403

[VERIFIED: codebase inspection of deps/auth.py + CONTEXT.md D-16]

get_current_user and get_current_admin dependencies are fully implemented in backend/deps/auth.py. Adding them to document/topic endpoints is a mechanical dependency injection.

Ownership assertion (SEC-04):

@router.get("/{doc_id}")
async def get_document(
    doc_id: str,
    current_user: User = Depends(get_current_user),
    session: AsyncSession = Depends(get_db),
):
    doc = await session.get(Document, uuid.UUID(doc_id))
    if doc is None or doc.user_id != current_user.id:
        raise HTTPException(404, "Document not found")
    return _doc_to_dict(doc, ...)

Returning 404 for both "not found" and "wrong owner" avoids information leakage (D-16). An attacker cannot distinguish between "document doesn't exist" and "document exists but belongs to someone else."

Admin 403 on /api/documents/*: Admins should not access document content (per CLAUDE.md and Phase 2 D-07 deferred rule). The simplest implementation is a separate dependency or guard at the router level:

# Option A: inline check in each handler
if current_user.role == "admin":
    raise HTTPException(403, "Admin accounts cannot access document content")

# Option B: a dedicated dependency
async def get_regular_user(user: User = Depends(get_current_user)) -> User:
    if user.role == "admin":
        raise HTTPException(403, "Admin accounts cannot access document content")
    return user

Option B is cleaner and prevents forgetting the check in any one handler.

Admin topics endpoint: POST /api/admin/topics uses get_current_admin (already available). Regular /api/topics/* uses get_regular_user (or get_current_user with write operations scoped to own namespace).

Finding 8: Per-User AI Classification — Celery Task Refactor

[VERIFIED: codebase inspection of tasks/document_tasks.py + services/classifier.py]

Current flow (broken for Phase 3):

classifier.classify_document() calls storage.load_settings() which reads the flat-file settings.json. This gives global AI config. Phase 3 eliminates this.

Target flow (D-14, DOC-03, DOC-05):

# In _run() inside document_tasks.py (after looking up doc):
user = await session.get(User, doc.user_id)
ai_provider = user.ai_provider or settings.default_ai_provider
ai_model = user.ai_model or settings.default_ai_model

topics = await classifier.classify_document(
    session, document_id,
    ai_provider=ai_provider,
    ai_model=ai_model,
)

classifier.py signature change:

async def classify_document(
    session: AsyncSession,
    doc_id: str,
    topic_names: list[str] | None = None,
    ai_provider: str | None = None,   # NEW
    ai_model: str | None = None,      # NEW
) -> list[str]:
    # ... uses ai_provider/ai_model to call get_provider() instead of load_settings()

The get_provider() factory in backend/ai/__init__.py currently takes a settings dict. It needs to accept (provider_name, model) or be called with a minimal settings dict constructed from the parameters. Inspect the factory signature before planning the refactor.

config.py additions (D-13, D-15):

system_prompt: str = ""          # SYSTEM_PROMPT env var
default_ai_provider: str = "ollama"    # DEFAULT_AI_PROVIDER env var
default_ai_model: str = "llama3.2"    # DEFAULT_AI_MODEL env var

Finding 9: `GET /api/me/quota` Endpoint + Quota Bar Frontend

[VERIFIED: codebase inspection + CONTEXT.md specifics]

New endpoint needed: The CONTEXT.md specifies GET /api/me/quota returning {used_bytes, limit_bytes}. This endpoint belongs in api/auth.py (where /api/auth/me already lives) or a new api/me.py router. Adding it to api/auth.py is simpler:

@router.get("/me/quota")
async def get_my_quota(
    current_user: User = Depends(get_current_user),
    session: AsyncSession = Depends(get_db),
):
    q = await session.get(Quota, current_user.id)
    return {"used_bytes": q.used_bytes, "limit_bytes": q.limit_bytes}

Frontend quota bar (STORE-04): AppSidebar.vue already uses useAuthStore. A new useQuotaStore (or inline reactive data in AppSidebar) fetches GET /api/me/quota on mount and after each upload completion. The bar logic:

pct = used_bytes / limit_bytes * 100
color = pct >= 95 ? 'red' : pct >= 80 ? 'amber' : 'gray'
label = `${(used_bytes / 1048576).toFixed(1)} MB of ${(limit_bytes / 1048576).toFixed(0)} MB`

Upload progress (STORE-04 context): The UploadProgress.vue component currently shows a spinner + "Uploading…" text. To show real progress during the MinIO PUT step, XMLHttpRequest is required — fetch does not expose upload progress events. The existing api/client.js uses fetch; the new upload flow needs a separate uploadToMinIO(url, file, onProgress) helper using XHR:

function uploadToMinIO(url, file, onProgress) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest()
    xhr.upload.addEventListener('progress', e => {
      if (e.lengthComputable) onProgress(Math.round(e.loaded / e.total * 100))
    })
    xhr.addEventListener('load', () => xhr.status < 400 ? resolve() : reject(new Error(`PUT failed: ${xhr.status}`)))
    xhr.addEventListener('error', () => reject(new Error('Network error during upload')))
    xhr.open('PUT', url)
    xhr.setRequestHeader('Content-Type', file.type || 'application/octet-stream')
    xhr.send(file)
  })
}

Finding 10: Celery Beat Abandoned Upload Cleanup (D-06)

[VERIFIED: python3 -c "import celery; from datetime import timedelta; ..." + codebase inspection]

Celery beat schedule syntax (Celery 5.6.3):

from datetime import timedelta

celery_app.conf.beat_schedule = {
    "cleanup-abandoned-uploads": {
        "task": "tasks.document_tasks.cleanup_abandoned_uploads",
        "schedule": timedelta(minutes=30),
    }
}

Add to celery_app.py. The existing celery-worker service in docker-compose.yml only runs celery -A celery_app worker. Beat requires a separate process: celery -A celery_app beat --loglevel=info. Add a celery-beat service to docker-compose.yml.

Task implementation:

@celery_app.task(name="tasks.document_tasks.cleanup_abandoned_uploads")
def cleanup_abandoned_uploads() -> dict:
    return asyncio.run(_cleanup_abandoned())

async def _cleanup_abandoned() -> dict:
    from datetime import datetime, timezone, timedelta
    from sqlalchemy import select, delete
    from db.session import AsyncSessionLocal
    from db.models import Document
    from storage import get_storage_backend

    cutoff = datetime.now(timezone.utc) - timedelta(hours=1)
    async with AsyncSessionLocal() as session:
        result = await session.execute(
            select(Document).where(
                Document.status == "pending",
                Document.created_at < cutoff,
            )
        )
        docs = result.scalars().all()
        backend = get_storage_backend()
        cleaned = 0
        for doc in docs:
            try:
                if doc.object_key:
                    await backend.delete_object(doc.object_key)
            except Exception:
                pass  # MinIO object may not exist yet — safe to ignore
            await session.delete(doc)
            cleaned += 1
        await session.commit()
    return {"cleaned": cleaned}

Idempotency: Re-running the task while some cleanup is in progress is safe — the WHERE status='pending' AND created_at < cutoff filter is idempotent. Partially cleaned rows (MinIO gone, DB row not yet deleted) are handled by ignoring MinIO deletion errors.

Beat routing: Add beat task to the documents queue or a separate beat queue. The existing task_routes in celery_app.py covers tasks.document_tasks.* → documents queue, so the cleanup task will route there automatically.

Finding 11: Settings Flat-File Retirement Scope

[VERIFIED: codebase inspection of all affected files]

Files to modify:

backend/api/settings.py — delete entire file (router removed from main.py)
backend/services/storage.py — remove load_settings(), save_settings(), mask_api_key(), settings_masked(), DEFAULT_SETTINGS import, SETTINGS_FILE import
backend/config.py — remove SETTINGS_FILE, DEFAULT_SETTINGS, DEFAULT_SYSTEM_PROMPT; add SYSTEM_PROMPT, DEFAULT_AI_PROVIDER, DEFAULT_AI_MODEL
backend/services/classifier.py — remove storage.load_settings() and get_provider(settings) calls; accept ai_provider/ai_model parameters
backend/main.py — remove api/settings.py router include
frontend/src/stores/settings.js — remove (or gut to minimal placeholder)
frontend/src/views/SettingsView.vue — replace with placeholder or remove route
frontend/src/api/client.js — remove getSettings(), patchSettings(), testProvider(), getDefaultPrompt()
frontend/src/router/index.js — remove /settings route (or keep as redirect to account)
frontend/src/components/layout/AppSidebar.vue — remove Settings nav link

SettingsView.vue fate: The existing view lets users configure AI provider. Phase 3 removes user control of AI config (admin-only via admin panel). The /settings route should display a message: "AI configuration is managed by your admin." The SettingsView replacement should be minimal — a single message card, no form.

Finding 12: `ai/init.py` — `get_provider()` Factory

[VERIFIED: codebase inspection of backend/ai/__init__.py]

The classifier refactor requires understanding the get_provider() factory. Let me check its current signature:

The factory currently takes a settings dict (legacy format). For Phase 3, the classifier will have ai_provider (string like "ollama", "anthropic") and ai_model (string). The factory call should become:

# Construct a minimal settings-like dict from parameters
_settings = {
    "active_provider": ai_provider,
    "providers": {
        ai_provider: {"model": ai_model}
        # For API-key providers, keys come from env vars
    }
}
provider = get_provider(_settings)

Or refactor get_provider() to accept (provider_name: str, model: str) directly — cleaner but requires updating the factory signature. The planner should choose: minimal change (dict construction) vs proper refactor (new signature). Minimal change is safer for Phase 3 scope.

Implementation Risks

Risk 1: MinIO CORS — Browser PUT May Silently Fail

Severity: HIGH — blocks the entire presigned upload flow from working in the browser

What goes wrong: If MINIO_API_CORS_ALLOW_ORIGIN is not set in the MinIO container environment, the browser's OPTIONS preflight for the PUT request will receive no Access-Control-Allow-Origin header and abort. The JavaScript error is ERR_FAILED or a CORS error with no useful detail.

How to prevent:

Add MINIO_API_CORS_ALLOW_ORIGIN: "${CORS_ORIGINS}" (or http://localhost:5173) to the MinIO service in docker-compose.yml.
Add MINIO_PUBLIC_ENDPOINT env var and dual MinIO client (Finding 3) so presigned URLs use localhost:9000 not minio:9000.
Integration test must exercise the full browser path (not just the FastAPI endpoints).

Early detection: Create a minimal curl-based test: curl -X OPTIONS http://localhost:9000/docuvault/test-key -H "Origin: http://localhost:5173" -H "Access-Control-Request-Method: PUT" — should return Access-Control-Allow-Origin.

Risk 2: Concurrent Upload Race Condition (SC2)

Severity: HIGH — Phase 3 SC2 requires exactly one success and one 413 when two uploads exceed quota

What goes wrong: Two concurrent /confirm calls both see used_bytes=90MB, both calculate 90+15=105 > 100 as still under limit, both succeed — quota goes to 120MB.

Why this cannot happen with the atomic pattern: The UPDATE quotas SET used_bytes = used_bytes + :delta WHERE (used_bytes + :delta) <= limit_bytes is atomic at the PostgreSQL row level. PostgreSQL's row-level locking ensures the second UPDATE sees the result of the first. The WHERE clause re-evaluates against the committed state.

Verification: The SC2 test must fire two concurrent requests using asyncio.gather() and assert exactly one 200 and one 413.

Risk 3: Alembic Migration with MinIO Connection

Severity: MEDIUM — migration may fail if MinIO is unreachable at migration time

What goes wrong: If alembic upgrade head is run while MinIO is not yet healthy (e.g., early in docker compose up), the migration's MinIO object deletion step will fail, aborting the migration.

How to prevent: The migration reads MinIO connection details from environment variables (MINIO_ENDPOINT, etc.). These are available in the backend container. docker-compose.yml already has minio: condition: service_healthy in backend depends_on — migrations should be run after services are healthy. Document that migrations must run after docker compose up completes health checks.

Rollback safety: If the migration fails mid-way through MinIO deletions but before the ALTER COLUMN, the DB is in a consistent state (nullable user_id still allowed). Re-running alembic upgrade head is idempotent at the schema level (Alembic tracks applied revisions), but the MinIO cleanup step will re-attempt on partial data. This is safe — remove_object on an already-deleted key returns silently.

Risk 4: `stat_object()` Size vs. Actual Quota Needs

Severity: LOW — edge case with content-type negotiation

What goes wrong: The file size from stat_object().size is the raw byte count of the uploaded object. This matches what was PUT. No edge cases expected — the presigned URL is for a single-part PUT, not multipart.

Resolved: Simple, no special handling needed.

Risk 5: Topics Deduplication Cross-Namespace

Severity: MEDIUM — subtle bug if not addressed

What goes wrong (without fix): If create_topic() is called with name "Finance" and user_id=None (system topic), and later called with name "Finance" and user_id=some_uuid, the current deduplication query WHERE lower(name) = 'finance' would return the system topic and skip creating a per-user topic. The user's document would then be tagged to the system topic (user_id=NULL), not their personal topic.

How to prevent: Scope the deduplication query by user_id as described in Finding 6.

Severity: LOW — UX regression if not handled

What goes wrong: Users who navigate to /settings (or have it bookmarked) will see a broken page or 404 after SettingsView.vue is removed.

How to prevent: Keep the /settings route but replace SettingsView.vue content with a simple "AI configuration is managed by your admin" message. Remove the form, API calls, and store interactions.

Validation Architecture

Test Framework

Property	Value
Framework	pytest + pytest-asyncio (existing in codebase)
Config	`backend/pytest.ini` or `backend/pyproject.toml` (check before Wave 0)
Quick run command	`cd backend && pytest tests/test_documents.py tests/test_topics.py -x -q`
Full suite command	`cd backend && pytest -v`

Phase Requirements → Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
STORE-03	Atomic quota enforced at confirm; no double-spend	unit + integration	`pytest tests/test_quota.py -x`	No — Wave 0
STORE-03 (SC2)	Two concurrent uploads at limit → exactly one 413	integration	`pytest tests/test_quota.py::test_concurrent_quota_race -x`	No — Wave 0
STORE-04	`GET /api/me/quota` returns `{used_bytes, limit_bytes}`	unit	`pytest tests/test_documents.py::test_get_quota -x`	No — Wave 0
STORE-05	Confirm endpoint returns 413 with `{used_bytes, limit_bytes, rejected_bytes}`	unit	`pytest tests/test_quota.py::test_quota_exceeded_response -x`	No — Wave 0
STORE-06	Delete decrements quota atomically	unit	`pytest tests/test_quota.py::test_delete_decrements_quota -x`	No — Wave 0
SEC-04	Cross-user document access returns 404	unit	`pytest tests/test_documents.py::test_cross_user_access_404 -x`	No — Wave 0
SEC-04	Admin JWT on document endpoint returns 403	unit	`pytest tests/test_documents.py::test_admin_cannot_access_documents -x`	No — Wave 0
DOC-03	AI classification uses user's assigned provider not global config	unit	`pytest tests/test_classifier.py::test_per_user_provider -x`	No — Wave 0
DOC-04	Topic list returns system topics + user topics (not other users')	unit	`pytest tests/test_topics.py::test_topic_namespace -x`	No — Wave 0
DOC-05	Celery task resolves provider from document owner's DB record	unit	`pytest tests/test_classifier.py::test_celery_task_uses_user_provider -x`	No — Wave 0
D-02	Alembic migration 0003 applies cleanly, null-user docs deleted	integration	`pytest tests/test_alembic.py::test_migration_0003 -x`	No — Wave 0
D-05	Upload-url endpoint creates pending Document row + returns presigned URL	unit	`pytest tests/test_documents.py::test_upload_url_endpoint -x`	No — Wave 0
D-05	Confirm endpoint calls stat_object, updates size, sets status=uploaded	unit	`pytest tests/test_documents.py::test_confirm_endpoint -x`	No — Wave 0
D-12	`/api/settings` returns 404 (removed)	unit	`pytest tests/test_settings.py::test_settings_endpoint_removed -x`	No — Wave 0

Sampling Rate

Per task commit: cd backend && pytest tests/test_documents.py tests/test_quota.py tests/test_topics.py -x -q
Per wave merge: cd backend && pytest -v
Phase gate: Full suite green before /gsd:verify-work

Wave 0 Gaps

tests/test_quota.py — covers STORE-03, STORE-05, STORE-06, SC2 race condition
Test fixture auth_user — authenticated user with quota row (needed across quota and document tests)
Test fixture admin_user — admin-role user (needed for admin 403 tests)
Mock for MinIO presigned_put_object and stat_object in unit tests (document tests must not require live MinIO)

The existing conftest.py already provides db_session (in-memory SQLite) and async_client. The new fixtures extend this.

Recommended Approach

Wave Structure

Wave 1: Migration + Quota Foundation (no external deps required)

Alembic migration 0003_multi_user_isolation.py:
- Delete null-user Document rows (collect object_keys first, then DB delete, then MinIO delete)
- Delete all Topic rows
- Add NOT NULL to documents.user_id
- Reconcile quotas
- Add index ix_topics_user_id
MinIOBackend: add presigned_put_url() method (dual client for public endpoint)
StorageBackend ABC: add presigned_put_url() abstract method
config.py: add MINIO_PUBLIC_ENDPOINT, SYSTEM_PROMPT, DEFAULT_AI_PROVIDER, DEFAULT_AI_MODEL
docker-compose.yml: add MINIO_API_CORS_ALLOW_ORIGIN, MINIO_PUBLIC_ENDPOINT to MinIO service; add celery-beat service

Wave 2: Backend API (blocked on Wave 1)

api/documents.py: replace /upload with /upload-url + /{id}/confirm; add get_current_user (or get_regular_user) to all handlers; add ownership assertions (D-16)
api/topics.py: add get_current_user; scope queries by user_id IN (current_user.id, NULL); add POST /api/admin/topics endpoint
api/auth.py: add GET /api/me/quota endpoint
api/settings.py: delete file; remove from main.py
services/storage.py: remove settings functions; update create_topic() to accept user_id; update delete_document() to decrement quota
services/classifier.py: accept ai_provider/ai_model parameters; remove load_settings() call

Wave 3: Celery (blocked on Wave 2)

tasks/document_tasks.py: update _run() to look up user.ai_provider/user.ai_model; pass to classifier
tasks/document_tasks.py: add cleanup_abandoned_uploads task
celery_app.py: add beat_schedule configuration

Wave 4: Frontend (blocked on Wave 2)

api/client.js: add getUploadUrl(), confirmUpload(), getMyQuota(); remove settings functions
stores/documents.js: update upload() to three-step flow with XHR progress
components/upload/UploadProgress.vue: add progress percentage display
components/layout/AppSidebar.vue: add quota bar
views/SettingsView.vue: replace with "managed by admin" placeholder
stores/settings.js: remove or gut

Architecture Diagram

Browser
  │
  ├─ POST /api/documents/upload-url  ──► FastAPI ──► DB: INSERT Document(status=pending)
  │                                               └──► MinIO: presigned_put_object()
  │                                               └──► returns {upload_url, document_id}
  │
  ├─ PUT {upload_url} ──────────────────────────────────────────────────────► MinIO (direct)
  │  (XHR with progress events)
  │
  └─ POST /api/documents/{id}/confirm ─► FastAPI ──► MinIO: stat_object() → size
                                                 └──► DB: atomic quota UPDATE
                                                 └──► DB: Document.status = 'uploaded'
                                                 └──► Celery: extract_and_classify.delay()
                                                 └──► returns {document_id, used_bytes}

Celery Worker
  └─ extract_and_classify(document_id)
       ├─ DB: SELECT user.ai_provider, user.ai_model WHERE user.id = doc.user_id
       ├─ MinIO: get_object(object_key)
       ├─ extractor.extract_text()
       ├─ classifier.classify_document(ai_provider, ai_model)
       │    └─ DB: SELECT topics WHERE user_id IS NULL OR user_id = doc.user_id
       └─ DB: UPDATE document SET status='classified', extracted_text=...

Celery Beat (every 30 min)
  └─ cleanup_abandoned_uploads()
       ├─ DB: SELECT documents WHERE status='pending' AND created_at < now()-1h
       ├─ MinIO: remove_object() for each
       └─ DB: DELETE documents

Project Constraints (from CLAUDE.md)

Directive	Impact on Phase 3
JWT access token in Pinia memory only	`uploadDocument()` flow in Vue must use `useAuthStore().accessToken` for the `/upload-url` and `/confirm` API calls (via existing `request()` helper); XHR for MinIO PUT does NOT send auth header (MinIO presigned URL is self-authenticating)
MinIO object keys UUID-based `{user_id}/{document_id}/{uuid4()}{ext}`	Key generated in `upload-url` handler, stored in `Document.object_key`, used unchanged at confirm and cleanup
Atomic quota UPDATE pattern	Exact SQL from CLAUDE.md: `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`
Admin endpoints never return document content or `credentials_enc`	Admin topics endpoint at `POST /api/admin/topics` returns topic metadata only; admin cannot call document content endpoints (SEC-04 + D-16)
Every document/folder endpoint asserts `resource.user_id == current_user.id`	Ownership assertion in every handler — return 404 for cross-user access
All DB queries via ORM / parameterized statements	`text()` with bound parameters is used for the atomic quota SQL — always pass `:delta` and `:uid` as parameters, never interpolate
SEC-04	Object keys are stored in DB at upload-url time; confirm endpoint fetches key from DB (`Document.object_key`) — never reconstructs from request params

Environment Availability

Dependency	Required By	Available	Version	Fallback
MinIO SDK (`minio`)	Presigned PUT, stat_object	Yes	7.2.20	—
Celery + Redis	Beat schedule, cleanup task	Yes	5.6.3	—
SQLAlchemy `text()`	Atomic quota SQL	Yes	2.0+	—
psycopg v3	Async DB driver	Yes (3.2.13)	3.2.13	—
`aiosqlite`	Test in-memory DB	Yes (0.20.0)	0.20.0	—
`asyncio.to_thread()`	Sync MinIO SDK wrapping	Yes (Python 3.10+)	Python 3.12	—

No blocking missing dependencies.

Assumptions Log

#	Claim	Section	Risk if Wrong
A1	`MINIO_API_CORS_ALLOW_ORIGIN` env var works reliably on current MinIO docker image for browser PUT CORS	Finding 3	Browser PUT blocked by CORS; fallback is `mc admin config set` in `docker-compose.yml` entrypoint or nginx CORS proxy
A2	PostgreSQL row-level locking on the quota `UPDATE WHERE` is sufficient to prevent concurrent over-quota uploads	Finding 4 / Risk 2	Two simultaneous confirms could both succeed — would require `SELECT FOR UPDATE` on quota row or advisory lock
A3	`MinIOBackend.stat_object()` sees the completed PUT immediately with no eventual-consistency delay	Finding 5	`NoSuchKey` error on stat_object even though PUT succeeded — add retry with exponential backoff as fallback
A4	`ai/__init__.py` `get_provider()` factory accepts a dict with `active_provider` and `providers` keys	Finding 12	Factory may have a different internal signature — inspect before refactoring classifier

Note on A2: PostgreSQL's UPDATE ... WHERE with a conditional is a standard serializable approach for quota enforcement. PostgreSQL's statement-level locking ensures the second concurrent UPDATE sees the committed result of the first. This is the same pattern used by many SaaS systems for rate limiting and quota enforcement. Risk is LOW given the choice of PostgreSQL (which has strong MVCC guarantees), but the SC2 integration test must verify this explicitly.

Sources

Primary (HIGH confidence)

MinIO Python SDK v7.2.20 — verified via help(Minio.presigned_put_object), help(Minio.stat_object), dir(Minio) — local installation
SQLAlchemy text() and CursorResult — verified via Python inspection of installed package
Celery 5.6.3 beat_schedule syntax — verified via python3 -c "import celery; ..."
psycopg v3.2.13 — verified rowcount + fetchone() behavior
Codebase inspection: backend/storage/minio_backend.py, backend/db/models.py, backend/services/storage.py, backend/tasks/document_tasks.py, backend/services/classifier.py, backend/deps/auth.py, backend/api/documents.py, backend/api/topics.py, frontend/src/api/client.js, frontend/src/stores/documents.js
CONTEXT.md — all locked decisions (D-01 through D-17)
CLAUDE.md — architectural constraints

Secondary (MEDIUM confidence)

MinIO GitHub discussion #20555 — per-bucket CORS is enterprise-only in MinIO open source
MinIO GitHub issue #16479 — mc admin config set as authoritative CORS configuration method
MinIO CORS MINIO_API_CORS_ALLOW_ORIGIN env var — referenced in multiple GitHub issues and community reports; enterprise docs confirm it exists at the global level
Medium article on Docker hostname resolution — dual-client / nginx proxy approaches for presigned URL Docker hostname issue

Tertiary (LOW confidence)

XMLHttpRequest upload progress events vs. fetch — referenced in multiple blog posts; treated as LOW confidence due to unverified sources, but cross-confirmed by MDN documentation (fetch does not expose upload progress)

Metadata

Confidence breakdown:

Alembic migration pattern: HIGH — established in migrations 0001 and 0002 in this codebase
MinIO SDK API (presigned_put_object, stat_object): HIGH — verified against installed SDK
Atomic quota SQL: HIGH — verified with SQLAlchemy text() and psycopg v3
MinIO CORS open-source solution: MEDIUM — MINIO_API_CORS_ALLOW_ORIGIN is documented but has had reliability issues; mc admin config set is more reliable but harder to automate in compose
Docker presigned URL hostname issue: HIGH — well-documented, dual-client solution is clean
Auth guard patterns: HIGH — get_current_user/get_current_admin already in production

Research date: 2026-05-23 Valid until: 2026-06-22 (30 days — stable libraries)

46 KiB Raw Blame History