curo1305 6fed5ba531 docs(01): create phase 1 plan — 5 plans in 4 waves

Research, pattern mapping, and verification complete.
Walking Skeleton mode active (MVP Phase 1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-22 08:49:36 +02:00

Phase 1: Infrastructure Foundation - Research

Researched: 2026-05-21
Domain: PostgreSQL + MinIO + Redis + Celery wired into FastAPI via Docker Compose; Alembic async migrations; storage service rewrite
Confidence: HIGH

<user_constraints>

User Constraints (from CONTEXT.md)

Locked Decisions

Schema Scope

D-01: Phase 1 initial Alembic migration creates the full v1 skeleton — all tables: users, refresh_tokens, quotas, documents, topics, folders, shares, audit_log, cloud_connections. Subsequent phases add data and constraints, not new tables.
D-02: groups table stub included in Phase 1 migration (v2 feature; empty table, correct columns and FKs).
D-03: documents.user_id is nullable in Phase 1 (no auth system yet). Phase 2 migration adds the NOT NULL constraint after the user/auth system is live.
D-04: Existing data/ directory contents (flat-file JSON metadata + uploaded files) are deleted in Phase 1. Test data only — no migration script needed.

App Wiring

D-05: Phase 1 switches the storage service layer to PostgreSQL + MinIO. backend/services/storage.py is rewritten to use async SQLAlchemy + MinIO SDK. The app does not continue using the filesystem after Phase 1.
D-06: Single MinIO bucket named docuvault. Object keys follow {user_id}/{document_id}/{uuid4()}{ext} (STORE-02). Human-readable filenames stored in the documents.filename DB column only — never in the MinIO key.
D-07: backend/main.py /health endpoint extended to check PostgreSQL + MinIO connectivity (not just {"status": "ok"}). Health checks gate docker compose up readiness.

Background Worker

D-08: Background task queue: Celery + Redis (STORE-08). FastAPI BackgroundTasks replaced.
D-09: Redis service added to docker-compose.yml in Phase 1. Redis doubles as the rate-limiting store for Phase 2 auth endpoints — no second Redis needed later.
D-10: A celery-worker service is added to docker-compose.yml. Celery broker and result backend both point to the same Redis instance via REDIS_URL.

Env / Secrets Strategy

D-11: .env gitignored + .env.example committed. docker-compose.yml reads vars via ${VAR_NAME}. .env.example has safe placeholder values and comments explaining each variable.
D-12: Production secrets stored outside the project directory at /etc/docuvault/env (chmod 600, owned by the service user, not root). docker-compose.yml references it via env_file:. Documented in deployment notes.
D-13: Two PostgreSQL DSNs: DATABASE_URL (restricted app user docuvault_app, SELECT/INSERT/UPDATE/DELETE only; no DDL) and DATABASE_MIGRATE_URL (migration user docuvault_migrate, DDL privileges; used only by Alembic).
D-14: PostgreSQL init script in docker/postgres/initdb.d/ provisions both users on first container start. The app never connects as the PostgreSQL superuser.
D-15: MinIO vars: MINIO_ENDPOINT, MINIO_ROOT_USER, MINIO_ROOT_PASSWORD (init only), MINIO_BUCKET (value: docuvault), MINIO_ACCESS_KEY, MINIO_SECRET_KEY (separate app-level access key pair with minimal bucket permissions).
D-16: Additional vars in Phase 1 .env.example: REDIS_URL, SECRET_KEY (documented now for Phase 2 JWT + HKDF use; app does not read it in Phase 1).

Claude's Discretion

None — user made explicit choices for all areas.

Deferred Ideas (OUT OF SCOPE)

None. Discussion stayed within phase scope.

</user_constraints>

<phase_requirements>

Phase Requirements

ID	Description	Research Support
STORE-01	Platform storage layer migrated from flat-file JSON + local filesystem to PostgreSQL (metadata) + MinIO (objects)	SQLAlchemy 2.0 async ORM + MinIO SDK patterns documented; service rewrite approach confirmed
STORE-02	Each user's MinIO objects use `{user_id}/{document_id}/{uuid4()}{ext}` keys — human-readable filenames stored in DB only	MinIO `put_object()` API confirmed; key schema enforced in model/service layer
STORE-07	Backend is stateless — no per-instance file locks; multiple instances can run behind a load balancer	PostgreSQL atomic UPDATE + Celery + Redis replaces filelock pattern; verified
</phase_requirements>

Summary

Phase 1 replaces the entire flat-file persistence layer (JSON metadata + local filesystem uploads) with PostgreSQL (via SQLAlchemy 2.0 async ORM) + MinIO (via the official Python SDK) wired into Docker Compose. Redis and a Celery worker are added alongside as the background task queue that replaces FastAPI BackgroundTasks, delivering the statelessness required by STORE-07. All infrastructure services are health-checked and ordered via depends_on conditions so docker compose up can be treated as the single operational command. Alembic manages the schema using the async migration template with a two-DSN strategy (restricted app user + DDL migration user). The walking skeleton requirement is satisfied by: the full v1 schema applied via Alembic, one real document upload persisted to PostgreSQL and MinIO through the rewritten storage service, and the /health endpoint returning live connectivity checks for PostgreSQL and MinIO (D-07).

The existing single-user document upload → text extraction → AI classification workflow continues to work end-to-end after Phase 1. The Vue frontend requires no changes. All API routes and response shapes are preserved.

Primary recommendation: Wire infrastructure with Docker Compose health checks first; apply Alembic migration second; rewrite services/storage.py third; replace BackgroundTasks with Celery tasks last. This ordering allows each layer to be verified before the next is built.

Architectural Responsibility Map

Capability	Primary Tier	Secondary Tier	Rationale
Document metadata persistence	Database / Storage (PostgreSQL)	API / Backend	All metadata is authored and read server-side; no client involvement
Binary file storage	Database / Storage (MinIO)	API / Backend	Object store owns bytes; backend generates keys and proxies operations
Background text extraction + classification	Background Worker (Celery)	API / Backend	CPU-intensive, deferred; must not block HTTP event loop
Health checking	API / Backend	Docker Compose	FastAPI `/health` probes PostgreSQL + MinIO; Compose waits on it
Schema migrations	Database / Storage (Alembic + PostgreSQL)	—	DDL-only responsibility; executed before app starts
Object key namespacing	API / Backend (service layer)	—	Key construction is a code concern, not a storage concern
Service ordering / startup sequencing	CDN / Static (Docker Compose)	—	`depends_on: condition: service_healthy` enforces boot order
Connection pooling	API / Backend (SQLAlchemy pool)	Database / Storage	App holds pool; PostgreSQL is the pooled resource
Task queue / broker	Background Worker (Redis / Celery)	API / Backend	Broker is Redis; workers are separate Docker Compose services

Standard Stack

Core

Library	Version	Purpose	Why Standard
`sqlalchemy[asyncio]`	`>=2.0.49`	ORM + async engine + connection pool	Industry standard for Python async PostgreSQL; `create_async_engine` + `async_sessionmaker` pattern is the canonical FastAPI integration
`psycopg[binary]`	`>=3.3.4`	PostgreSQL async driver	psycopg v3 (`psycopg`) is SQLAlchemy 2.0's preferred async dialect; `[binary]` provides pre-built wheels with no system dependency on libpq headers
`alembic`	`>=1.18.4`	Database migrations	The only maintained migration tool for SQLAlchemy; provides async template (`alembic init -t async`)
`minio`	`>=7.2.20`	MinIO / S3 object storage SDK	Official MinIO Python SDK; stable API for `put_object`, `get_object`, `bucket_exists`, `presigned_get_object`
`celery[redis]`	`>=5.6.3`	Background task queue + Redis transport	Battle-tested distributed task queue; `[redis]` extra installs `redis` client; replaces per-instance `BackgroundTasks`
`redis`	`>=7.4.0`	Redis Python client (Celery dependency + Phase 2 rate limiting)	Official Redis client; installed transitively by `celery[redis]` but worth pinning for Phase 2 rate limiting use

Supporting

Library	Version	Purpose	When to Use
`pydantic-settings`	`>=2.2`	Env var configuration (already in project)	Extended with new DATABASE_URL, MINIO_*, REDIS_URL vars
`anyio`	`>=4.13.0`	Async testing utilities	Required by `httpx` for async test transport in pytest
`httpx`	`>=0.28.1`	Async HTTP client for integration tests	Needed to replace `TestClient` (sync) with `AsyncClient` for async route testing
`pytest-asyncio`	`>=1.3.0`	Async test runner integration	Already in project as `>=0.23`; upgrade to `>=1.3.0` for `asyncio_mode = auto` support in new async tests

Alternatives Considered

Instead of	Could Use	Tradeoff
`psycopg[binary]`	`asyncpg`	`asyncpg` is faster in benchmarks but requires a separate sync driver (`psycopg2`) for Alembic. `psycopg` v3 works for both sync (Alembic) and async (FastAPI) with the same URL — zero driver switching
`celery[redis]`	`pgqueuer` / `pg_boss`	pgqueuer uses PostgreSQL as the queue (no Redis required). However, the user explicitly selected Celery + Redis. Redis is also needed in Phase 2 for rate limiting, so Redis is justified regardless
`minio` Python SDK (sync, wrapped in `asyncio.to_thread`)	`aiobotocore`	MinIO SDK is the official client with full API coverage including MinIO-specific features. `aiobotocore` is AWS-oriented and less tested with MinIO-specific APIs. `to_thread()` wrapping is the correct async pattern for the sync SDK

Installation (backend/requirements.txt additions):

sqlalchemy[asyncio]>=2.0
psycopg[binary]>=3.3
alembic>=1.13
minio>=7.2
celery[redis]>=5.4
redis>=7.0
httpx>=0.27
pytest-asyncio>=1.3.0

Note: psycopg[binary] is specified with bracket extras in requirements.txt. The binary extra installs a self-contained wheel — no system libpq-dev package required in the Docker image, simplifying the Dockerfile.

Package Legitimacy Audit

All packages verified on PyPI registry via pip3 index versions and slopcheck install (v0.6.1, run 2026-05-21).

Package	Registry	Age	Downloads	Source Repo	slopcheck	Disposition
`sqlalchemy`	PyPI	~20 yrs	Very high (millions/wk)	github.com/sqlalchemy/sqlalchemy	OK	Approved
`psycopg`	PyPI	~4 yrs (v3)	High	github.com/psycopg/psycopg	OK	Approved
`alembic`	PyPI	~12 yrs	Very high	github.com/sqlalchemy/alembic	OK	Approved
`minio`	PyPI	~8 yrs	High	github.com/minio/minio-py	OK	Approved
`celery`	PyPI	~15 yrs	Very high (millions/wk)	github.com/celery/celery	OK	Approved
`redis`	PyPI	~12 yrs	Very high	github.com/redis/redis-py	OK	Approved

Packages removed due to slopcheck [SLOP] verdict: none
Packages flagged as suspicious [SUS]: none

Note: psycopg[binary] is specified with extras syntax in requirements.txt; the installable wheel is psycopg-binary on PyPI, which also passed registry verification (version 3.3.4 confirmed). [VERIFIED: PyPI registry + slopcheck OK]

Architecture Patterns

System Architecture Diagram

Browser (Vue 3 SPA — unchanged in Phase 1)
    │  HTTP/JSON + multipart (same API contract)
    ▼
FastAPI (port 8000) — lifespan creates async engine, disposes on shutdown
    │
    ├── api/documents.py        ─── calls ──►  services/storage.py (REWRITTEN)
    │                                               │
    │                                               ├─► db/session.py (AsyncSession)
    │                                               │       │
    │                                               │       ▼
    │                                               │   PostgreSQL (port 5432)
    │                                               │   [docuvault_app user, restricted]
    │                                               │
    │                                               └─► storage/minio_backend.py
    │                                                       │
    │                                                       ▼
    │                                                   MinIO (port 9000)
    │                                                   [bucket: docuvault]
    │                                                   [app-level access key]
    │
    ├── /health                 ─── probes ──► PostgreSQL + MinIO connectivity
    │
    └── celery_app.py  ─── enqueues tasks ──► Redis (port 6379)
                                                  │
                                          Celery Worker (separate container)
                                              ├── task: extract_and_classify()
                                              │     ├─► services/extractor.py
                                              │     └─► services/classifier.py
                                              └── consumes from Redis queue

Alembic (run once at deploy time, not part of app startup)
    │  uses DATABASE_MIGRATE_URL (docuvault_migrate user, DDL privileges)
    └─► PostgreSQL — applies full v1 schema

Recommended Project Structure

backend/
├── main.py                     # FastAPI app; extend lifespan for engine/dispose
├── config.py                   # pydantic-settings: extend with new env vars
├── celery_app.py               # Celery app instance (broker from REDIS_URL)
├── db/
│   ├── __init__.py
│   ├── session.py              # async engine + async_sessionmaker
│   └── models.py               # all SQLAlchemy ORM models (full v1 schema)
├── deps/
│   └── db.py                   # get_db() — yields AsyncSession
├── services/
│   ├── storage.py              # REPLACED: async SQLAlchemy + MinIO SDK
│   ├── extractor.py            # unchanged
│   └── classifier.py          # update to accept session; dispatch via Celery
├── storage/                    # NEW: StorageBackend ABC + MinIO implementation
│   ├── __init__.py             # get_storage_backend() factory
│   ├── base.py                 # StorageBackend ABC (mirrors ai/base.py)
│   └── minio_backend.py        # MinIO implementation
├── tasks/
│   └── document_tasks.py       # Celery task definitions (extract_and_classify)
├── migrations/                 # Alembic migration directory
│   ├── env.py                  # async env.py with two-DSN strategy
│   ├── script.py.mako
│   └── versions/
│       └── 0001_initial_schema.py
├── alembic.ini                 # sqlalchemy.url = DATABASE_MIGRATE_URL
├── api/
│   ├── documents.py            # update to use async storage service
│   ├── topics.py               # unchanged (topics still in DB after migration)
│   └── settings.py             # unchanged
└── tests/
    ├── conftest.py             # UPDATE: add async engine + session fixtures
    ├── test_health.py          # UPDATE: test PostgreSQL + MinIO health probes
    ├── test_documents.py       # UPDATE: adapt for async storage layer
    └── test_storage.py         # NEW: unit tests for MinIO object key schema

Pattern 1: SQLAlchemy 2.0 Async Engine + Session Factory (FastAPI Lifespan)

What: Create engine once at startup, share it application-wide via app.state. Session factory (async_sessionmaker) yields per-request sessions via a FastAPI dependency.

When to use: Any database access in FastAPI route handlers or services.

Example:

# db/session.py
from sqlalchemy.ext.asyncio import create_async_engine, async_sessionmaker, AsyncSession
from config import settings

engine = create_async_engine(
    settings.database_url,          # postgresql+psycopg://docuvault_app:...@postgres/docuvault
    pool_pre_ping=True,             # detect stale connections before use
    echo=False,
)

AsyncSessionLocal = async_sessionmaker(
    engine,
    class_=AsyncSession,
    expire_on_commit=False,         # prevent lazy-load errors after commit
)

# deps/db.py
from db.session import AsyncSessionLocal

async def get_db():
    async with AsyncSessionLocal() as session:
        try:
            yield session
        finally:
            await session.close()

# main.py (lifespan)
from contextlib import asynccontextmanager

from fastapi import FastAPI

from db.session import engine

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: engine creates pool on first connection
    yield
    # Shutdown: close all pooled connections
    await engine.dispose()

app = FastAPI(lifespan=lifespan)

Source: [CITED: docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html]

Key detail — URL format for psycopg v3:

postgresql+psycopg://user:password@host:port/dbname

The same postgresql+psycopg:// prefix works for both create_engine() (Alembic) and create_async_engine() (FastAPI). SQLAlchemy selects the sync or async dialect variant automatically. [CITED: docs.sqlalchemy.org/en/20/dialects/postgresql.html]

Key detail (expire_on_commit=False): After session.commit(), SQLAlchemy expires all loaded objects, so the next attribute access issues an implicit SELECT. Under asyncio that implicit I/O happens outside the greenlet context SQLAlchemy uses to bridge to the async driver, so it raises MissingGreenlet. Setting expire_on_commit=False keeps the already-loaded attribute values usable after commit. [CITED: docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html]

Pattern 2: Alembic Async Configuration with Two DSNs

What: Alembic's async template (alembic init -t async) generates env.py that uses async_engine_from_config and asyncio.run(). The DATABASE_MIGRATE_URL DSN (DDL privileges) is used only by Alembic; the app uses DATABASE_URL (restricted). This separates migration risk from runtime risk.

When to use: Every alembic upgrade head call. Never used by FastAPI directly.

Example:

# migrations/env.py (key section: async online migrations)
import asyncio
import os

from alembic import context
from sqlalchemy import pool
from sqlalchemy.ext.asyncio import async_engine_from_config

from db.models import Base  # import all models so metadata is populated

config = context.config

# alembic.ini cannot read environment variables: its %(name)s interpolation only
# resolves keys defined in the ini file itself. Inject the migration DSN here.
# "%" is doubled because the value passes through ConfigParser interpolation.
config.set_main_option(
    "sqlalchemy.url",
    os.environ["DATABASE_MIGRATE_URL"].replace("%", "%%"),
)

target_metadata = Base.metadata

def do_run_migrations(connection):
    context.configure(connection=connection, target_metadata=target_metadata)
    with context.begin_transaction():
        context.run_migrations()

async def run_async_migrations():
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,   # migrations use per-run connection, not pool
    )
    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)
    await connectable.dispose()

def run_migrations_online():
    asyncio.run(run_async_migrations())

# alembic.ini
[alembic]
script_location = migrations
# sqlalchemy.url is intentionally not set here; migrations/env.py injects
# DATABASE_MIGRATE_URL from the environment at runtime.

Two-DSN in practice: migrations/env.py reads DATABASE_MIGRATE_URL from the environment and hands it to Alembic. FastAPI's db/session.py reads DATABASE_URL. Both are set in .env. The Docker Compose backend service has both env vars; the celery-worker service has DATABASE_URL only (workers need no DDL).

Source: [CITED: alembic.sqlalchemy.org/en/latest/cookbook.html#using-asyncio-with-alembic] + [CITED: github.com/sqlalchemy/alembic/blob/main/alembic/templates/async/env.py]
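The two-DSN rule can be enforced in code as well as by convention. The following is a minimal sketch of how env.py might fetch the migration DSN, assuming a hypothetical helper named load_migration_url; the guard against the docuvault_app role is an illustrative choice, not an Alembic feature. The "%" doubling reflects that values passed to Alembic's Config.set_main_option go through ConfigParser interpolation.

```python
import os
from urllib.parse import urlsplit


def load_migration_url(env=None) -> str:
    """Return DATABASE_MIGRATE_URL, escaped for ConfigParser interpolation.

    Hypothetical helper: it never falls back to the restricted app DSN.
    """
    env = os.environ if env is None else env
    url = env.get("DATABASE_MIGRATE_URL")
    if not url:
        raise RuntimeError("DATABASE_MIGRATE_URL is not set; refusing to migrate")
    # Assumption: the runtime role is named docuvault_app, as in D-13.
    if urlsplit(url).username == "docuvault_app":
        raise RuntimeError("migrations must not run as the restricted app user")
    # A literal "%" (e.g. from a percent-encoded password) must be doubled.
    return url.replace("%", "%%")
```

In env.py this would be used as config.set_main_option("sqlalchemy.url", load_migration_url()), so a missing or wrong DSN fails before any connection is attempted.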

Pattern 3: MinIO SDK Sync-in-Async via `asyncio.to_thread()`

What: The MinIO Python SDK is synchronous. In an async FastAPI context, blocking I/O blocks the event loop. Wrap MinIO SDK calls in asyncio.to_thread() to offload to a thread pool without blocking.

When to use: All MinIO operations (put_object, get_object, bucket_exists, presigned_get_object) called from async def handlers or services.

Example:

# storage/minio_backend.py
import asyncio
import io
import uuid
from datetime import timedelta
from minio import Minio
from storage.base import StorageBackend

class MinIOBackend(StorageBackend):
    def __init__(self, endpoint: str, access_key: str, secret_key: str,
                 bucket: str, secure: bool = False):
        self._client = Minio(
            endpoint=endpoint,
            access_key=access_key,
            secret_key=secret_key,
            secure=secure,         # False for Docker internal network (HTTP)
        )
        self._bucket = bucket

    async def put_object(
        self,
        user_id: str,
        document_id: str,
        file_bytes: bytes,
        extension: str,
        content_type: str,
    ) -> str:
        object_key = f"{user_id}/{document_id}/{uuid.uuid4()}{extension}"
        data = io.BytesIO(file_bytes)
        await asyncio.to_thread(
            self._client.put_object,
            self._bucket,
            object_key,
            data,
            length=len(file_bytes),
            content_type=content_type,
        )
        return object_key

    async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
        return await asyncio.to_thread(
            self._client.presigned_get_object,
            bucket_name=self._bucket,
            object_name=object_key,
            expires=timedelta(minutes=expires_minutes),
        )

    async def get_object(self, object_key: str) -> bytes:
        # Required by the StorageBackend ABC; without it the class cannot be instantiated.
        def _read() -> bytes:
            response = self._client.get_object(self._bucket, object_key)
            try:
                return response.read()
            finally:
                response.close()
                response.release_conn()
        return await asyncio.to_thread(_read)

    async def delete_object(self, object_key: str) -> None:
        # Required by the StorageBackend ABC.
        await asyncio.to_thread(self._client.remove_object, self._bucket, object_key)

    async def health_check(self) -> bool:
        try:
            return await asyncio.to_thread(
                self._client.bucket_exists, self._bucket
            )
        except Exception:
            return False

MinIO put_object signature (confirmed):

client.put_object(
    bucket_name: str,
    object_name: str,      # the object key
    data: io.RawIOBase,    # io.BytesIO is accepted
    length: int,           # -1 with part_size for unknown-length streams
    content_type: str = "application/octet-stream",
)

Note on length=-1: For unknown-length streams, set length=-1 and part_size=10*1024*1024. For an in-memory io.BytesIO, always pass the real size (length=len(file_bytes)); this avoids a multipart upload when one is not needed.

Source: [CITED: github.com/minio/minio-py/blob/master/docs/API.md]
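To illustrate why the to_thread wrapping matters, here is a small self-contained sketch that uses time.sleep as a stand-in for a synchronous SDK call. No MinIO client is involved; blocking_upload and heartbeat are hypothetical names introduced only for this demonstration.

```python
import asyncio
import time


def blocking_upload(object_key: str) -> str:
    """Stand-in for a synchronous SDK call such as Minio.put_object."""
    time.sleep(0.3)  # simulates network I/O that blocks the calling thread
    return object_key


async def heartbeat(ticks: list) -> None:
    """Records a tick every 20 ms; it only advances while the loop is free."""
    for _ in range(5):
        await asyncio.sleep(0.02)
        ticks.append(time.monotonic())


async def upload_without_blocking(object_key: str) -> tuple:
    ticks: list = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread; the event loop keeps ticking.
    stored = await asyncio.to_thread(blocking_upload, object_key)
    ticks_during_upload = len(ticks)
    await beat
    return stored, ticks_during_upload
```

If blocking_upload were called directly inside the coroutine, the heartbeat could not record any tick until the call returned, which is the event-loop stall this pattern is meant to avoid.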

Pattern 4: MinIO Bucket Initialization at Startup

What: On first docker compose up, MinIO starts with an empty state. The application must create the docuvault bucket if it doesn't exist. This is done in the FastAPI lifespan, not in user request handlers.

Example:

# main.py lifespan extension
import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI
from minio import Minio

from config import settings
from db.session import engine

@asynccontextmanager
async def lifespan(app: FastAPI):
    # PostgreSQL engine + pool come from db/session.py (created at import time)
    # MinIO bucket initialization
    minio_client = Minio(
        settings.minio_endpoint,
        access_key=settings.minio_access_key,
        secret_key=settings.minio_secret_key,
        secure=False,
    )
    exists = await asyncio.to_thread(minio_client.bucket_exists, settings.minio_bucket)
    if not exists:
        await asyncio.to_thread(minio_client.make_bucket, settings.minio_bucket)
    app.state.minio = minio_client
    yield
    await engine.dispose()
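With the engine and the MinIO client available at startup, the /health probe from D-07 could aggregate per-service checks along the following lines. This is a sketch under assumptions: the response shape ("status" plus "checks"), the 2-second timeout, and the helper names run_probe and aggregate_health are not specified anywhere in this document. The probes themselves are passed in as coroutines, for example one that runs SELECT 1 and one that calls the storage backend's health_check().

```python
import asyncio
from typing import Awaitable, Callable, Dict


async def run_probe(probe: Callable[[], Awaitable[bool]], timeout: float) -> bool:
    """Run one probe; a timeout or any exception counts as unhealthy."""
    try:
        return bool(await asyncio.wait_for(probe(), timeout))
    except Exception:
        return False


async def aggregate_health(
    probes: Dict[str, Callable[[], Awaitable[bool]]], timeout: float = 2.0
) -> dict:
    """Run all probes concurrently and summarise them in one payload."""
    names = list(probes)
    results = await asyncio.gather(*(run_probe(probes[n], timeout) for n in names))
    checks = dict(zip(names, results))
    return {"status": "ok" if all(results) else "degraded", "checks": checks}
```

A route handler could return this payload with HTTP 503 whenever status is not "ok", so that a container healthcheck polling /health sees the failure.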

Pattern 5: Celery App + Redis Broker Configuration

What: A single celery_app.py module defines the Celery application. Tasks are defined as decorated functions. FastAPI route handlers call .delay() to enqueue; the celery-worker container processes them.

Redis URL format (with password, Docker internal network):

redis://:${REDIS_PASSWORD}@redis:6379/0

The : before the password with no username is the correct format when Redis is configured with requirepass but no ACL users. [CITED: docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html via WebSearch]
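Passwords containing characters such as "@", "/" or ":" would break this URL unless they are percent-encoded. A minimal sketch of building REDIS_URL safely follows; build_redis_url is a hypothetical helper, and it assumes the consuming client URL-decodes the password when parsing the URL.

```python
from urllib.parse import quote, unquote, urlsplit


def build_redis_url(password: str, host: str = "redis", port: int = 6379, db: int = 0) -> str:
    """Build redis://:<password>@host:port/db with the password percent-encoded."""
    # safe="" forces encoding of "/", "@" and ":" as well.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


def password_from_url(url: str) -> str:
    """Recover the original password from a URL built by build_redis_url."""
    return unquote(urlsplit(url).password or "")
```

For example, build_redis_url("p@ss/word:1") yields redis://:p%40ss%2Fword%3A1@redis:6379/0, and password_from_url() on that value returns the original string.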

Example:

# celery_app.py
import os
from celery import Celery

celery_app = Celery("docuvault")
celery_app.conf.broker_url = os.environ.get("REDIS_URL", "redis://redis:6379/0")
celery_app.conf.result_backend = os.environ.get("REDIS_URL", "redis://redis:6379/0")
celery_app.conf.task_serializer = "json"
celery_app.conf.result_serializer = "json"
celery_app.conf.accept_content = ["json"]
celery_app.conf.task_routes = {
    "tasks.document_tasks.*": {"queue": "documents"},
}

# tasks/document_tasks.py
from celery_app import celery_app

@celery_app.task(name="tasks.document_tasks.extract_and_classify")
def extract_and_classify(document_id: str) -> dict:
    # Celery tasks are SYNCHRONOUS functions — do NOT use async def here.
    # Use asyncio.run() sparingly or run sync equivalents of extractor/classifier.
    from services import extractor, classifier
    ...

# api/documents.py — calling the task
from tasks.document_tasks import extract_and_classify

@router.post("/upload")
async def upload_document(...):
    ...
    # Replace: background_tasks.add_task(classifier.classify_document, doc_id)
    # With:
    extract_and_classify.delay(str(saved_doc.id))
    return meta

Critical: Celery tasks are synchronous. The default prefork worker executes each task as a plain blocking function call in a worker process; no asyncio event loop is running. Calling async def functions inside a Celery task therefore requires asyncio.run(), which creates and tears down a new event loop on every invocation. This is acceptable for Phase 1 since the existing extractor.py and classifier.py services already have sync and async entry points, but keep tasks pure-sync where possible. [VERIFIED via WebSearch cross-checked with official docs]
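The asyncio.run() bridge can be sketched without Celery itself. In the example below, classify_async is a hypothetical stand-in for an async classifier entry point, and extract_and_classify_sync is a plain function shaped like a task body; neither name comes from the existing services.

```python
import asyncio


async def classify_async(text: str) -> str:
    """Stand-in for an async classifier entry point (hypothetical)."""
    await asyncio.sleep(0)  # placeholder for an awaited network call
    return "invoice" if "invoice" in text.lower() else "other"


def extract_and_classify_sync(document_text: str) -> dict:
    """Plain synchronous function, shaped like a Celery task body."""
    # asyncio.run() creates a fresh event loop, runs the coroutine to
    # completion, then closes the loop. Nothing async leaks out of the task.
    topic = asyncio.run(classify_async(document_text))
    return {"topic": topic}
```

Because each call builds and discards its own loop, async resources created inside (clients, sessions) must also be opened and closed inside the coroutine rather than shared across task invocations.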

Worker startup command:

celery -A celery_app worker --loglevel=info -Q documents

Pattern 6: Docker Compose Health Checks + `depends_on`

What: Each infrastructure service has a healthcheck definition. The backend service uses depends_on: condition: service_healthy to wait for all three (postgres, minio, redis) before starting.

Example:

services:
  postgres:
    image: postgres:17-alpine
    environment:
      POSTGRES_DB: docuvault
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./docker/postgres/initdb.d:/docker-entrypoint-initdb.d:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d docuvault"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD}
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_data:/data
    healthcheck:
      # curl is removed from recent MinIO images; use the /minio/health/live HTTP endpoint
      # from the host. Inside the container, mc is available:
      test: ["CMD", "mc", "ready", "local"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s

  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD}
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  backend:
    depends_on:
      postgres:
        condition: service_healthy
      minio:
        condition: service_healthy
      redis:
        condition: service_healthy

MinIO healthcheck note: curl was removed from MinIO's Docker image in October 2023. The mc ready local command is the current recommended healthcheck inside the container. The /minio/health/live HTTP endpoint (returns 200 OK) is still valid for external probing but cannot be used inside the container without curl. [CITED: github.com/minio/minio/issues/18389]

Pattern 7: PostgreSQL Two-User Init Script

What: The official PostgreSQL Docker image runs scripts in /docker-entrypoint-initdb.d/ on first start (empty volume). A SQL script provisions two users: docuvault_migrate (DDL) and docuvault_app (runtime, restricted).

When to use: First docker compose up with a fresh volume. Idempotency is not required because init scripts run only once, when the data directory is empty.

Example:

-- docker/postgres/initdb.d/01-init-users.sql
-- Runs as the POSTGRES_USER superuser on first container start only.

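-- Migration user: DDL privileges (CREATE TABLE, ALTER TABLE, CREATE INDEX)
CREATE USER docuvault_migrate WITH PASSWORD 'PLACEHOLDER_MIGRATE_PASSWORD';
GRANT ALL PRIVILEGES ON DATABASE docuvault TO docuvault_migrate;
-- PostgreSQL 15+ no longer grants CREATE on schema public to every role,
-- so the migration user needs an explicit schema-level grant.
GRANT USAGE, CREATE ON SCHEMA public TO docuvault_migrate;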

-- App user: runtime DML only (SELECT, INSERT, UPDATE, DELETE)
CREATE USER docuvault_app WITH PASSWORD 'PLACEHOLDER_APP_PASSWORD';
GRANT CONNECT ON DATABASE docuvault TO docuvault_app;

-- Grant schema-level privileges AFTER migration user creates the schema
-- This must run after alembic upgrade head, OR grant in a second script.
-- Pattern: grant via a post-migration step or grant within the migration itself:
-- GRANT USAGE ON SCHEMA public TO docuvault_app;
-- GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO docuvault_app;
-- ALTER DEFAULT PRIVILEGES IN SCHEMA public
--   GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO docuvault_app;

Important: The GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES must be run AFTER Alembic has created the tables, because ON ALL TABLES applies only to existing tables. Use ALTER DEFAULT PRIVILEGES so future tables (from future migrations) are also accessible. This can be done at the end of the first Alembic migration file, or in a post-migration Docker entrypoint hook.

Recommended approach for Phase 1: Run the GRANT as the last step of the 0001_initial_schema.py migration using op.execute() as the docuvault_migrate user (which has full privileges). [ASSUMED — no official doc confirming this is the standard Alembic pattern, but it follows from standard PostgreSQL privilege management]
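A minimal sketch of that final migration step: a pure-Python helper that produces the three statements, which the migration's upgrade() could pass to Alembic's op.execute() one at a time. The helper name runtime_grant_statements is an assumption made for illustration; the SQL mirrors the commented-out grants in the init script above.

```python
def runtime_grant_statements(role: str = "docuvault_app", schema: str = "public") -> list:
    """Return the SQL statements that give the runtime role DML-only access."""
    dml = "SELECT, INSERT, UPDATE, DELETE"
    return [
        f"GRANT USAGE ON SCHEMA {schema} TO {role}",
        # Applies only to tables that already exist when it runs.
        f"GRANT {dml} ON ALL TABLES IN SCHEMA {schema} TO {role}",
        # Covers tables created later by the role that executes this statement.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT {dml} ON TABLES TO {role}",
    ]
```

Inside 0001_initial_schema.py this would run after all op.create_table() calls, for example: for stmt in runtime_grant_statements(): op.execute(stmt). Running it as docuvault_migrate means the default privileges apply to tables that the same user creates in later migrations.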

Pattern 8: StorageBackend ABC (Mirrors `ai/` Pattern)

What: storage/base.py defines StorageBackend as an abstract base class with the same structure as ai/base.py. storage/__init__.py provides a get_storage_backend() factory. storage/minio_backend.py is the Phase 1 implementation.

Example:

# storage/base.py
from abc import ABC, abstractmethod

class StorageBackend(ABC):
    @abstractmethod
    async def put_object(
        self, user_id: str, document_id: str,
        file_bytes: bytes, extension: str, content_type: str,
    ) -> str:
        """Store object; return the object_key used."""

    @abstractmethod
    async def get_object(self, object_key: str) -> bytes:
        """Retrieve object bytes by key."""

    @abstractmethod
    async def delete_object(self, object_key: str) -> None:
        """Delete object by key."""

    @abstractmethod
    async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
        """Return a time-limited download URL."""

    @abstractmethod
    async def health_check(self) -> bool:
        """Return True if backend is reachable."""

# storage/__init__.py
from config import settings
from storage.base import StorageBackend
from storage.minio_backend import MinIOBackend

def get_storage_backend() -> StorageBackend:
    return MinIOBackend(
        endpoint=settings.minio_endpoint,
        access_key=settings.minio_access_key,
        secret_key=settings.minio_secret_key,
        bucket=settings.minio_bucket,
        secure=False,
    )
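One benefit of the ABC is that unit tests can swap in a backend that needs no running MinIO. The sketch below is self-contained, so it repeats a trimmed copy of the ABC (three of the five methods); InMemoryBackend and IncompleteBackend are hypothetical classes that exist only for this illustration.

```python
import asyncio
import uuid
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Trimmed copy of the ABC so this sketch runs on its own."""

    @abstractmethod
    async def put_object(self, user_id: str, document_id: str,
                         file_bytes: bytes, extension: str, content_type: str) -> str: ...

    @abstractmethod
    async def get_object(self, object_key: str) -> bytes: ...

    @abstractmethod
    async def delete_object(self, object_key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical test double: keeps objects in a dict instead of MinIO."""

    def __init__(self) -> None:
        self._objects: dict = {}

    async def put_object(self, user_id, document_id, file_bytes, extension, content_type):
        key = f"{user_id}/{document_id}/{uuid.uuid4()}{extension}"
        self._objects[key] = file_bytes
        return key

    async def get_object(self, object_key):
        return self._objects[object_key]

    async def delete_object(self, object_key):
        self._objects.pop(object_key, None)


class IncompleteBackend(StorageBackend):
    """Omits get_object and delete_object, so instantiating it raises TypeError."""

    async def put_object(self, user_id, document_id, file_bytes, extension, content_type):
        return "unused"
```

The same rule applies to the real MinIO backend: every abstract method must be implemented before get_storage_backend() can construct it.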

Anti-Patterns to Avoid

Sync SQLAlchemy in async context: Using create_engine() instead of create_async_engine() in FastAPI will block the event loop on every database call. Use create_async_engine throughout.
Calling await session.commit() then accessing lazy-loaded attributes: Always set expire_on_commit=False or explicitly refresh after commit.
Connecting Alembic using DATABASE_URL (restricted user): The restricted docuvault_app user has no DDL privileges. Alembic migrations will fail with permission denied errors. Alembic must always use DATABASE_MIGRATE_URL.
Using async def for Celery task functions: Celery workers do not run an asyncio event loop. Define tasks as def, not async def. Wrap any async calls with asyncio.run() if unavoidable, but prefer sync implementations in tasks.
Storing human-readable filename as MinIO object key: Object keys must be UUID-based ({user_id}/{document_id}/{uuid4()}{ext}). Filenames are stored ONLY in the documents.filename DB column. Putting human filenames in the key enables path traversal and makes key prediction trivial.
Using minio_client.bucket_exists() inside async handlers without asyncio.to_thread: The MinIO SDK is synchronous; calling it directly from async def will block the event loop.
Bare redis-cli ping healthcheck against a password-protected Redis: For Redis with requirepass, the healthcheck must pass -a $REDIS_PASSWORD to redis-cli. A bare redis-cli ping will return NOAUTH and be treated as unhealthy.
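The object-key rule from the list above can be sketched as a small helper. build_object_key is a hypothetical name, and the extension filter (alphanumeric, at most 9 characters after the dot) is an assumption added for the example; the key layout itself is the one fixed by D-06.

```python
import uuid
from pathlib import PurePosixPath


def build_object_key(user_id: str, document_id: str, original_filename: str) -> str:
    """Build {user_id}/{document_id}/{uuid4()}{ext}; only the extension is reused."""
    ext = PurePosixPath(original_filename).suffix.lower()
    # Keep the extension only if it is short and alphanumeric; otherwise drop it.
    if not (1 < len(ext) <= 10 and ext[1:].isalnum()):
        ext = ""
    # No other part of the user-supplied filename reaches the key.
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"
```

A filename such as "../../etc/Quarterly Report.PDF" contributes only ".pdf" to the key; the human-readable name would be written to documents.filename separately.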

Don't Hand-Roll

Problem	Don't Build	Use Instead	Why
Async PostgreSQL session management	Custom connection/context manager	SQLAlchemy `async_sessionmaker` + `Depends(get_db)`	Handles connection pooling, transaction boundaries, error cleanup, and the `expire_on_commit` edge case
Database schema migrations	Manual `CREATE TABLE` scripts in Python	Alembic	Manages migration history, rollbacks, auto-generation from ORM models, and multi-environment DSN configuration
MinIO object lifecycle	Custom S3-like HTTP client	`minio` Python SDK	Handles multipart uploads, signature v4, presigned URL expiry, retry logic, and connection pooling
Background task distribution	Thread pools or `asyncio.create_task()`	Celery + Redis	Cross-instance task distribution, retry on failure, dead letter queues, task result storage
Docker service ordering	`sleep` commands in Compose entrypoints	`healthcheck` + `depends_on: condition: service_healthy`	Deterministic, declarative; `sleep` is a race condition
PostgreSQL privilege management	Per-table GRANT scripts written by hand	`ALTER DEFAULT PRIVILEGES` in Alembic migration	Future migrations automatically inherit privileges; hand-written grants go stale

Key insight: The existing filelock-based services/storage.py uses at least 6 custom concurrency primitives to solve problems that PostgreSQL's transaction isolation and MinIO's atomic object operations solve at the infrastructure level. The rewrite simplifies the code while gaining correctness guarantees.

Common Pitfalls

Pitfall 1: `expire_on_commit=True` (the default) Causes `MissingGreenlet`

What goes wrong: After await session.commit(), accessing any ORM object attribute triggers a new SELECT query. In async context, if there is no active session scope, SQLAlchemy raises sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called.

Why it happens: The default Session.expire_on_commit=True marks objects as "expired" post-commit. The next attribute access triggers a lazy load, which needs a sync greenlet context (not available in asyncio).

How to avoid: Always set expire_on_commit=False in async_sessionmaker. [CITED: docs.sqlalchemy.org]

Warning signs: MissingGreenlet in tracebacks after commit; attribute access on model instances outside async with session blocks.

Pitfall 2: Alembic `env.py` Not Importing All Models

What goes wrong: alembic revision --autogenerate generates an empty migration even though models were defined.

Why it happens: Alembic's target_metadata must be set to Base.metadata, and all model modules must be imported BEFORE target_metadata is accessed in env.py. Python only knows about models that have been imported.

How to avoid: In migrations/env.py, explicitly import all model modules:

from db import models  # noqa: F401 — must import to register with Base.metadata
target_metadata = models.Base.metadata

Warning signs: Empty op. blocks in generated migrations; tables not appearing in migration history.

Pitfall 3: MinIO `put_object` Requires `io.BytesIO.seek(0)` Before Use

What goes wrong: put_object reads 0 bytes if the io.BytesIO object's file pointer is at the end (e.g., after writing to it).

Why it happens: io.BytesIO.write() advances the pointer to the end of the data. put_object starts reading from the current position.

How to avoid: Always call data.seek(0) before passing a BytesIO to put_object. Or construct the BytesIO from the complete bytes directly: io.BytesIO(file_bytes) starts the pointer at 0.

Warning signs: MinIO reports successful upload but object is 0 bytes; or OSError: stream having not enough data.
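The pointer behaviour behind this pitfall can be reproduced with the standard library alone. The sketch below does not call MinIO; it only shows what a reader of the stream would see before and after seek(0). The payload bytes are made up for illustration.

```python
import io

payload = b"%PDF-1.7 example bytes"

# Case 1: the buffer is filled with write(), so the pointer ends up at the end.
written = io.BytesIO()
written.write(payload)
unread_before_seek = written.read()  # nothing left to read from this position

written.seek(0)  # rewind before handing the stream to an uploader
unread_after_seek = written.read()

# Case 2: the buffer is constructed from the bytes, so the pointer starts at 0.
constructed = io.BytesIO(payload)
position_at_start = constructed.tell()
```

Whichever construction is used, passing length=len(payload) to put_object keeps the declared size tied to the bytes rather than to the stream position.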

Pitfall 4: PostgreSQL Init Script GRANT Timing

What goes wrong: docuvault_app user gets permission denied on tables even after GRANT ... ON ALL TABLES.

Why it happens: GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public only applies to tables that exist at the time of the GRANT. Tables created by Alembic after the init script runs are not covered.

How to avoid: Run ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO docuvault_app; in the Alembic initial migration (as docuvault_migrate user, which owns the tables). This covers all future tables created by the same migration user.

Warning signs: First docker compose up works; second run after alembic upgrade head fails with PostgreSQL permission denied errors for docuvault_app.
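One way to keep the default-privilege statements in a single place is sketched below. The names DEFAULT_PRIVILEGE_STATEMENTS and apply_default_privileges are hypothetical, and the second statement (sequences) is an assumption added here: audit_log uses an auto-incrementing integer key, and inserts into such a table need access to the backing sequence.

```python
from typing import Callable

# Assumed statements; role and schema names are taken from the text above.
DEFAULT_PRIVILEGE_STATEMENTS = (
    "ALTER DEFAULT PRIVILEGES IN SCHEMA public "
    "GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO docuvault_app",
    "ALTER DEFAULT PRIVILEGES IN SCHEMA public "
    "GRANT USAGE, SELECT ON SEQUENCES TO docuvault_app",
)


def apply_default_privileges(execute: Callable[[str], object]) -> None:
    """Pass each statement to `execute`, for example Alembic's op.execute."""
    for statement in DEFAULT_PRIVILEGE_STATEMENTS:
        execute(statement)
```

Inside the initial migration this would be called as apply_default_privileges(op.execute) before the first op.create_table call, because default privileges only affect objects created after the statement runs.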

Pitfall 5: Redis Healthcheck Without Authentication

What goes wrong: redis-cli ping returns NOAUTH Authentication required when Redis is started with requirepass. Docker Compose treats non-zero exit as unhealthy. Backend never starts.

Why it happens: redis-cli ping without -a doesn't pass the password.

How to avoid: Use redis-cli -a ${REDIS_PASSWORD} ping in the healthcheck test field. Note that this logs a warning about passing password on command line — acceptable for a healthcheck, not for production scripts.

Warning signs: backend service stuck at Waiting for redis to be healthy; redis-cli ping showing NOAUTH in container logs.

Pitfall 6: MinIO `mc ready local` Healthcheck Not Available Without `mc`

What goes wrong: mc ready local only works as a healthcheck when the mc binary is inside the container. The official minio/minio Docker image ships mc, but a third-party or stripped MinIO image may not, and the healthcheck then fails on every probe.

How to avoid: Stick to the official minio/minio:latest image. If a custom image is needed, probe the /minio/health/live HTTP endpoint from a sidecar or from the host; probing it from inside the container is not an option when the image has no curl.

Pitfall 7: Celery Worker Cannot Import FastAPI App Module

What goes wrong: Celery worker Docker container imports celery_app.py, which transitively imports the FastAPI app or lifespan, which tries to open database connections or access app.state.

Why it happens: Shared imports between the FastAPI app and Celery tasks create circular dependencies at module load time.

How to avoid: Keep celery_app.py minimal (Celery configuration only). Task functions in tasks/ import services directly, not via main.py or any router. The Celery worker starts with celery -A celery_app worker — it never starts FastAPI.
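A task module that follows this advice, and the rule that tasks are plain def functions, might be shaped like the sketch below. The names extract_text and process_document are hypothetical, and the @celery_app.task decorator is deliberately left off so the example runs without Celery installed.

```python
import asyncio


async def extract_text(document_id: str) -> str:
    """Hypothetical async helper standing in for real extraction logic."""
    await asyncio.sleep(0)  # placeholder for awaited I/O
    return f"text for {document_id}"


def process_document(document_id: str) -> str:
    """Plain `def`, as a Celery task body must be.

    In the real module this function would carry the @celery_app.task
    decorator and import only from services/, never from main.py.
    """
    # asyncio.run() creates a fresh event loop, runs the coroutine, then closes the loop.
    return asyncio.run(extract_text(document_id))
```

A synchronous implementation is still the simpler choice where one exists; asyncio.run() is the fallback for helpers that are only available as coroutines.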

Code Examples

Full v1 SQLAlchemy ORM Schema (Phase 1 Migration Target)

# db/models.py
import uuid
from datetime import datetime, timezone
from sqlalchemy import (
    Boolean, BigInteger, ForeignKey, Index, String, Text,
    TIMESTAMP, UniqueConstraint, Integer
)
from sqlalchemy.dialects.postgresql import UUID, INET, JSONB
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
from sqlalchemy.sql import func

def now_utc():
    return datetime.now(timezone.utc)

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    handle: Mapped[str] = mapped_column(String, unique=True, nullable=False)
    email: Mapped[str] = mapped_column(String, unique=True, nullable=False)
    password_hash: Mapped[str] = mapped_column(Text, nullable=False)
    totp_secret: Mapped[str | None] = mapped_column(Text, nullable=True)
    totp_enabled: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    role: Mapped[str] = mapped_column(String, nullable=False, default="user")
    is_active: Mapped[bool] = mapped_column(Boolean, nullable=False, default=True)
    ai_provider: Mapped[str | None] = mapped_column(Text, nullable=True)
    ai_model: Mapped[str | None] = mapped_column(Text, nullable=True)
    default_storage_backend: Mapped[str] = mapped_column(String, nullable=False, default="minio")
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())

class Quota(Base):
    __tablename__ = "quotas"
    user_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), primary_key=True)
    limit_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False, default=104857600)  # 100 MB
    used_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False, default=0)

class RefreshToken(Base):
    __tablename__ = "refresh_tokens"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False)
    token_hash: Mapped[str] = mapped_column(Text, unique=True, nullable=False)
    expires_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False)
    revoked: Mapped[bool] = mapped_column(Boolean, nullable=False, default=False)
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    __table_args__ = (Index("ix_refresh_tokens_user_revoked", "user_id", "revoked"),)

class Folder(Base):
    __tablename__ = "folders"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False)
    parent_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("folders.id", ondelete="CASCADE"), nullable=True)
    name: Mapped[str] = mapped_column(Text, nullable=False)
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    __table_args__ = (UniqueConstraint("user_id", "parent_id", "name"),)

class Document(Base):
    __tablename__ = "documents"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    # user_id is NULLABLE in Phase 1 (D-03); Phase 2 migration adds NOT NULL
    user_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=True)
    folder_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("folders.id", ondelete="SET NULL"), nullable=True)
    filename: Mapped[str] = mapped_column(Text, nullable=False)      # original human-readable name
    object_key: Mapped[str] = mapped_column(Text, nullable=False)    # MinIO key: {user_id}/{doc_id}/{uuid4}{ext}
    content_type: Mapped[str] = mapped_column(Text, nullable=False)
    size_bytes: Mapped[int] = mapped_column(BigInteger, nullable=False, default=0)
    storage_backend: Mapped[str] = mapped_column(String, nullable=False, default="minio")
    extracted_text: Mapped[str | None] = mapped_column(Text, nullable=True)
    status: Mapped[str] = mapped_column(String, nullable=False, default="pending")
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    updated_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    __table_args__ = (
        Index("ix_documents_user_folder", "user_id", "folder_id"),
        Index("ix_documents_user_created", "user_id", "created_at"),
    )

class Topic(Base):
    __tablename__ = "topics"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=True)
    name: Mapped[str] = mapped_column(Text, nullable=False)
    description: Mapped[str] = mapped_column(Text, nullable=False, default="")
    color: Mapped[str] = mapped_column(String(7), nullable=False, default="#6366f1")
    __table_args__ = (UniqueConstraint("user_id", "name"),)

class DocumentTopic(Base):
    __tablename__ = "document_topics"
    document_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("documents.id", ondelete="CASCADE"), primary_key=True)
    topic_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("topics.id", ondelete="CASCADE"), primary_key=True)

class Share(Base):
    __tablename__ = "shares"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    document_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("documents.id", ondelete="CASCADE"), nullable=False)
    owner_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False)
    recipient_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False)
    permission: Mapped[str] = mapped_column(String, nullable=False, default="view")
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    __table_args__ = (
        UniqueConstraint("document_id", "recipient_id"),
        Index("ix_shares_recipient", "recipient_id"),
    )

class AuditLog(Base):
    __tablename__ = "audit_log"
    id: Mapped[int] = mapped_column(Integer, primary_key=True, autoincrement=True)
    user_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True)
    actor_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="SET NULL"), nullable=True)
    event_type: Mapped[str] = mapped_column(Text, nullable=False)
    resource_id: Mapped[uuid.UUID | None] = mapped_column(UUID(as_uuid=True), nullable=True)
    ip_address: Mapped[str | None] = mapped_column(INET, nullable=True)
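    # "metadata" is a reserved attribute name on Declarative classes; map a differently named attribute to the "metadata" column
    event_metadata: Mapped[dict | None] = mapped_column("metadata", JSONB, nullable=True)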
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    __table_args__ = (
        Index("ix_audit_user_created", "user_id", "created_at"),
        Index("ix_audit_event_created", "event_type", "created_at"),
    )

class CloudConnection(Base):
    __tablename__ = "cloud_connections"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    user_id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), ForeignKey("users.id", ondelete="CASCADE"), nullable=False)
    provider: Mapped[str] = mapped_column(String, nullable=False)
    display_name: Mapped[str] = mapped_column(Text, nullable=False)
    credentials_enc: Mapped[str] = mapped_column(Text, nullable=False)
    status: Mapped[str] = mapped_column(String, nullable=False, default="ACTIVE")
    connected_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())
    __table_args__ = (Index("ix_cloud_connections_user", "user_id"),)

class Group(Base):
    """v2 stub — empty table, seeded for schema completeness (PROJECT.md)."""
    __tablename__ = "groups"
    id: Mapped[uuid.UUID] = mapped_column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name: Mapped[str] = mapped_column(Text, unique=True, nullable=False)
    created_at: Mapped[datetime] = mapped_column(TIMESTAMP(timezone=True), nullable=False, server_default=func.now())

Config Extension for New Env Vars

# config.py (extended)
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Existing
    data_dir: str = "/app/data"

    # Phase 1 additions
    database_url: str = "postgresql+psycopg://docuvault_app:changeme@postgres/docuvault"
    database_migrate_url: str = "postgresql+psycopg://docuvault_migrate:changeme@postgres/docuvault"
    minio_endpoint: str = "minio:9000"
    minio_access_key: str = "docuvault_app"
    minio_secret_key: str = "changeme"
    minio_bucket: str = "docuvault"
    redis_url: str = "redis://:changeme@redis:6379/0"
    secret_key: str = "CHANGEME"  # documented for Phase 2; not read in Phase 1

    class Config:
        env_file = ".env"
        env_file_encoding = "utf-8"

settings = Settings()

State of the Art

Old Approach	Current Approach	When Changed	Impact
`asyncpg` as the only async PostgreSQL dialect	`psycopg` v3 supports both sync + async via one package	2022 (psycopg v3 release)	Single driver for Alembic + FastAPI; no separate sync/async packages
`alembic init` (sync template)	`alembic init -t async` for async engine migrations	Alembic 1.7+	env.py template pre-configured for asyncio; no manual async wiring
`sessionmaker(class_=AsyncSession)` was the way to build an async session factory	`async_sessionmaker` is a first-class API in SQLAlchemy 2.0	SQLAlchemy 2.0 (2023)	Cleaner factory pattern without subclassing
MinIO Docker image included `curl` for healthchecks	`curl` removed from image; `mc ready local` is the new healthcheck	October 2023	Existing tutorials with `curl -f` healthcheck will silently fail on current images
`FastAPI BackgroundTasks` for async post-request work	Celery + Redis for distributed, reliable task queues	Ongoing	`BackgroundTasks` is per-instance and has no retry; Celery is cross-instance

Deprecated/outdated:

filelock dependency: can be removed from backend/requirements.txt once services/storage.py is replaced (CONCERNS.md item 14 identifies the unused shutil import; same cleanup applies to filelock).
Per-document .lock files in data/metadata/: deleted with data/ directory contents (D-04).
psycopg2 (old driver): not installed and not needed; psycopg v3 is the replacement.
Sync file I/O in async handlers (CONCERNS.md item 6): resolved entirely by switching to async SQLAlchemy.

Assumptions Log

#	Claim	Section	Risk if Wrong
A1	Running `GRANT ... ON ALL TABLES` inside the Alembic initial migration as `docuvault_migrate` is the standard pattern for privilege handoff to `docuvault_app`	Pattern 7 (PostgreSQL init script)	If the migration user lacks permission to GRANT to another user, privileges must be set manually or via a separate script — delays testing
A2	The Celery worker container can import `db/models.py` and `services/` directly without starting FastAPI (no circular import)	Pattern 5 (Celery)	If service modules import FastAPI components at module level, a refactor is needed before worker tasks can import services
A3	`minio/minio:latest` Docker image includes `mc` for the `mc ready local` healthcheck	Pattern 6 (Docker Compose)	If `mc` is not in the image, healthcheck must use a shell-based TCP probe or alternative; confirmed via GitHub issue discussion [CITED: github.com/minio/minio/issues/18389] but version-specific

Open Questions

PostgreSQL version to pin in Docker Compose
- What we know: Any PostgreSQL 14+ supports gen_random_uuid(), JSONB, INET, and TIMESTAMPTZ used in the schema.
- What's unclear: Whether to use postgres:16, postgres:17, or postgres:17-alpine.
- Recommendation: Use postgres:17-alpine (smallest image, current stable, alpine is well-suited for Docker Compose dev setups).
MinIO version pinning
- What we know: minio/minio:latest has mc available for healthchecks; curl was removed in late 2023.
- What's unclear: Whether to pin to a specific release tag (e.g., RELEASE.2025-09-07T16-13-09Z) or use :latest.
- Recommendation: Pin to a specific RELEASE tag for reproducibility; update as part of a maintenance task. [ASSUMED — no strong official guidance on whether :latest is appropriate for production-adjacent Docker Compose]
Topics table migration: existing topic names from data/topics.json
- What we know: D-04 deletes data/ contents. Topics stored in topics.json are test data and are deleted.
- What's unclear: The existing api/topics.py and frontend/src/stores/topics.js need updating to read from PostgreSQL instead of the flat file. The API shape should remain the same (list of objects with id, name, description, color).
- Recommendation: The planner must include a task for updating api/topics.py to use async SQLAlchemy ORM queries against the topics table.
Celery task vs direct service call for text extraction + classification
- What we know: The current api/documents.py calls await classifier.classify_document() inside the route handler. This needs to move to a Celery task.
- What's unclear: Whether Phase 1 should move ALL of extraction + classification into a Celery task (full async flow) or just wire up the infrastructure with a placeholder task and migrate the logic in Phase 3.
- Recommendation: Phase 1 should wire the full task (extract + classify) in Celery — the walking skeleton requirement says "AI classification workflow completes successfully." A placeholder task that doesn't classify would fail the success criteria.

Environment Availability

Dependency	Required By	Available	Version	Fallback
Docker	Docker Compose services	✓	29.5.0	—
Python 3.12	Backend (in Docker image)	✓ (host: 3.14.5; Docker: 3.12 pinned)	3.12 in image	—
PostgreSQL (via Docker)	Database tier	✓ (via Docker)	17 (image)	—
MinIO (via Docker)	Object storage	✓ (via Docker)	latest	—
Redis (via Docker)	Celery broker, Phase 2 rate limiting	✓ (via Docker)	7-alpine	—
pytest	Backend test runner	✓ (host pip3)	existing	—

Missing dependencies with no fallback: None.
Missing dependencies with fallback: None.

Validation Architecture

Test Framework

Property	Value
Framework	pytest with pytest-asyncio (existing)
Config file	`backend/pytest.ini` (existing; `asyncio_mode = auto`)
Quick run command	`cd backend && pytest tests/test_health.py tests/test_documents.py tests/test_storage.py -x`
Full suite command	`cd backend && pytest -v`

Phase Requirements → Test Map

Req ID	Behavior	Test Type	Automated Command	File Exists?
STORE-01	Upload stores metadata in PostgreSQL and bytes in MinIO	integration	`pytest tests/test_documents.py::test_upload_stores_to_postgres_and_minio -x`	❌ Wave 0
STORE-01	List documents reads from PostgreSQL (not filesystem)	integration	`pytest tests/test_documents.py::test_list_reads_from_db -x`	❌ Wave 0
STORE-02	MinIO object key matches `{user_id}/{document_id}/{uuid4}{ext}` pattern	unit	`pytest tests/test_storage.py::test_object_key_schema -x`	❌ Wave 0
STORE-02	Human-readable filename is NOT in the object key	unit	`pytest tests/test_storage.py::test_filename_not_in_object_key -x`	❌ Wave 0
STORE-07	`/health` returns PostgreSQL + MinIO connectivity (not just `{"status": "ok"}`)	smoke	`pytest tests/test_health.py::test_health_checks_postgres_and_minio -x`	❌ Wave 0
STORE-07 (implicit)	Storage service has no file locks; concurrent uploads do not corrupt state	integration	`pytest tests/test_documents.py::test_concurrent_uploads -x`	❌ Wave 0

Sampling Rate

Per task commit: cd backend && pytest tests/test_health.py tests/test_storage.py -x
Per wave merge: cd backend && pytest -v
Phase gate: Full suite green before /gsd:verify-work

Wave 0 Gaps

tests/test_storage.py — covers STORE-02 (object key schema, filename isolation)
tests/test_documents.py — extend for PostgreSQL/MinIO-backed upload/list (STORE-01)
tests/test_health.py — extend for PostgreSQL + MinIO connectivity probes (STORE-07)
tests/conftest.py — add async engine + session fixtures; add MinIO mock or test bucket fixture
Update tests/conftest.py to monkeypatch db/session.py paths (not just config.py paths)

Existing tests: test_documents.py, test_topics.py, test_settings.py test the OLD flat-file storage layer. They will break after services/storage.py is replaced. These must be ported (not deleted) as part of Phase 1.

Security Domain

Applicable ASVS Categories

ASVS Category	Applies	Standard Control
V2 Authentication	No — Phase 1 has no auth	Phase 2
V3 Session Management	No — Phase 1 has no sessions	Phase 2
V4 Access Control	Partial — object key isolation in MinIO backend	`user_id` prefix enforced in `MinIOBackend.put_object()`
V5 Input Validation	Yes — file upload content type + size	Existing `ALLOWED_MIME_TYPES` allowlist (defined but currently unenforced per CONCERNS.md item 1); enforce it at upload
V6 Cryptography	No — Phase 1 has no credential encryption	Phase 5

Known Threat Patterns for This Phase

Pattern	STRIDE	Standard Mitigation
Object key prediction / path traversal	Tampering	UUID-based object keys (`{user_id}/{document_id}/{uuid4}{ext}`); never accept object keys from request parameters
Database superuser credentials in app DSN	Elevation of Privilege	Two-DSN pattern: `docuvault_app` (restricted) for runtime, `docuvault_migrate` (DDL) for Alembic only
MinIO credentials with bucket admin rights	Elevation of Privilege	App-level access key pair (`MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY`) with read/write on `docuvault` bucket only; root credentials not used by app
Redis unauthenticated in Docker network	Information Disclosure	`requirepass` set on Redis; `REDIS_URL` includes password; Celery broker and app use authenticated URL
SQL injection via ORM	Tampering	SQLAlchemy ORM / parameterized queries throughout; zero raw string interpolation (matches CLAUDE.md SEC-03)
Sensitive data in MinIO object key	Information Disclosure	Human-readable filenames stored in DB only; object key is UUID-based and non-predictable

Sources

Primary (HIGH confidence)

docs.sqlalchemy.org/en/20/orm/extensions/asyncio.html — async engine setup, async_sessionmaker, expire_on_commit=False, FastAPI lifespan integration
alembic.sqlalchemy.org/en/latest/cookbook.html#using-asyncio-with-alembic — async env.py pattern
github.com/sqlalchemy/alembic/blob/main/alembic/templates/async/env.py — official async env.py template code
github.com/minio/minio-py/blob/master/docs/API.md — put_object, presigned_get_object, constructor signatures
github.com/minio/minio/issues/18389 — curl removal from MinIO image; mc ready local as replacement
docs.min.io/enterprise/aistor-object-store/operations/monitoring/healthcheck-probe/ — /minio/health/live endpoint documented
docs.docker.com/reference/compose-file/services/#healthcheck — healthcheck + depends_on: condition: service_healthy syntax

Secondary (MEDIUM confidence)

docs.celeryq.dev/en/stable/getting-started/backends-and-brokers/redis.html — Redis URL format verified via WebSearch; Celery docs site was unreachable during research session
testdriven.io/blog/fastapi-and-celery/ — Celery + FastAPI project structure and .delay() pattern
WebSearch results cross-referenced with official docs for psycopg install extras, Redis broker URL format, PostgreSQL init script pattern

Tertiary (LOW confidence)

None — all key claims cross-verified with at least one authoritative source

Metadata

Confidence breakdown:

Standard stack: HIGH — all packages verified on PyPI via pip3 index versions, slopcheck [OK] for all 6 core packages
Architecture: HIGH — patterns drawn from SQLAlchemy official docs, Alembic official template, and MinIO official GitHub
Pitfalls: HIGH — each pitfall sourced from official documentation or confirmed GitHub issues (not community blog posts only)
Celery configuration: MEDIUM — Celery docs site was unreachable; URL format cross-verified via WebSearch + community sources

Research date: 2026-05-21
Valid until: 2026-06-21 for stable stack; MinIO healthcheck pattern should be re-verified if the Docker image version changes significantly
