# Phase 1: Infrastructure Foundation - Context

**Gathered:** 2026-05-21
**Status:** Ready for planning

## Phase Boundary

Wire PostgreSQL, MinIO, Redis, and Celery into Docker Compose; create a complete Alembic-managed schema; rewrite the document storage service layer to use PostgreSQL + MinIO; delete legacy flat-file data. The existing document upload and AI classification workflow must continue to work end-to-end with the new infrastructure. No user-facing behavior change, but the internal storage layer is fully replaced.

## Implementation Decisions

### Schema Scope

- **D-01:** Phase 1 initial Alembic migration creates the **full v1 skeleton** — all tables: `users`, `refresh_tokens`, `quotas`, `documents`, `topics`, `folders`, `shares`, `audit_log`, `cloud_connections`. Subsequent phases add data and constraints, not new tables.
- **D-02:** `groups` table stub included in Phase 1 migration (v2 feature, but PROJECT.md explicitly notes "groups table will be seeded in schema"). Empty table, correct columns and FKs.
- **D-03:** `documents.user_id` is **nullable** in Phase 1 (no auth system yet). Phase 2 migration adds the `NOT NULL` constraint after the user/auth system is live.
- **D-04:** Existing `data/` directory contents (flat-file JSON metadata + uploaded files) are **deleted** in Phase 1. They are test data only — no migration script needed. Phase 3's migration scope is removed.

### App Wiring

- **D-05:** Phase 1 **switches the storage service layer** to PostgreSQL + MinIO. `backend/services/storage.py` is rewritten to use async SQLAlchemy + MinIO SDK. The app does not continue using the filesystem after Phase 1.
- **D-06:** Single MinIO bucket named **`docuvault`**. Object keys follow `{user_id}/{document_id}/{uuid4()}{ext}` (STORE-02). Human-readable filenames stored in the `documents.filename` DB column only — never in the MinIO key.
- **D-07:** `backend/main.py` `/health` endpoint extended to check PostgreSQL + MinIO connectivity (not just `{"status": "ok"}`). Health checks gate `docker compose up` readiness.

### Background Worker

- **D-08:** Background task queue: **Celery + Redis** (STORE-08). FastAPI `BackgroundTasks` replaced.
- **D-09:** Redis service added to `docker-compose.yml` in **Phase 1** (alongside PostgreSQL and MinIO). Redis doubles as the rate-limiting store for Phase 2 auth endpoints — no second Redis needed later.
- **D-10:** A `celery-worker` service is added to `docker-compose.yml`. Celery broker and result backend both point to the same Redis instance via `REDIS_URL`.

### Env / Secrets Strategy

- **D-11:** `.env` gitignored + `.env.example` committed. `docker-compose.yml` reads vars via `${VAR_NAME}`. `.env.example` has safe placeholder values and comments explaining each variable.
- **D-12:** Production secrets stored **outside the project directory** at `/etc/docuvault/env` (`chmod 600`, owned by the service user, not root). `docker-compose.yml` references it via `env_file:`. Documented in deployment notes.
- **D-13:** **Two PostgreSQL DSNs** introduced:
  - `DATABASE_URL` — restricted app user `docuvault_app` (SELECT / INSERT / UPDATE / DELETE only; no DDL)
  - `DATABASE_MIGRATE_URL` — migration user `docuvault_migrate` (DDL privileges; used only by Alembic)
- **D-14:** PostgreSQL init script in `docker/postgres/initdb.d/` provisions both users on first container start. The app never connects as the PostgreSQL superuser.
- **D-15:** MinIO vars: `MINIO_ENDPOINT`, `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD` (init only), `MINIO_BUCKET` (value: `docuvault`), `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY` (separate app-level access key pair with minimal bucket permissions).
- **D-16:** Additional vars in Phase 1 `.env.example`: `REDIS_URL`, `SECRET_KEY` (documented now for Phase 2 JWT + HKDF use; app does not read it in Phase 1).
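
The variables named in D-13, D-15, and D-16 could be validated at process start roughly as sketched below. This is only an illustration: `backend/config.py` uses a Pydantic Settings class, whereas this sketch swaps in a plain stdlib dataclass so it runs without third-party packages. The names `Phase1Settings`, `REQUIRED_VARS`, and `load_settings` are hypothetical, and treating `DATABASE_MIGRATE_URL` as required in the app process (rather than only where Alembic runs) is an assumption.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Variables the Phase 1 process is assumed to need (D-13, D-15, D-16).
# MINIO_ROOT_USER / MINIO_ROOT_PASSWORD are init-only and deliberately absent.
REQUIRED_VARS = (
    "DATABASE_URL",          # restricted docuvault_app DSN
    "DATABASE_MIGRATE_URL",  # DDL-capable DSN, used only by Alembic
    "MINIO_ENDPOINT",
    "MINIO_BUCKET",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "REDIS_URL",             # Celery broker and result backend
)


@dataclass(frozen=True)
class Phase1Settings:
    """Hypothetical, immutable settings container for Phase 1."""

    database_url: str
    database_migrate_url: str
    minio_endpoint: str
    minio_bucket: str
    minio_access_key: str
    minio_secret_key: str
    redis_url: str
    # Documented in .env.example but not read by the app until Phase 2.
    secret_key: str | None = None


def load_settings(env: Mapping[str, str] | None = None) -> Phase1Settings:
    """Build settings from a mapping (defaults to os.environ).

    Raises RuntimeError naming every missing or empty required variable,
    so a misconfigured container fails at startup instead of mid-request.
    """
    source = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    values = {name.lower(): source[name] for name in REQUIRED_VARS}
    return Phase1Settings(secret_key=source.get("SECRET_KEY") or None, **values)
```

Calling `load_settings()` with no argument reads `os.environ`; passing a dict makes the function easy to exercise in tests without touching the real environment. Reporting all missing names at once, rather than stopping at the first, is a design choice meant to shorten the fix-and-retry loop when filling in `.env`.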
## Canonical References

**Downstream agents MUST read these before planning or implementing.**

### Requirements

- `.planning/REQUIREMENTS.md` — STORE-01 (storage migration), STORE-02 (MinIO key schema), STORE-07 (stateless backend), STORE-08 (BackgroundTasks replacement)
- `.planning/ROADMAP.md` — Phase 1 goal, success criteria (especially criterion #4: MinIO key schema enforced in model layer)

### Project Decisions

- `.planning/PROJECT.md` — Key Decisions table (PostgreSQL + MinIO rationale, HKDF key derivation, atomic quota UPDATE pattern, groups table seeding)
- `.planning/STATE.md` — Open Questions (background task queue choice, now resolved as Celery+Redis; PyOTP valid_window note for Phase 2)

## Existing Code Insights

### Reusable Assets

- `backend/ai/base.py` + `backend/ai/__init__.py` — ABC + factory pattern; the `storage/` module for MinIO/cloud backends should mirror this exactly (StorageBackend ABC + get_storage_backend() factory)
- `backend/config.py` — Pydantic Settings class; extend with new env vars (DATABASE_URL, DATABASE_MIGRATE_URL, MINIO_*, REDIS_URL, SECRET_KEY)
- `backend/main.py` `/health` — extend to probe PostgreSQL + MinIO rather than replace it

### Established Patterns

- **Provider pattern** (ABC + factory in `ai/`) — storage module mirrors this
- **Service layer** (`services/extractor.py`, `services/classifier.py`) — pure Python modules, no FastAPI coupling; new storage service follows the same boundary
- **Pinia-as-facade** (frontend) — no changes needed in Phase 1; API contract preserved

### Integration Points

- `docker-compose.yml` — add `postgres`, `minio`, `redis`, `celery-worker` services; add health checks and `depends_on` ordering
- `backend/requirements.txt` — add: `sqlalchemy[asyncio]>=2.0`, `psycopg[binary,pool]>=3`, `alembic`, `minio`, `celery[redis]`
- `backend/services/storage.py` — **replace entirely** with async SQLAlchemy + MinIO SDK implementation
- `backend/api/documents.py` — update to use new async storage service; interface should stay stable so the frontend is unaffected
- `backend/main.py` — add SQLAlchemy async engine lifespan (startup/shutdown); extend `/health`

### Constraints

- All CORS origins are currently `["*"]` — leave as-is for Phase 1; Phase 2 locks this down with auth
- No linter/formatter config in repo — don't introduce one in Phase 1 (out of scope)
- `filelock` dependency can be removed once `services/storage.py` is replaced

## Specific Ideas

- MinIO bucket name: `docuvault` (single bucket, prefix-based isolation)
- PostgreSQL users: `docuvault_app` (runtime, restricted) + `docuvault_migrate` (Alembic migrations, DDL)
- Init script path: `docker/postgres/initdb.d/01-init-users.sql`
- Production env file location: `/etc/docuvault/env` (outside project dir, `chmod 600`)
- Redis URL format: `redis://:${REDIS_PASSWORD}@redis:6379/0` (password-protected even in dev)

## Deferred Ideas

None — discussion stayed within phase scope.

---

*Phase: 1-Infrastructure Foundation*
*Context gathered: 2026-05-21*