# Phase 1: Infrastructure Foundation - Context

Gathered: 2026-05-21
Status: Ready for planning
## Phase Boundary

Wire PostgreSQL, MinIO, Redis, and Celery into Docker Compose; create a complete Alembic-managed schema; rewrite the document storage service layer to use PostgreSQL + MinIO; delete legacy flat-file data. The existing document upload and AI classification workflow must continue to work end-to-end with the new infrastructure. No user-facing behavior change, but the internal storage layer is fully replaced.
## Implementation Decisions

### Schema Scope

- D-01: Phase 1 initial Alembic migration creates the full v1 skeleton, all tables: `users`, `refresh_tokens`, `quotas`, `documents`, `topics`, `folders`, `shares`, `audit_log`, `cloud_connections`. Subsequent phases add data and constraints, not new tables.
- D-02: `groups` table stub included in Phase 1 migration (v2 feature, but PROJECT.md explicitly notes "groups table will be seeded in schema"). Empty table, correct columns and FKs.
- D-03: `documents.user_id` is nullable in Phase 1 (no auth system yet). Phase 2 migration adds the `NOT NULL` constraint after the user/auth system is live.
- D-04: Existing `data/` directory contents (flat-file JSON metadata + uploaded files) are deleted in Phase 1. They are test data only, so no migration script is needed. Phase 3's migration scope is removed.
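
One way to sanity-check D-01 and D-02 after the initial migration runs is to compare the tables actually present with the expected skeleton. The following is a minimal sketch, not the project's test code: the table names are copied from the decisions above, `missing_tables` is a hypothetical helper, and how the list of existing table names is obtained is left to the caller.

```python
# Table names from D-01 plus the `groups` stub from D-02.
EXPECTED_TABLES = frozenset({
    "users", "refresh_tokens", "quotas", "documents", "topics",
    "folders", "shares", "audit_log", "cloud_connections", "groups",
})


def missing_tables(existing):
    """Return the expected tables absent from `existing`, sorted by name.

    `existing` is any iterable of table names. Extra tables, such as
    Alembic's own `alembic_version` bookkeeping table, are ignored.
    """
    return sorted(EXPECTED_TABLES - set(existing))
```

An empty result means the skeleton is complete; anything else names the tables the migration still has to create.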
### App Wiring

- D-05: Phase 1 switches the storage service layer to PostgreSQL + MinIO. `backend/services/storage.py` is rewritten to use async SQLAlchemy + MinIO SDK. The app does not continue using the filesystem after Phase 1.
- D-06: Single MinIO bucket named `docuvault`. Object keys follow `{user_id}/{document_id}/{uuid4()}{ext}` (STORE-02). Human-readable filenames are stored in the `documents.filename` DB column only, never in the MinIO key.
- D-07: `backend/main.py` `/health` endpoint extended to check PostgreSQL + MinIO connectivity (not just `{"status": "ok"}`). Health checks gate `docker compose up` readiness.
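
The key schema in D-06 can be sketched as a small pure function. This is an illustration under assumptions, not the project's implementation: `build_object_key` is a hypothetical name, and because the source does not say how a null `user_id` (D-03) maps into the key during Phase 1, the sketch simply requires the caller to supply one.

```python
import uuid
from pathlib import PurePosixPath


def build_object_key(user_id: str, document_id: str, filename: str) -> str:
    """Build a key shaped like {user_id}/{document_id}/{uuid4()}{ext}.

    Only the extension of `filename` is reused; the human-readable name
    itself never appears in the key.
    """
    ext = PurePosixPath(filename).suffix  # e.g. ".pdf", or "" if there is none
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"
```

Calling `build_object_key("u1", "d1", "Quarterly Report.pdf")` yields a key of the form `u1/d1/<random-uuid>.pdf`; the original filename would be written to `documents.filename` separately.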
### Background Worker

- D-08: Background task queue: Celery + Redis (STORE-08). FastAPI `BackgroundTasks` replaced.
- D-09: Redis service added to `docker-compose.yml` in Phase 1 (alongside PostgreSQL and MinIO). Redis doubles as the rate-limiting store for Phase 2 auth endpoints, so no second Redis is needed later.
- D-10: A `celery-worker` service is added to `docker-compose.yml`. Celery broker and result backend both point to the same Redis instance via `REDIS_URL`.
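
D-10 can be illustrated by deriving both Celery settings from the one environment variable. This is a sketch under assumptions: `celery_settings_from_env` is a hypothetical helper, while `broker_url` and `result_backend` are Celery's own setting names.

```python
import os


def celery_settings_from_env(env=None):
    """Return Celery settings with broker and result backend on one Redis.

    `env` defaults to the process environment; passing a dict makes the
    function easy to exercise in tests.
    """
    env = os.environ if env is None else env
    redis_url = env.get("REDIS_URL")
    if not redis_url:
        raise RuntimeError("REDIS_URL is not set")
    # Same instance for both roles, as decided in D-10.
    return {"broker_url": redis_url, "result_backend": redis_url}
```

The resulting dict could then be applied to a Celery app, for example through `app.conf.update(...)`, so that the worker and the API process share one source of truth.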
### Env / Secrets Strategy

- D-11: `.env` gitignored + `.env.example` committed. `docker-compose.yml` reads vars via `${VAR_NAME}`. `.env.example` has safe placeholder values and comments explaining each variable.
- D-12: Production secrets stored outside the project directory at `/etc/docuvault/env` (`chmod 600`, owned by the service user, not root). `docker-compose.yml` references it via `env_file:`. Documented in deployment notes.
- D-13: Two PostgreSQL DSNs introduced:
  - `DATABASE_URL`: restricted app user `docuvault_app` (SELECT / INSERT / UPDATE / DELETE only; no DDL)
  - `DATABASE_MIGRATE_URL`: migration user `docuvault_migrate` (DDL privileges; used only by Alembic)
- D-14: PostgreSQL init script in `docker/postgres/initdb.d/` provisions both users on first container start. The app never connects as the PostgreSQL superuser.
- D-15: MinIO vars: `MINIO_ENDPOINT`, `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD` (init only), `MINIO_BUCKET` (value: `docuvault`), `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY` (separate app-level access key pair with minimal bucket permissions).
- D-16: Additional vars in Phase 1 `.env.example`: `REDIS_URL`, `SECRET_KEY` (documented now for Phase 2 JWT + HKDF use; app does not read it in Phase 1).
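
The split between runtime and init-only variables in D-13 to D-16 can be sketched as a settings loader that reads only what the app process needs. This is a stdlib stand-in written for illustration; the project itself uses a Pydantic Settings class in `backend/config.py`, and `InfraSettings` and `from_env` are hypothetical names. `DATABASE_MIGRATE_URL`, the `MINIO_ROOT_*` pair, and `SECRET_KEY` are deliberately left out because the decisions above say the app does not use them in Phase 1.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class InfraSettings:
    """Phase 1 runtime settings; each field maps to an upper-case env var."""

    database_url: str      # DSN for the restricted docuvault_app user
    minio_endpoint: str
    minio_bucket: str
    minio_access_key: str  # app-level key pair, not the root credentials
    minio_secret_key: str
    redis_url: str

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        values, missing = {}, []
        for field in fields(cls):
            name = field.name.upper()
            if env.get(name):
                values[field.name] = env[name]
            else:
                missing.append(name)
        if missing:
            # Fail fast at startup instead of on first use.
            raise RuntimeError("missing env vars: " + ", ".join(missing))
        return cls(**values)
```

Failing at startup with the full list of missing names tends to be easier to debug in Compose than a connection error surfacing later.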
<canonical_refs>
## Canonical References

Downstream agents MUST read these before planning or implementing.

### Requirements

- `.planning/REQUIREMENTS.md`: STORE-01 (storage migration), STORE-02 (MinIO key schema), STORE-07 (stateless backend), STORE-08 (BackgroundTasks replacement)
- `.planning/ROADMAP.md`: Phase 1 goal, success criteria (especially criterion #4: MinIO key schema enforced in model layer)

### Project Decisions

- `.planning/PROJECT.md`: Key Decisions table (PostgreSQL + MinIO rationale, HKDF key derivation, atomic quota UPDATE pattern, groups table seeding)
- `.planning/STATE.md`: Open Questions (task-queue question now resolved as Celery+Redis; PyOTP valid_window note for Phase 2)
</canonical_refs>
<code_context>
## Existing Code Insights

### Reusable Assets

- `backend/ai/base.py` + `backend/ai/__init__.py`: ABC + factory pattern; the `storage/` module for MinIO/cloud backends should mirror this exactly (`StorageBackend` ABC + `get_storage_backend()` factory)
- `backend/config.py`: Pydantic Settings class; extend with new env vars (`DATABASE_URL`, `DATABASE_MIGRATE_URL`, `MINIO_*`, `REDIS_URL`, `SECRET_KEY`)
- `backend/main.py` `/health`: extend to probe PostgreSQL + MinIO rather than replace
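
The ABC + factory shape named above might look roughly like the following. This is a sketch, not the contents of `backend/ai/`: only `StorageBackend` and `get_storage_backend()` come from this document, while the `put`/`get` methods, the registry dict, and `InMemoryBackend` are assumptions added so the example runs without MinIO. The methods are synchronous for brevity, whereas the real storage service is planned to be async.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Interface every storage provider implements (method names assumed)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical stand-in backend that keeps objects in a dict."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


# Registry mapping a configured name to a backend class.
_BACKENDS = {"memory": InMemoryBackend}


def get_storage_backend(name: str) -> StorageBackend:
    """Factory: look up a backend class by name and instantiate it."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None
```

Adding a MinIO or cloud backend would then mean writing one more subclass and registering it, with callers depending only on the abstract interface.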
### Established Patterns

- Provider pattern (ABC + factory in `ai/`): storage module mirrors this
- Service layer (`services/extractor.py`, `services/classifier.py`): pure Python modules, no FastAPI coupling; new storage service follows the same boundary
- Pinia-as-facade (frontend): no changes needed in Phase 1; API contract preserved
### Integration Points

- `docker-compose.yml`: add `postgres`, `minio`, `redis`, `celery-worker` services; add health checks and `depends_on` ordering
- `backend/requirements.txt`: add `sqlalchemy[asyncio]>=2.0`, `psycopg[binary,pool]>=3`, `alembic`, `minio`, `celery[redis]`
- `backend/services/storage.py`: replace entirely with async SQLAlchemy + MinIO SDK implementation
- `backend/api/documents.py`: update to use new async storage service; interface should stay stable so frontend is unaffected
- `backend/main.py`: add SQLAlchemy async engine lifespan (startup/shutdown); extend `/health`
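
The extended `/health` behaviour can be sketched as a small aggregator that runs one probe per dependency. Everything here is an assumption made for illustration: `run_health_checks`, the response shape, the `"degraded"` status value, and the two-second timeout are not specified in this document, and the real probes would issue a trivial PostgreSQL query and a MinIO bucket check.

```python
import asyncio


async def run_health_checks(probes, timeout=2.0):
    """Run named async probes concurrently and summarise the result.

    `probes` maps a dependency name to a zero-argument coroutine function
    that raises on failure. A timeout or any exception counts as "error".
    """
    async def run_one(probe):
        try:
            await asyncio.wait_for(probe(), timeout)
            return "ok"
        except Exception:
            return "error"

    names = list(probes)
    results = await asyncio.gather(*(run_one(probes[n]) for n in names))
    checks = dict(zip(names, results))
    status = "ok" if all(v == "ok" for v in checks.values()) else "degraded"
    return {"status": status, "checks": checks}
```

A FastAPI handler could await this function and return a non-200 status code when the overall status is not `"ok"`, which is what a Compose health check would key on.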
### Constraints

- All CORS origins are currently `["*"]`. Leave as-is for Phase 1; Phase 2 locks this down with auth.
- No linter/formatter config in repo; don't introduce one in Phase 1 (out of scope).
- `filelock` dependency can be removed once `services/storage.py` is replaced.
</code_context>
## Specific Ideas

- MinIO bucket name: `docuvault` (single bucket, prefix-based isolation)
- PostgreSQL users: `docuvault_app` (runtime, restricted) + `docuvault_migrate` (Alembic migrations, DDL)
- Init script path: `docker/postgres/initdb.d/01-init-users.sql`
- Production env file location: `/etc/docuvault/env` (outside project dir, `chmod 600`)
- Redis URL format: `redis://:${REDIS_PASSWORD}@redis:6379/0` (password-protected even in dev)
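
The Redis URL format above can be assembled in code when the password is supplied separately. This is a minimal sketch: `build_redis_url` is a hypothetical helper, and percent-encoding the password is an added precaution (not something the source requires) so that characters such as `@` or `/` do not break URL parsing.

```python
from urllib.parse import quote


def build_redis_url(password: str, host: str = "redis", port: int = 6379, db: int = 0) -> str:
    """Build redis://:<password>@<host>:<port>/<db> with an encoded password."""
    # safe="" forces every reserved character in the password to be escaped.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


print(build_redis_url("s3cret"))  # redis://:s3cret@redis:6379/0
```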

Deferred ideas: none; discussion stayed within phase scope.

Phase: 1-Infrastructure Foundation
Context gathered: 2026-05-21