# Phase 1: Infrastructure Foundation - Context

Gathered: 2026-05-21
Status: Ready for planning

## Phase Boundary

Wire PostgreSQL, MinIO, Redis, and Celery into Docker Compose; create a complete Alembic-managed schema; rewrite the document storage service layer to use PostgreSQL + MinIO; delete legacy flat-file data. The existing document upload and AI classification workflow must continue to work end-to-end with the new infrastructure. No user-facing behavior change, but the internal storage layer is fully replaced.

## Implementation Decisions

### Schema Scope

  • D-01: Phase 1 initial Alembic migration creates the full v1 skeleton — all tables: users, refresh_tokens, quotas, documents, topics, folders, shares, audit_log, cloud_connections. Subsequent phases add data and constraints, not new tables.
  • D-02: groups table stub included in Phase 1 migration (v2 feature, but PROJECT.md explicitly notes "groups table will be seeded in schema"). Empty table, correct columns and FKs.
  • D-03: documents.user_id is nullable in Phase 1 (no auth system yet). Phase 2 migration adds the NOT NULL constraint after the user/auth system is live.
  • D-04: Existing data/ directory contents (flat-file JSON metadata + uploaded files) are deleted in Phase 1. They are test data only, so no migration script is needed and the data-migration work previously scoped for Phase 3 is dropped.

### App Wiring

  • D-05: Phase 1 switches the storage service layer to PostgreSQL + MinIO. backend/services/storage.py is rewritten to use async SQLAlchemy + MinIO SDK. The app does not continue using the filesystem after Phase 1.
  • D-06: Single MinIO bucket named docuvault. Object keys follow {user_id}/{document_id}/{uuid4()}{ext} (STORE-02). Human-readable filenames stored in the documents.filename DB column only — never in the MinIO key.
  • D-07: backend/main.py /health endpoint extended to check PostgreSQL + MinIO connectivity (not just {"status": "ok"}). Health checks gate docker compose up readiness.
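
The key schema in D-06 can be sketched as a small helper. This is a minimal illustration, not the project's actual code: the function name `build_object_key` is hypothetical, and because this document does not say how a null `user_id` (D-03) is rendered in Phase 1, the sketch simply takes a string and leaves that choice to the caller.

```python
import uuid
from pathlib import PurePosixPath


def build_object_key(user_id: str, document_id: str, original_filename: str) -> str:
    """Return a key shaped like {user_id}/{document_id}/{uuid4()}{ext}."""
    # Only the extension is taken from the uploaded name. The human-readable
    # filename belongs in documents.filename and never appears in the key.
    ext = PurePosixPath(original_filename).suffix
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"


if __name__ == "__main__":
    print(build_object_key("user-1", "doc-1", "Invoice 2024.pdf"))
```

Generating the final path segment from `uuid4()` keeps keys unguessable and free of user-supplied text, which is the point of storing the display name only in the database.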

### Background Worker

  • D-08: Background task queue: Celery + Redis (STORE-08). FastAPI BackgroundTasks replaced.
  • D-09: Redis service added to docker-compose.yml in Phase 1 (alongside PostgreSQL and MinIO). Redis doubles as the rate-limiting store for Phase 2 auth endpoints — no second Redis needed later.
  • D-10: A celery-worker service is added to docker-compose.yml. Celery broker and result backend both point to the same Redis instance via REDIS_URL.
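
As a rough sketch of D-10, the helpers below build a password-protected Redis URL in the `redis://:<password>@redis:6379/0` shape and reuse it for both Celery settings. The function names are hypothetical, and percent-encoding the password is an assumption of this sketch rather than something this document specifies. `broker_url` and `result_backend` are Celery's lowercase setting names.

```python
from urllib.parse import quote


def build_redis_url(password: str, host: str = "redis", port: int = 6379, db: int = 0) -> str:
    # Percent-encode the password so characters such as "@" or "/" cannot
    # break the URL structure (an assumption of this sketch).
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"


def celery_settings(redis_url: str) -> dict:
    # Broker and result backend both point at the same Redis instance.
    return {"broker_url": redis_url, "result_backend": redis_url}


if __name__ == "__main__":
    print(celery_settings(build_redis_url("dev-password")))
```

In a worker module, a dictionary like this could be applied with something along the lines of `app.conf.update(celery_settings(os.environ["REDIS_URL"]))`, so the API and the celery-worker service read the same variable.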

### Env / Secrets Strategy

  • D-11: .env gitignored + .env.example committed. docker-compose.yml reads vars via ${VAR_NAME}. .env.example has safe placeholder values and comments explaining each variable.
  • D-12: Production secrets stored outside the project directory at /etc/docuvault/env (chmod 600, owned by the service user, not root). docker-compose.yml references it via env_file:. Documented in deployment notes.
  • D-13: Two PostgreSQL DSNs introduced:
    • DATABASE_URL — restricted app user docuvault_app (SELECT / INSERT / UPDATE / DELETE only; no DDL)
    • DATABASE_MIGRATE_URL — migration user docuvault_migrate (DDL privileges; used only by Alembic)
  • D-14: PostgreSQL init script in docker/postgres/initdb.d/ provisions both users on first container start. The app never connects as the PostgreSQL superuser.
  • D-15: MinIO vars: MINIO_ENDPOINT, MINIO_ROOT_USER, MINIO_ROOT_PASSWORD (init only), MINIO_BUCKET (value: docuvault), MINIO_ACCESS_KEY, MINIO_SECRET_KEY (separate app-level access key pair with minimal bucket permissions).
  • D-16: Additional vars in Phase 1 .env.example: REDIS_URL, SECRET_KEY (documented now for Phase 2 JWT + HKDF use; app does not read it in Phase 1).
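
The variable set above could be validated at startup roughly as follows. This sketch uses a plain dataclass to stay dependency-free; the project's own config is a Pydantic Settings class, so treat `InfraSettings` and `load_settings` as hypothetical stand-ins. `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD`, and `SECRET_KEY` are left out because the decisions above mark them as init-only or unread in Phase 1.

```python
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = (
    "DATABASE_URL",
    "DATABASE_MIGRATE_URL",
    "MINIO_ENDPOINT",
    "MINIO_BUCKET",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "REDIS_URL",
)


@dataclass(frozen=True)
class InfraSettings:
    database_url: str
    database_migrate_url: str
    minio_endpoint: str
    minio_bucket: str
    minio_access_key: str
    minio_secret_key: str
    redis_url: str


def load_settings(env: Mapping[str, str]) -> InfraSettings:
    # Fail fast with one message that names every missing or empty variable.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # D-13 separates the runtime and migration roles; identical DSNs would
    # defeat that split, so this sketch rejects them.
    if env["DATABASE_URL"] == env["DATABASE_MIGRATE_URL"]:
        raise RuntimeError("DATABASE_URL and DATABASE_MIGRATE_URL must differ")
    return InfraSettings(**{name.lower(): env[name] for name in REQUIRED_VARS})
```

Calling `load_settings(os.environ)` once at process start would surface a misconfigured `.env` before any connection is attempted.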

<canonical_refs>

## Canonical References

Downstream agents MUST read these before planning or implementing.

### Requirements

  • .planning/REQUIREMENTS.md — STORE-01 (storage migration), STORE-02 (MinIO key schema), STORE-07 (stateless backend), STORE-08 (BackgroundTasks replacement)
  • .planning/ROADMAP.md — Phase 1 goal, success criteria (especially criterion #4: MinIO key schema enforced in model layer)

### Project Decisions

  • .planning/PROJECT.md — Key Decisions table (PostgreSQL + MinIO rationale, HKDF key derivation, atomic quota UPDATE pattern, groups table seeding)
  • .planning/STATE.md — Open Questions (task-queue question now resolved as Celery + Redis; PyOTP valid_window note for Phase 2)

</canonical_refs>

<code_context>

## Existing Code Insights

### Reusable Assets

  • backend/ai/base.py + backend/ai/__init__.py — ABC + factory pattern; the storage/ module for MinIO/cloud backends should mirror this exactly (StorageBackend ABC + get_storage_backend() factory)
  • backend/config.py — Pydantic Settings class; extend with new env vars (DATABASE_URL, DATABASE_MIGRATE_URL, MINIO_*, REDIS_URL, SECRET_KEY)
  • backend/main.py /health — extend to probe PostgreSQL + MinIO rather than replace
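
One way the extended /health endpoint might aggregate its dependency probes is sketched below. It is standard-library only: the probes are injected as async callables, and the name `run_health_checks`, the `"degraded"` status value, and the response shape are all assumptions of this sketch rather than the existing endpoint's contract.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Probe = Callable[[], Awaitable[None]]


async def run_health_checks(probes: Dict[str, Probe], timeout: float = 2.0) -> dict:
    async def run_one(probe: Probe) -> str:
        try:
            # A probe that hangs is treated the same as one that fails.
            await asyncio.wait_for(probe(), timeout)
            return "ok"
        except Exception as exc:
            return f"error: {type(exc).__name__}"

    names = list(probes)
    # Run all probes concurrently so one slow dependency does not delay the rest.
    results = await asyncio.gather(*(run_one(probes[name]) for name in names))
    checks = dict(zip(names, results))
    status = "ok" if all(result == "ok" for result in checks.values()) else "degraded"
    return {"status": status, "checks": checks}
```

In the real endpoint, the PostgreSQL probe would likely run a trivial query through the async engine, and the MinIO probe a bucket-existence check. The MinIO Python client is synchronous, so that call would need to be moved off the event loop, for example with `asyncio.to_thread`.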

### Established Patterns

  • Provider pattern (ABC + factory in ai/) — storage module mirrors this
  • Service layer (services/extractor.py, services/classifier.py) — pure Python modules, no FastAPI coupling; new storage service follows the same boundary
  • Pinia-as-facade (frontend) — no changes needed in Phase 1; API contract preserved

### Integration Points

  • docker-compose.yml — add postgres, minio, redis, celery-worker services; add health checks and depends_on ordering
  • backend/requirements.txt — add: sqlalchemy[asyncio]>=2.0, psycopg[binary,pool]>=3, alembic, minio, celery[redis]
  • backend/services/storage.py — replace entirely with async SQLAlchemy + MinIO SDK implementation
  • backend/api/documents.py — update to use new async storage service; interface should stay stable so frontend is unaffected
  • backend/main.py — add SQLAlchemy async engine lifespan (startup/shutdown); extend /health

### Constraints

  • All CORS origins are currently ["*"] — leave as-is for Phase 1; Phase 2 locks this down with auth
  • No linter/formatter config in repo — don't introduce one in Phase 1 (out of scope)
  • filelock dependency can be removed once services/storage.py is replaced

</code_context>

## Specific Ideas

  • MinIO bucket name: docuvault (single bucket, prefix-based isolation)
  • PostgreSQL users: docuvault_app (runtime, restricted) + docuvault_migrate (Alembic migrations, DDL)
  • Init script path: docker/postgres/initdb.d/01-init-users.sql
  • Production env file location: /etc/docuvault/env (outside project dir, chmod 600)
  • Redis URL format: redis://:${REDIS_PASSWORD}@redis:6379/0 (password-protected even in dev)

## Deferred Ideas

None — discussion stayed within phase scope.


Phase: 1-Infrastructure Foundation
Context gathered: 2026-05-21