docs(01): capture phase context
@@ -0,0 +1,107 @@
# Phase 1: Infrastructure Foundation - Context

**Gathered:** 2026-05-21
**Status:** Ready for planning

<domain>
## Phase Boundary

Wire PostgreSQL, MinIO, Redis, and Celery into Docker Compose; create a complete Alembic-managed schema; rewrite the document storage service layer to use PostgreSQL + MinIO; delete legacy flat-file data. The existing document upload and AI classification workflow must continue to work end-to-end with the new infrastructure. No user-facing behavior change, but the internal storage layer is fully replaced.

</domain>

<decisions>
## Implementation Decisions

### Schema Scope
- **D-01:** The Phase 1 initial Alembic migration creates the **full v1 skeleton** — all tables: `users`, `refresh_tokens`, `quotas`, `documents`, `topics`, `folders`, `shares`, `audit_log`, `cloud_connections`. Subsequent phases add data and constraints, not new tables.
- **D-02:** A `groups` table stub is included in the Phase 1 migration (v2 feature, but PROJECT.md explicitly notes "groups table will be seeded in schema"). Empty table, correct columns and FKs.
- **D-03:** `documents.user_id` is **nullable** in Phase 1 (no auth system yet). A Phase 2 migration adds the `NOT NULL` constraint after the user/auth system is live.
- **D-04:** Existing `data/` directory contents (flat-file JSON metadata + uploaded files) are **deleted** in Phase 1. They are test data only — no migration script needed. Phase 3's data migration scope is removed.

### App Wiring
- **D-05:** Phase 1 **switches the storage service layer** to PostgreSQL + MinIO. `backend/services/storage.py` is rewritten to use async SQLAlchemy + MinIO SDK. The app does not continue using the filesystem after Phase 1.
- **D-06:** Single MinIO bucket named **`docuvault`**. Object keys follow `{user_id}/{document_id}/{uuid4()}{ext}` (STORE-02). Human-readable filenames are stored in the `documents.filename` DB column only — never in the MinIO key.
- **D-07:** The `/health` endpoint in `backend/main.py` is extended to check PostgreSQL + MinIO connectivity (not just return `{"status": "ok"}`). Health checks gate `docker compose up` readiness.
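
The key schema in D-06 can be sketched as a small helper. This is a minimal illustration, not the project's implementation: the function name `build_object_key`, the lowercasing of the extension, and the `unowned` prefix used when `user_id` is still null (D-03) are all assumptions that the source does not specify.

```python
import uuid
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical prefix for Phase 1 uploads that have no owner yet (assumption).
UNOWNED_PREFIX = "unowned"


def build_object_key(
    user_id: Optional[uuid.UUID], document_id: uuid.UUID, filename: str
) -> str:
    """Return a key shaped like {user_id}/{document_id}/{uuid4()}{ext}."""
    # Only the extension is taken from the upload name; the human-readable
    # name itself belongs in documents.filename, never in the key.
    ext = PurePosixPath(filename).suffix.lower()
    owner = str(user_id) if user_id is not None else UNOWNED_PREFIX
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"
```

Because the last path segment is a fresh `uuid4()`, two uploads of the same file for the same document get distinct keys, and nothing from the original filename other than its extension reaches MinIO.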

### Background Worker
- **D-08:** Background task queue: **Celery + Redis** (STORE-08). FastAPI `BackgroundTasks` replaced.
- **D-09:** Redis service added to `docker-compose.yml` in **Phase 1** (alongside PostgreSQL and MinIO). Redis doubles as the rate-limiting store for Phase 2 auth endpoints — no second Redis needed later.
- **D-10:** A `celery-worker` service is added to `docker-compose.yml`. Celery broker and result backend both point to the same Redis instance via `REDIS_URL`.
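
D-10 could look roughly like the following in the worker's configuration module. The sketch only builds the settings mapping so that it runs without Celery installed; `broker_url` and `result_backend` are Celery's setting names, while the function name `celery_config` and the app name in the trailing comment are assumptions.

```python
import os
from typing import Dict, Mapping


def celery_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build Celery settings with broker and result backend on one Redis instance."""
    redis_url = env["REDIS_URL"]  # raises KeyError early if the variable is unset
    return {
        "broker_url": redis_url,
        "result_backend": redis_url,
    }


# In the worker module this would be applied to the app object, for example
# (hypothetical app name): Celery("docuvault").conf.update(celery_config())
```

Reading `REDIS_URL` in one place keeps the API container and the `celery-worker` container pointed at the same instance.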

### Env / Secrets Strategy
- **D-11:** `.env` gitignored + `.env.example` committed. `docker-compose.yml` reads vars via `${VAR_NAME}`. `.env.example` has safe placeholder values and comments explaining each variable.
- **D-12:** Production secrets stored **outside the project directory** at `/etc/docuvault/env` (`chmod 600`, owned by the service user, not root). `docker-compose.yml` references it via `env_file:`. Documented in deployment notes.
- **D-13:** **Two PostgreSQL DSNs** introduced:
  - `DATABASE_URL` — restricted app user `docuvault_app` (SELECT / INSERT / UPDATE / DELETE only; no DDL)
  - `DATABASE_MIGRATE_URL` — migration user `docuvault_migrate` (DDL privileges; used only by Alembic)
- **D-14:** PostgreSQL init script in `docker/postgres/initdb.d/` provisions both users on first container start. The app never connects as the PostgreSQL superuser.
- **D-15:** MinIO vars: `MINIO_ENDPOINT`, `MINIO_ROOT_USER`, `MINIO_ROOT_PASSWORD` (init only), `MINIO_BUCKET` (value: `docuvault`), `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY` (separate app-level access key pair with minimal bucket permissions).
- **D-16:** Additional vars in Phase 1 `.env.example`: `REDIS_URL`, `SECRET_KEY` (documented now for Phase 2 JWT + HKDF use; app does not read it in Phase 1).
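
One way to keep the DSN split in D-13 and D-14 honest is a startup or CI check that each DSN names the user it is supposed to. The sketch below is an illustration under assumptions: `check_dsn_roles` is a hypothetical helper, and the host and database names used when calling it are placeholders, not values from the source.

```python
from urllib.parse import urlsplit

# Env var name -> PostgreSQL user it must connect as (user names from D-13).
EXPECTED_DSN_USERS = {
    "DATABASE_URL": "docuvault_app",
    "DATABASE_MIGRATE_URL": "docuvault_migrate",
}


def check_dsn_roles(env):
    """Return a list of problems; an empty list means both DSNs look right."""
    problems = []
    for var, expected_user in EXPECTED_DSN_USERS.items():
        dsn = env.get(var)
        if not dsn:
            problems.append(f"{var} is not set")
            continue
        # urlsplit exposes the user part of scheme://user:password@host/db.
        actual_user = urlsplit(dsn).username
        if actual_user != expected_user:
            problems.append(
                f"{var} connects as {actual_user!r}, expected {expected_user!r}"
            )
    return problems
```

A check like this would catch a `DATABASE_URL` that was accidentally pointed at the superuser or at the migration user, which would defeat the least-privilege split.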

</decisions>

<canonical_refs>
## Canonical References

**Downstream agents MUST read these before planning or implementing.**

### Requirements
- `.planning/REQUIREMENTS.md` — STORE-01 (storage migration), STORE-02 (MinIO key schema), STORE-07 (stateless backend), STORE-08 (BackgroundTasks replacement)
- `.planning/ROADMAP.md` — Phase 1 goal, success criteria (especially criterion #4: MinIO key schema enforced in model layer)

### Project Decisions
- `.planning/PROJECT.md` — Key Decisions table (PostgreSQL + MinIO rationale, HKDF key derivation, atomic quota UPDATE pattern, groups table seeding)
- `.planning/STATE.md` — Open Questions (background task queue question now resolved as Celery + Redis; PyOTP `valid_window` note for Phase 2)

</canonical_refs>

<code_context>
## Existing Code Insights

### Reusable Assets
- `backend/ai/base.py` + `backend/ai/__init__.py` — ABC + factory pattern; the `storage/` module for MinIO/cloud backends should mirror this exactly (`StorageBackend` ABC + `get_storage_backend()` factory)
- `backend/config.py` — Pydantic Settings class; extend with new env vars (`DATABASE_URL`, `DATABASE_MIGRATE_URL`, `MINIO_*`, `REDIS_URL`, `SECRET_KEY`)
- `backend/main.py` `/health` — extend to probe PostgreSQL + MinIO rather than replace
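
The ABC + factory shape named in the first bullet might look like this. Only `StorageBackend` and `get_storage_backend()` come from the notes above; the method names, the registry, and the in-memory backend (used so the sketch runs without a MinIO server) are assumptions made for illustration.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, Type


class StorageBackend(ABC):
    """Interface for object storage; the method names here are assumptions."""

    @abstractmethod
    async def put_object(self, key: str, data: bytes) -> None:
        """Store data under key."""

    @abstractmethod
    async def get_object(self, key: str) -> bytes:
        """Return the bytes stored under key."""


class InMemoryBackend(StorageBackend):
    """Hypothetical stand-in so the sketch runs without a MinIO server."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    async def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    async def get_object(self, key: str) -> bytes:
        return self._objects[key]


# Registry consulted by the factory; a MinIO-backed class would be added here.
_BACKENDS: Dict[str, Type[StorageBackend]] = {"memory": InMemoryBackend}


def get_storage_backend(name: str = "memory") -> StorageBackend:
    """Return a new backend instance for the given registry name."""
    if name not in _BACKENDS:
        raise ValueError(f"Unknown storage backend: {name!r}")
    return _BACKENDS[name]()


async def demo_roundtrip() -> bytes:
    """Write one object and read it back through the abstract interface."""
    backend = get_storage_backend("memory")
    await backend.put_object("u/d/obj.pdf", b"hello")
    return await backend.get_object("u/d/obj.pdf")
```

Callers depend only on `StorageBackend`, so swapping the in-memory class for a MinIO or cloud implementation is a registry change rather than a service-layer change.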

### Established Patterns
- **Provider pattern** (ABC + factory in `ai/`) — storage module mirrors this
- **Service layer** (`services/extractor.py`, `services/classifier.py`) — pure Python modules, no FastAPI coupling; new storage service follows the same boundary
- **Pinia-as-facade** (frontend) — no changes needed in Phase 1; API contract preserved

### Integration Points
- `docker-compose.yml` — add `postgres`, `minio`, `redis`, `celery-worker` services; add health checks and `depends_on` ordering
- `backend/requirements.txt` — add: `sqlalchemy[asyncio]>=2.0`, `psycopg[binary,pool]>=3`, `alembic`, `minio`, `celery[redis]`
- `backend/services/storage.py` — **replace entirely** with async SQLAlchemy + MinIO SDK implementation
- `backend/api/documents.py` — update to use new async storage service; interface should stay stable so frontend is unaffected
- `backend/main.py` — add SQLAlchemy async engine lifespan (startup/shutdown); extend `/health`
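
The extended `/health` logic can be kept independent of FastAPI by aggregating async probes in a plain function. This is a sketch under assumptions: `run_health_checks`, the `degraded` status label, and the response shape are hypothetical, and the real probes (for example a `SELECT 1` and a bucket-exists call) are passed in as callables rather than shown.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A probe is any zero-argument coroutine function that raises on failure.
Probe = Callable[[], Awaitable[None]]


async def run_health_checks(probes: Dict[str, Probe], timeout: float = 2.0) -> dict:
    """Run all probes concurrently; a probe passes if it returns without raising."""

    async def run_one(probe: Probe) -> str:
        try:
            await asyncio.wait_for(probe(), timeout=timeout)
            return "ok"
        except Exception as exc:  # includes timeouts from wait_for
            return f"error: {type(exc).__name__}"

    names = list(probes)
    results = await asyncio.gather(*(run_one(probes[name]) for name in names))
    status = "ok" if all(result == "ok" for result in results) else "degraded"
    return {"status": status, "checks": dict(zip(names, results))}
```

The route handler would then map a non-`ok` status to a 5xx response so that a Compose health check fails until both PostgreSQL and MinIO are reachable.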

### Constraints
- All CORS origins are currently `["*"]` — leave as-is for Phase 1; Phase 2 locks this down with auth
- No linter/formatter config in repo — don't introduce one in Phase 1 (out of scope)
- `filelock` dependency can be removed once `services/storage.py` is replaced

</code_context>

<specifics>
## Specific Ideas

- MinIO bucket name: `docuvault` (single bucket, prefix-based isolation)
- PostgreSQL users: `docuvault_app` (runtime, restricted) + `docuvault_migrate` (Alembic migrations, DDL)
- Init script path: `docker/postgres/initdb.d/01-init-users.sql`
- Production env file location: `/etc/docuvault/env` (outside project dir, `chmod 600`)
- Redis URL format: `redis://:${REDIS_PASSWORD}@redis:6379/0` (password-protected even in dev)
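
The Redis URL format above embeds the password directly, so a password containing characters such as `@` or `/` would need percent-encoding to remain a valid URL. A small helper for generating the value, offered as a sketch (the name `build_redis_url` and its defaults are assumptions drawn from the format string):

```python
from urllib.parse import quote


def build_redis_url(
    password: str, host: str = "redis", port: int = 6379, db: int = 0
) -> str:
    """Return redis://:<password>@<host>:<port>/<db> with the password percent-encoded."""
    # safe="" also encodes "/" so it cannot be mistaken for a path separator.
    return f"redis://:{quote(password, safe='')}@{host}:{port}/{db}"
```

For a plain alphanumeric password the result is identical to the template in the list; the encoding only matters when the password contains URL-reserved characters.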

</specifics>

<deferred>
## Deferred Ideas

None — discussion stayed within phase scope.

</deferred>

---

*Phase: 1-Infrastructure Foundation*
*Context gathered: 2026-05-21*
@@ -0,0 +1,153 @@
# Phase 1: Infrastructure Foundation - Discussion Log

> **Audit trail only.** Do not use as input to planning, research, or execution agents.
> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.

**Date:** 2026-05-21
**Phase:** 1-Infrastructure Foundation
**Areas discussed:** Schema scope, App wiring, Background worker, Env / secrets strategy

---

## Schema Scope

| Option | Description | Selected |
|--------|-------------|----------|
| Full v1 skeleton | All tables upfront: users, refresh_tokens, quotas, documents, topics, folders, shares, audit_log, cloud_connections | ✓ |
| Phase 1 minimum | Only documents + topics tables; users and auth added in Phase 2 | |
| You decide | Leave scope to planner | |

**User's choice:** Full v1 skeleton
**Notes:** —

---

| Option | Description | Selected |
|--------|-------------|----------|
| Yes, seed the groups table | Empty table, correct columns and FKs; avoids schema change when v2 Group admin lands | ✓ |
| No, v1 tables only | Keep Phase 1 strictly to v1 requirements | |

**User's choice:** Yes, seed the groups table
**Notes:** PROJECT.md explicitly notes groups table will be seeded in schema.

---

| Option | Description | Selected |
|--------|-------------|----------|
| Leave existing data on filesystem; schema empty at Phase 1 end | Phase 3 runs migration script | |
| Seed a system / placeholder user | Sentinel user_id for Phase 1 uploads | |

**User's choice:** (Free text) "The current documents and files are just test data and can be deleted. Please switch uploads to PostgreSQL and MinIO."
**Notes:** Existing flat-file data is expendable test data. This changed the Phase 1 scope significantly: the storage service layer is rewritten in Phase 1, not Phase 3. Phase 3's data migration scope is removed.

---

## App Wiring

| Option | Description | Selected |
|--------|-------------|----------|
| Filesystem stays active; new infra sits ready | Phase 1 only wires infra; Phase 3 switches service layer | |
| Phase 1 switches uploads to PostgreSQL + MinIO | services/storage.py rewritten in Phase 1 | ✓ |

**User's choice:** Phase 1 switches uploads to PostgreSQL + MinIO
**Notes:** User clarified existing data is test data and can be deleted.

---

| Option | Description | Selected |
|--------|-------------|----------|
| Seed a default/admin user in Phase 1 migration | All Phase 1 uploads attributed to placeholder user | |
| Make user_id nullable for Phase 1, enforce NOT NULL in Phase 2 | Avoids phantom user; Phase 2 adds constraint | ✓ |

**User's choice:** user_id nullable in Phase 1
**Notes:** —

---

| Option | Description | Selected |
|--------|-------------|----------|
| Leave data/ directory in place | Stop writing to it; clean up manually or in Phase 3 | |
| Delete data/ contents in Phase 1 | Clean break; no ambiguity about active storage | ✓ |

**User's choice:** Delete data/ contents in Phase 1
**Notes:** Old data is test data only.

---

| Option | Description | Selected |
|--------|-------------|----------|
| Single bucket 'docuvault' with user prefix paths | Standard S3 pattern; matches STORE-02 key schema | ✓ |
| Per-user buckets | Stronger isolation but bucket count issues and operational complexity | |

**User's choice:** Single bucket 'docuvault'
**Notes:** —

---

## Background Worker

| Option | Description | Selected |
|--------|-------------|----------|
| pgqueuer | Postgres-native; no Redis; simpler Compose | |
| Celery + Redis | Battle-tested; adds Redis service | ✓ |

**User's choice:** Celery + Redis
**Notes:** —

---

| Option | Description | Selected |
|--------|-------------|----------|
| Phase 1 — add it now with the other infra | All services wired together in one shot | ✓ |
| Phase 3 — add when horizontal scaling needed | STORE-08 is a Phase 3 requirement | |

**User's choice:** Phase 1 — add Redis + Celery now
**Notes:** Redis also serves as the rate-limiting store for Phase 2 auth endpoints.

---

## Env / Secrets Strategy

**User asked for best practice guidance** before answering — wanted to understand production VPS implications (root server / VPS deployment, database admin password handling). Claude explained:
- App should never use PostgreSQL superuser; restricted users per principle of least privilege
- Secrets stored outside project directory at `/etc/docuvault/env`; `chmod 600`, service user owned
- Two DB DSNs: runtime (restricted) + migration (DDL privileges)

| Option | Description | Selected |
|--------|-------------|----------|
| .env gitignored + .env.example committed | Docker Compose native; self-hosters copy .env.example → .env | ✓ |
| Hardcoded dev defaults in docker-compose.yml | Simpler but secrets committed to git | |
| docker-compose.override.yml for secrets | Flexible but extra indirection layer | |

**User's choice:** .env gitignored + .env.example committed; production at `/etc/docuvault/env`

---

| Env var group | Selected |
|---|---|
| DATABASE_URL + DATABASE_MIGRATE_URL | ✓ |
| MINIO_ENDPOINT / MINIO_ROOT_USER / MINIO_ROOT_PASSWORD / MINIO_BUCKET | ✓ |
| REDIS_URL | ✓ |
| SECRET_KEY | ✓ |

**Notes:** All four groups introduced in Phase 1. SECRET_KEY documented now for Phase 2+ use (JWT + HKDF) even though Phase 1 doesn't read it.

---

| Option | Description | Selected |
|--------|-------------|----------|
| Yes — provision restricted users in Phase 1 | init script creates docuvault_app + docuvault_migrate | ✓ |
| No — use superuser for now, split in Phase 2 | Simpler but not least-privilege | |

**User's choice:** Yes — provision restricted users in Phase 1
**Notes:** —

---

## Claude's Discretion

None — user made explicit choices for all areas.

## Deferred Ideas

None — discussion stayed within phase scope.