kite/.planning/phases/01-infrastructure-foundation/SKELETON.md
curo1305 6fed5ba531 docs(01): create phase 1 plan — 5 plans in 4 waves
Research, pattern mapping, and verification complete.
Walking Skeleton mode active (MVP Phase 1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:49:36 +02:00


# Walking Skeleton — DocuVault
**Phase:** 1
**Generated:** 2026-05-21
## Capability Proven End-to-End
With the stack started by `docker compose up`, a real document upload (PDF or TXT) sent to `POST /api/documents/upload` persists its metadata in PostgreSQL, stores its bytes in MinIO under the `docuvault` bucket using a `{user_id|null-marker}/{document_id}/{uuid4()}{ext}` object key, enqueues a Celery task on Redis that performs text extraction and AI classification, and returns the document JSON. At the same time, `GET /health` reports `postgres: ok` and `minio: ok`.
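
The object key format above can be sketched as a small helper. This is a minimal illustration, not the project's actual code: the function name `build_object_key` and the marker value `_unowned` are assumptions, since the plan only refers to a "null-marker" without giving its value.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical placeholder for documents with no owner yet; the plan only
# calls this a "null-marker" and does not specify its value.
NULL_OWNER_MARKER = "_unowned"


def build_object_key(user_id, document_id, filename):
    """Return a key shaped like '{user_id|null-marker}/{document_id}/{uuid4()}{ext}'."""
    owner = str(user_id) if user_id is not None else NULL_OWNER_MARKER
    # Keep only the (lowercased) extension of the client-supplied name, so the
    # original filename never becomes part of the storage path.
    ext = PurePosixPath(filename).suffix.lower()
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"
```

Because the last path segment is a fresh `uuid4()`, two uploads of the same file for the same document row would still produce distinct keys.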
## Architectural Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Backend framework | FastAPI 0.111+ on Python 3.12 (existing) | Already in use; async ecosystem matches PostgreSQL + MinIO + Celery |
| ORM / async DB driver | SQLAlchemy 2.0 async + `psycopg[binary]` v3 (`postgresql+psycopg://` URL prefix) | Single driver works for both Alembic (sync) and FastAPI (async); RESEARCH.md Pattern 1 |
| Migrations | Alembic async template (`alembic init -t async`); two DSNs — `DATABASE_URL` (app, restricted) + `DATABASE_MIGRATE_URL` (DDL); `expire_on_commit=False` on `async_sessionmaker` | D-13, D-14; RESEARCH.md Pattern 2; Pitfall 1 |
| Object storage | MinIO official Python SDK wrapped in `asyncio.to_thread()`; single bucket `docuvault`; UUID-based keys | D-06; RESEARCH.md Pattern 3; STORE-02 |
| Background queue | Celery 5.4+ with Redis broker + result backend; sync `def` tasks (no `async def` for tasks) | D-08, D-10; RESEARCH.md Pattern 5; replaces FastAPI BackgroundTasks per STORE-08 |
| Storage abstraction | `StorageBackend` ABC + `get_storage_backend()` factory in `backend/storage/` mirroring `backend/ai/base.py` + `backend/ai/__init__.py` | Established project pattern; CLOUD-07 forward-compatible |
| Secrets / config | Pydantic Settings reading `.env` in dev; `env_file: /etc/docuvault/env` in prod (`chmod 600`); `.env.example` committed with safe placeholders | D-11, D-12 |
| Service ordering | Docker Compose `healthcheck` + `depends_on: condition: service_healthy` for postgres / minio / redis; `mc ready local` for MinIO; `redis-cli -a $REDIS_PASSWORD ping` for Redis | RESEARCH.md Pattern 6 + Pitfall 5 |
| Directory layout | `backend/db/` (models, session), `backend/deps/` (FastAPI deps), `backend/storage/` (object backend ABC + impls), `backend/tasks/` (Celery tasks), `backend/migrations/` (Alembic) | RESEARCH.md "Recommended Project Structure" |
| Deployment | Local `docker compose up` only in Phase 1; production deploy target deferred to a later phase | Phase 1 success criterion is single-command local boot |
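
As a rough illustration of the storage abstraction and the `asyncio.to_thread()` wrapping listed in the table, the sketch below pairs an abstract `StorageBackend` with a factory. Only `StorageBackend` and `get_storage_backend()` are names from the plan. The method names, `ThreadedBackend`, and `BlockingFakeClient` are hypothetical, and the fake client stands in for a synchronous SDK rather than reproducing the MinIO API.

```python
import asyncio
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Async interface the rest of the app codes against."""

    @abstractmethod
    async def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    async def get_object(self, key: str) -> bytes: ...


class BlockingFakeClient:
    """Hypothetical in-memory stand-in for a synchronous SDK client."""

    def __init__(self):
        self._objects = {}

    def put(self, bucket, key, data):
        self._objects[(bucket, key)] = data

    def get(self, bucket, key):
        return self._objects[(bucket, key)]


class ThreadedBackend(StorageBackend):
    """Runs each blocking client call in a worker thread via asyncio.to_thread()."""

    def __init__(self, client, bucket="docuvault"):
        self._client = client
        self._bucket = bucket

    async def put_object(self, key, data):
        await asyncio.to_thread(self._client.put, self._bucket, key, data)

    async def get_object(self, key):
        return await asyncio.to_thread(self._client.get, self._bucket, key)


def get_storage_backend(kind: str = "fake") -> StorageBackend:
    """Factory: map a configured backend name to an implementation."""
    if kind == "fake":
        return ThreadedBackend(BlockingFakeClient())
    raise ValueError(f"unknown storage backend: {kind!r}")
```

Keeping the blocking calls behind `asyncio.to_thread()` means request handlers can `await` storage operations without stalling the event loop, and new backends only need to register in the factory.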
## Stack Touched in Phase 1
- [x] Project scaffold — Python 3.12 backend container, Dockerfile unchanged
- [x] Routing — `GET /health` (extended) + `POST /api/documents/upload` (rewired) + `GET /api/documents` (rewired)
- [x] Database — Alembic migration creates full v1 schema (10 tables incl. `groups` stub per D-02); upload writes a `documents` row, list reads documents
- [x] Object storage — MinIO bucket auto-created on app startup; upload writes object, key matches `{user_id|null-marker}/{document_id}/{uuid4()}{ext}`
- [x] Background worker — Celery worker container running; upload enqueues `tasks.document_tasks.extract_and_classify`; result observable in worker logs
- [x] Deployment — single `docker compose up` boots PostgreSQL + MinIO + Redis + backend + celery-worker; all health checks green
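
One way the extended `GET /health` response could be assembled is to run one probe per dependency and map each outcome to `ok` or `error`. The sketch below is framework-free and assumes hypothetical helper names (`run_check`, `health_report`). The real probes would issue a database query and a MinIO call, which are not shown here.

```python
import asyncio


async def run_check(probe):
    """Return 'ok' if the probe coroutine completes, 'error' if it raises."""
    try:
        await probe()
        return "ok"
    except Exception:
        return "error"


async def health_report(probes):
    """Run all dependency probes concurrently and collect per-service status."""
    names = list(probes)
    results = await asyncio.gather(*(run_check(probes[name]) for name in names))
    return dict(zip(names, results))
```

A route handler could then return `await health_report({"postgres": ping_db, "minio": ping_bucket})`, where `ping_db` and `ping_bucket` are placeholder coroutine functions.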
## Out of Scope (Deferred to Later Slices)
These are intentionally NOT in Phase 1 — later phases must not re-litigate this minimalism.
- Users, authentication, registration, JWT, refresh tokens, TOTP, password reset — Phase 2 (`documents.user_id` is nullable in Phase 1 per D-03)
- Multi-user isolation enforcement (per-row ownership checks, presigned URL flow) — Phase 3
- Per-user 100 MB quota enforcement (`UPDATE quotas ... RETURNING used_bytes`) — Phase 3 (`quotas` table exists per D-01 but has no rows and no constraint code path)
- Frontend changes — none; the Vue 3 SPA must continue to call the existing endpoint shapes
- Folders / sharing / search / PDF preview — Phase 4
- Cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV) — Phase 5 (`cloud_connections` table exists per D-01 but has no implementation)
- Admin endpoints, audit log writes, CSRF, CSP headers, rate limiting — Phase 2
- Existing flat-file data migration script — explicitly skipped (D-04: `data/` directory is deleted; test data only)
- Production deployment target / CD pipeline — deferred
## Subsequent Slice Plan
Each later phase adds one vertical slice on top of this skeleton without altering its architectural decisions:
- **Phase 2:** Users register / log in / enroll TOTP / reset password / sign out all — adds `NOT NULL` constraint to `documents.user_id`, populates `users` + `refresh_tokens`, adds JWT middleware + CSRF + rate limiting + security headers, plus admin user management
- **Phase 3:** All documents owned by a user; presigned URL flow for downloads; atomic 100 MB quota enforced via the `quotas` table created (but left empty) in Phase 1; admin-assigned AI provider/model used for classification
- **Phase 4:** Folder CRUD + document move; share by handle; full-text search via PostgreSQL `tsvector`; in-browser PDF preview proxied through the app; admin audit log viewer
- **Phase 5:** `StorageBackend` ABC gains OneDrive / Google Drive / Nextcloud / WebDAV implementations; HKDF per-user credential encryption populates `cloud_connections.credentials_enc`
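
The atomic quota check planned for Phase 3 can be illustrated with a single conditional `UPDATE`. This sketch makes several assumptions: it uses the standard-library `sqlite3` module so it runs without a server, it checks `rowcount` instead of the PostgreSQL `RETURNING used_bytes` clause named in the plan, it reads 100 MB as 100 MiB, and it assumes a hypothetical `quotas(user_id, used_bytes, limit_bytes)` table layout.

```python
import sqlite3

# Assumption: "100 MB" is interpreted as 100 MiB.
QUOTA_LIMIT_BYTES = 100 * 1024 * 1024


def reserve_bytes(conn, user_id, size):
    """Add `size` to the user's usage unless it would exceed the limit.

    Returns the new used_bytes on success, or None if the upload does not fit
    (or the user has no quota row).
    """
    # The limit check lives in the WHERE clause, so check and increment happen
    # in one statement rather than as a separate read followed by a write.
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= limit_bytes",
        (size, user_id, size),
    )
    if cur.rowcount != 1:
        conn.rollback()
        return None
    row = conn.execute(
        "SELECT used_bytes FROM quotas WHERE user_id = ?", (user_id,)
    ).fetchone()
    conn.commit()
    return row[0]
```

On PostgreSQL the follow-up `SELECT` would be unnecessary, because `RETURNING used_bytes` yields the new value from the same statement.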