kite/.planning/phases/01-infrastructure-foundation/SKELETON.md
curo1305 6fed5ba531 docs(01): create phase 1 plan — 5 plans in 4 waves
Research, pattern mapping, and verification complete.
Walking Skeleton mode active (MVP Phase 1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:49:36 +02:00


Walking Skeleton — DocuVault

Phase: 1
Generated: 2026-05-21

Capability Proven End-to-End

With the stack running under docker compose up, a real document upload (PDF or TXT) sent to POST /api/documents/upload does four things: it persists the document's metadata in PostgreSQL; stores its bytes in MinIO under the docuvault bucket using a {user_id|null-marker}/{document_id}/{uuid4()}{ext} object key; enqueues a Celery task on Redis that performs text extraction and AI classification; and returns the document JSON. At the same time, GET /health reports postgres: ok and minio: ok.
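
The object key layout can be illustrated with a short sketch. This is not the project's actual helper: the function name build_object_key, the "_unowned" null marker, and the lowercasing of the extension are all assumptions made for illustration, since the document does not specify them.

```python
import uuid
from pathlib import PurePosixPath
from typing import Optional

# Assumption: the literal used when a document has no owner is not given in
# this document; "_unowned" is a hypothetical stand-in for the null marker.
NULL_OWNER_MARKER = "_unowned"


def build_object_key(
    user_id: Optional[uuid.UUID], document_id: uuid.UUID, filename: str
) -> str:
    """Return a key shaped like {user_id|null-marker}/{document_id}/{uuid4()}{ext}."""
    owner = str(user_id) if user_id is not None else NULL_OWNER_MARKER
    # Keep only the extension of the uploaded name (".pdf", ".txt", or "").
    # Lowercasing is an assumption, not something the plan requires.
    ext = PurePosixPath(filename).suffix.lower()
    # A fresh uuid4 per object keeps the client-supplied filename out of the key.
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"
```

Because user_id is nullable in Phase 1, every key produced before authentication exists would start with the marker segment; once owners exist, the same function yields per-user prefixes without changing the layout.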

Architectural Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Backend framework | FastAPI 0.111+ on Python 3.12 (existing) | Already in use; async ecosystem matches PostgreSQL + MinIO + Celery |
| ORM / async DB driver | SQLAlchemy 2.0 async + psycopg[binary] v3 (postgresql+psycopg:// URL prefix) | Single driver works for both Alembic (sync) and FastAPI (async); RESEARCH.md Pattern 1 |
| Migrations | Alembic async template (alembic init -t async); two DSNs: DATABASE_URL (app, restricted) + DATABASE_MIGRATE_URL (DDL); expire_on_commit=False on async_sessionmaker | D-13, D-14; RESEARCH.md Pattern 2; Pitfall 1 |
| Object storage | MinIO official Python SDK wrapped in asyncio.to_thread(); single bucket docuvault; UUID-based keys | D-06; RESEARCH.md Pattern 3; STORE-02 |
| Background queue | Celery 5.4+ with Redis broker + result backend; sync def tasks (no async def for tasks) | D-08, D-10; RESEARCH.md Pattern 5; replaces FastAPI BackgroundTasks per STORE-08 |
| Storage abstraction | StorageBackend ABC + get_storage_backend() factory in backend/storage/ mirroring backend/ai/base.py + backend/ai/__init__.py | Established project pattern; CLOUD-07 forward-compatible |
| Secrets / config | Pydantic Settings reading .env in dev; env_file: /etc/docuvault/env in prod (chmod 600); .env.example committed with safe placeholders | D-11, D-12 |
| Service ordering | Docker Compose healthcheck + depends_on: condition: service_healthy for postgres / minio / redis; mc ready local for MinIO; redis-cli -a $REDIS_PASSWORD ping for Redis | RESEARCH.md Pattern 6 + Pitfall 5 |
| Directory layout | backend/db/ (models, session), backend/deps/ (FastAPI deps), backend/storage/ (object backend ABC + impls), backend/tasks/ (Celery tasks), backend/migrations/ (Alembic) | RESEARCH.md "Recommended Project Structure" |
| Deployment | Local docker compose up only in Phase 1; production deploy target deferred to a later phase | Phase 1 success criterion is single-command local boot |
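
The storage abstraction and the asyncio.to_thread() wrapping named in the decisions above could look roughly like the following. It is a sketch under stated assumptions, not the contents of backend/storage/: the method names put_object and get_object, the InMemoryBackend class, and the name argument of the factory are hypothetical, and a dictionary stands in for the blocking MinIO SDK calls.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Interface each object-storage backend would implement (method names assumed)."""

    @abstractmethod
    async def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    async def get_object(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical stand-in for a MinIO-backed implementation."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def _blocking_put(self, key: str, data: bytes) -> None:
        # Stands in for a synchronous SDK call that would block on network I/O.
        self._objects[key] = data

    def _blocking_get(self, key: str) -> bytes:
        return self._objects[key]

    async def put_object(self, key: str, data: bytes) -> None:
        # Run the blocking call in a worker thread so the event loop stays free.
        await asyncio.to_thread(self._blocking_put, key, data)

    async def get_object(self, key: str) -> bytes:
        return await asyncio.to_thread(self._blocking_get, key)


# Registry of known backends; later phases would add more entries here.
_BACKENDS = {"memory": InMemoryBackend}


def get_storage_backend(name: str = "memory") -> StorageBackend:
    """Return a backend instance by name (selection by name is an assumption)."""
    try:
        return _BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None
```

Keeping callers dependent only on the abstract interface is what would let cloud backends be added later by registering new classes, without touching the upload route.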

Stack Touched in Phase 1

  • Project scaffold — Python 3.12 backend container, Dockerfile unchanged
  • Routing — GET /health (extended) + POST /api/documents/upload (rewired) + GET /api/documents (rewired)
  • Database — Alembic migration creates full v1 schema (10 tables incl. groups stub per D-02); upload writes a documents row, list reads documents
  • Object storage — MinIO bucket auto-created on app startup; upload writes object, key matches {user_id|null-marker}/{document_id}/{uuid4()}{ext}
  • Background worker — Celery worker container running; upload enqueues tasks.document_tasks.extract_and_classify; result observable in worker logs
  • Deployment — single docker compose up boots PostgreSQL + MinIO + Redis + backend + celery-worker; all health checks green
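
One way the extended GET /health payload could be assembled is sketched below. The function name check_dependencies, the probe callables, the timeout, and the "error" status string are assumptions; the document only specifies the postgres: ok and minio: ok values for the healthy case.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A probe is any zero-argument coroutine function that raises on failure.
Probe = Callable[[], Awaitable[None]]


async def check_dependencies(
    probes: Dict[str, Probe], timeout: float = 2.0
) -> Dict[str, str]:
    """Run all probes concurrently and map each service name to a status string."""

    async def run(probe: Probe) -> str:
        try:
            # Bound each probe so one hung dependency cannot stall the endpoint.
            await asyncio.wait_for(probe(), timeout)
            return "ok"
        except Exception:
            # Assumption: any failure or timeout is reported as "error".
            return "error"

    names = list(probes)
    results = await asyncio.gather(*(run(probes[name]) for name in names))
    return dict(zip(names, results))
```

In a real route the probes would issue a trivial query against PostgreSQL and a bucket existence check against MinIO; here they are left as parameters so the aggregation logic can be exercised without either service.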

Out of Scope (Deferred to Later Slices)

These are intentionally NOT in Phase 1 — later phases must not re-litigate this minimalism.

  • Users, authentication, registration, JWT, refresh tokens, TOTP, password reset — Phase 2 (documents.user_id is nullable in Phase 1 per D-03)
  • Multi-user isolation enforcement (per-row ownership checks, presigned URL flow) — Phase 3
  • Per-user 100 MB quota enforcement (UPDATE quotas ... RETURNING used_bytes) — Phase 3 (quotas table exists per D-01 but has no rows and no constraint code path)
  • Frontend changes — none; the Vue 3 SPA must continue to call the existing endpoint shapes
  • Folders / sharing / search / PDF preview — Phase 4
  • Cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV) — Phase 5 (cloud_connections table exists per D-01 but has no implementation)
  • Admin endpoints, audit log writes, CSRF, CSP headers, rate limiting — Phase 2
  • Existing flat-file data migration script — explicitly skipped (D-04: data/ directory is deleted; test data only)
  • Production deployment target / CD pipeline — deferred

Subsequent Slice Plan

Each later phase adds one vertical slice on top of this skeleton without altering its architectural decisions:

  • Phase 2: Users register / log in / enroll TOTP / reset password / sign out all — adds NOT NULL constraint to documents.user_id, populates users + refresh_tokens, adds JWT middleware + CSRF + rate limiting + security headers, plus admin user management
  • Phase 3: All documents owned by a user; presigned URL flow for downloads; atomic 100 MB quota enforced via the quotas table created (but left empty) in Phase 1; admin-assigned AI provider/model used for classification
  • Phase 4: Folder CRUD + document move; share by handle; full-text search via PostgreSQL tsvector; in-browser PDF preview proxied through the app; admin audit log viewer
  • Phase 5: StorageBackend ABC gains OneDrive / Google Drive / Nextcloud / WebDAV implementations; HKDF per-user credential encryption populates cloud_connections.credentials_enc