# Walking Skeleton — DocuVault

**Phase:** 1

**Generated:** 2026-05-21

## Capability Proven End-to-End

A real document upload (PDF or TXT) sent to `POST /api/documents/upload` on the stack started by `docker compose up` persists its metadata in PostgreSQL, stores its bytes in MinIO under the `docuvault` bucket with an object key of the form `{user_id|null-marker}/{document_id}/{uuid4()}{ext}`, enqueues a Celery task on Redis that performs text extraction and AI classification, and returns the document JSON. At the same time, `GET /health` reports `postgres: ok` and `minio: ok`.

## Architectural Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Backend framework | FastAPI 0.111+ on Python 3.12 (existing) | Already in use; its async ecosystem works with PostgreSQL, MinIO, and Celery |
| ORM / async DB driver | SQLAlchemy 2.0 async + `psycopg[binary]` v3 (`postgresql+psycopg://` URL prefix) | One driver supports both sync and async engines, covering Alembic and FastAPI; RESEARCH.md Pattern 1 |
| Migrations | Alembic async template (`alembic init -t async`); two DSNs: `DATABASE_URL` (app, restricted) and `DATABASE_MIGRATE_URL` (DDL); `expire_on_commit=False` on `async_sessionmaker` | D-13, D-14; RESEARCH.md Pattern 2; Pitfall 1 |
| Object storage | MinIO official Python SDK wrapped in `asyncio.to_thread()`; single bucket `docuvault`; UUID-based keys | D-06; RESEARCH.md Pattern 3; STORE-02 |
| Background queue | Celery 5.4+ with Redis as broker and result backend; plain `def` tasks (no `async def` tasks) | D-08, D-10; RESEARCH.md Pattern 5; replaces FastAPI `BackgroundTasks` per STORE-08 |
| Storage abstraction | `StorageBackend` ABC + `get_storage_backend()` factory in `backend/storage/`, mirroring `backend/ai/base.py` + `backend/ai/__init__.py` | Established project pattern; forward-compatible with CLOUD-07 |
| Secrets / config | Pydantic Settings reading `.env` in dev; `env_file: /etc/docuvault/env` in prod (`chmod 600`); `.env.example` committed with safe placeholders | D-11, D-12 |
| Service ordering | Docker Compose `healthcheck` + `depends_on: condition: service_healthy` for postgres / minio / redis; `mc ready local` for MinIO; `redis-cli -a $REDIS_PASSWORD ping` for Redis | RESEARCH.md Pattern 6 + Pitfall 5 |
| Directory layout | `backend/db/` (models, session), `backend/deps/` (FastAPI deps), `backend/storage/` (object backend ABC + impls), `backend/tasks/` (Celery tasks), `backend/migrations/` (Alembic) | RESEARCH.md "Recommended Project Structure" |
| Deployment | Local `docker compose up` only in Phase 1; production deploy target deferred to a later phase | Phase 1 success criterion is a single-command local boot |

## Stack Touched in Phase 1

- [x] Project scaffold: Python 3.12 backend container, Dockerfile unchanged
- [x] Routing: `GET /health` (extended) + `POST /api/documents/upload` (rewired) + `GET /api/documents` (rewired)
- [x] Database: Alembic migration creates the full v1 schema (10 tables, including the `groups` stub per D-02); upload writes a `documents` row, list reads documents
- [x] Object storage: MinIO bucket auto-created on app startup; upload writes the object under a key matching `{user_id|null-marker}/{document_id}/{uuid4()}{ext}`
- [x] Background worker: Celery worker container running; upload enqueues `tasks.document_tasks.extract_and_classify`; result observable in worker logs
- [x] Deployment: a single `docker compose up` boots PostgreSQL + MinIO + Redis + backend + celery-worker; all health checks green

## Out of Scope (Deferred to Later Slices)

These are intentionally NOT in Phase 1; later phases must not re-litigate this minimalism.

- Users, authentication, registration, JWT, refresh tokens, TOTP, password reset: Phase 2 (`documents.user_id` is nullable in Phase 1 per D-03)
- Multi-user isolation enforcement (per-row ownership checks, presigned URL flow): Phase 3
- Per-user 100 MB quota enforcement (`UPDATE quotas ... RETURNING used_bytes`): Phase 3 (the `quotas` table exists per D-01 but has no rows and no constraint code path)
- Frontend changes: none; the Vue 3 SPA must continue to call the existing endpoint shapes
- Folders / sharing / search / PDF preview: Phase 4
- Cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV): Phase 5 (the `cloud_connections` table exists per D-01 but has no implementation)
- Admin endpoints, audit log writes, CSRF, CSP headers, rate limiting: Phase 2
- Existing flat-file data migration script: explicitly skipped (D-04: the `data/` directory is deleted; it held test data only)
- Production deployment target / CD pipeline: deferred

## Subsequent Slice Plan

Each later phase adds one vertical slice on top of this skeleton without altering its architectural decisions:

- **Phase 2:** Users register, log in, enroll TOTP, reset passwords, and sign out of all sessions. Adds the `NOT NULL` constraint to `documents.user_id`, populates `users` + `refresh_tokens`, adds JWT middleware + CSRF + rate limiting + security headers, plus admin user management.
- **Phase 3:** All documents are owned by a user; downloads use a presigned URL flow; an atomic 100 MB quota is enforced via the `quotas` table created in Phase 1; classification uses the admin-assigned AI provider/model.
- **Phase 4:** Folder CRUD + document move; sharing by handle; full-text search via PostgreSQL `tsvector`; in-browser PDF preview proxied through the app; admin audit log viewer.
- **Phase 5:** The `StorageBackend` ABC gains OneDrive / Google Drive / Nextcloud / WebDAV implementations; per-user credential encryption (keys derived with HKDF) populates `cloud_connections.credentials_enc`.
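The `StorageBackend` ABC and `get_storage_backend()` factory from the Architectural Decisions table are the seam that Phase 5 extends. Below is a minimal Python sketch of that shape, not the project's actual `backend/storage/` code. It assumes hypothetical method names (`put`, `get`), a hypothetical `NULL_USER_MARKER` value (the plan does not say what the null-marker string is), and a hypothetical `InMemoryBlockingClient` that stands in for the blocking MinIO SDK so the sketch runs without a server. It illustrates two points from the plan: the `{user_id|null-marker}/{document_id}/{uuid4()}{ext}` key layout and wrapping blocking storage calls in `asyncio.to_thread()`.

```python
from __future__ import annotations

import asyncio
import uuid
from abc import ABC, abstractmethod
from pathlib import PurePosixPath
from typing import Callable

# Hypothetical placeholder: the plan does not specify the null-marker string.
NULL_USER_MARKER = "_unowned"
BUCKET = "docuvault"


def build_object_key(user_id: uuid.UUID | None, document_id: uuid.UUID, filename: str) -> str:
    """Build '{user_id|null-marker}/{document_id}/{uuid4()}{ext}'.

    Only the suffix of the final (POSIX) path component of the client-supplied
    filename is kept; the rest of the name is discarded.
    """
    owner = str(user_id) if user_id is not None else NULL_USER_MARKER
    ext = PurePosixPath(filename).suffix
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"


class StorageBackend(ABC):
    """Async object-storage interface (method names are hypothetical)."""

    @abstractmethod
    async def put(self, key: str, data: bytes, content_type: str) -> None: ...

    @abstractmethod
    async def get(self, key: str) -> bytes: ...


class InMemoryBlockingClient:
    """Hypothetical stand-in for a blocking SDK client such as MinIO's."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], tuple[bytes, str]] = {}

    def put_blocking(self, bucket: str, key: str, data: bytes, content_type: str) -> None:
        self._objects[(bucket, key)] = (data, content_type)

    def get_blocking(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)][0]


class ThreadedObjectBackend(StorageBackend):
    """Adapts a blocking client to the async ABC via asyncio.to_thread()."""

    def __init__(self, client: InMemoryBlockingClient, bucket: str = BUCKET) -> None:
        self._client = client
        self._bucket = bucket

    async def put(self, key: str, data: bytes, content_type: str) -> None:
        # The blocking call runs in a worker thread, keeping the event loop free.
        await asyncio.to_thread(self._client.put_blocking, self._bucket, key, data, content_type)

    async def get(self, key: str) -> bytes:
        return await asyncio.to_thread(self._client.get_blocking, self._bucket, key)


# Hypothetical registry: later backends would add entries instead of changing callers.
_BACKENDS: dict[str, Callable[[], StorageBackend]] = {
    "memory": lambda: ThreadedObjectBackend(InMemoryBlockingClient()),
}


def get_storage_backend(kind: str = "memory") -> StorageBackend:
    factory = _BACKENDS.get(kind)
    if factory is None:
        raise ValueError(f"unknown storage backend: {kind!r}")
    return factory()


async def main() -> None:
    storage = get_storage_backend("memory")
    key = build_object_key(None, uuid.uuid4(), "report.pdf")
    await storage.put(key, b"%PDF-1.7", "application/pdf")
    print(key, await storage.get(key))


if __name__ == "__main__":
    asyncio.run(main())
```

Under these assumptions, a MinIO-backed implementation would make the same `asyncio.to_thread()` calls against the SDK client, and the Phase 5 providers would register further `_BACKENDS` entries, so upload and task code only ever depend on `get_storage_backend()`.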