Walking Skeleton — DocuVault
Phase: 1 Generated: 2026-05-21
Capability Proven End-to-End
A real document upload (PDF or TXT) sent to POST /api/documents/upload, with the stack running under docker compose up, does all of the following: persists its metadata in PostgreSQL; stores its bytes in MinIO under the docuvault bucket using a {user_id|null-marker}/{document_id}/{uuid4()}{ext} object key; enqueues a Celery task on Redis that performs text extraction and AI classification; and returns the document JSON. At the same time, GET /health reports postgres: ok and minio: ok.
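The object key layout above can be illustrated with a small helper. This is only a sketch under stated assumptions: the function name build_object_key and the literal NULL_USER_MARKER value are hypothetical, since the source says only "null-marker" and does not name the string actually used.

```python
import uuid
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical placeholder for uploads without an owner. The source only
# says "null-marker"; the real value may differ.
NULL_USER_MARKER = "_anonymous"


def build_object_key(
    user_id: Optional[uuid.UUID],
    document_id: uuid.UUID,
    filename: str,
) -> str:
    """Return a key shaped like {user_id|null-marker}/{document_id}/{uuid4()}{ext}."""
    owner = str(user_id) if user_id is not None else NULL_USER_MARKER
    # Keep only the extension (including the dot) of the uploaded file name.
    ext = PurePosixPath(filename).suffix
    # A fresh UUID replaces the client-supplied file name in the key.
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"


if __name__ == "__main__":
    print(build_object_key(None, uuid.uuid4(), "report.pdf"))
```

Because the final path segment is a fresh UUID, the original file name never appears in the key, so it cannot collide with other uploads or smuggle path separators into the bucket.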
Architectural Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Backend framework | FastAPI 0.111+ on Python 3.12 (existing) | Already in use; async ecosystem matches PostgreSQL + MinIO + Celery |
| ORM / async DB driver | SQLAlchemy 2.0 async + psycopg[binary] v3 (postgresql+psycopg:// URL prefix) | Single driver works for both Alembic (sync) and FastAPI (async); RESEARCH.md Pattern 1 |
| Migrations | Alembic async template (alembic init -t async); two DSNs: DATABASE_URL (app, restricted) + DATABASE_MIGRATE_URL (DDL); expire_on_commit=False on async_sessionmaker | D-13, D-14; RESEARCH.md Pattern 2; Pitfall 1 |
| Object storage | MinIO official Python SDK wrapped in asyncio.to_thread(); single bucket docuvault; UUID-based keys | D-06; RESEARCH.md Pattern 3; STORE-02 |
| Background queue | Celery 5.4+ with Redis broker + result backend; sync def tasks (no async def for tasks) | D-08, D-10; RESEARCH.md Pattern 5; replaces FastAPI BackgroundTasks per STORE-08 |
| Storage abstraction | StorageBackend ABC + get_storage_backend() factory in backend/storage/ mirroring backend/ai/base.py + backend/ai/__init__.py | Established project pattern; CLOUD-07 forward-compatible |
| Secrets / config | Pydantic Settings reading .env in dev; env_file: /etc/docuvault/env in prod (chmod 600); .env.example committed with safe placeholders | D-11, D-12 |
| Service ordering | Docker Compose healthcheck + depends_on: condition: service_healthy for postgres / minio / redis; mc ready local for MinIO; redis-cli -a $REDIS_PASSWORD ping for Redis | RESEARCH.md Pattern 6 + Pitfall 5 |
| Directory layout | backend/db/ (models, session), backend/deps/ (FastAPI deps), backend/storage/ (object backend ABC + impls), backend/tasks/ (Celery tasks), backend/migrations/ (Alembic) | RESEARCH.md "Recommended Project Structure" |
| Deployment | Local docker compose up only in Phase 1; production deploy target deferred to a later phase | Phase 1 success criterion is single-command local boot |
Stack Touched in Phase 1
- Project scaffold: Python 3.12 backend container, Dockerfile unchanged
- Routing: GET /health (extended) + POST /api/documents/upload (rewired) + GET /api/documents (rewired)
- Database: Alembic migration creates full v1 schema (10 tables incl. groups stub per D-02); upload writes a documents row, list reads documents
- Object storage: MinIO bucket auto-created on app startup; upload writes object, key matches {user_id|null-marker}/{document_id}/{uuid4()}{ext}
- Background worker: Celery worker container running; upload enqueues tasks.document_tasks.extract_and_classify; result observable in worker logs
- Deployment: single docker compose up boots PostgreSQL + MinIO + Redis + backend + celery-worker; all health checks green
Out of Scope (Deferred to Later Slices)
These are intentionally NOT in Phase 1 — later phases must not re-litigate this minimalism.
- Users, authentication, registration, JWT, refresh tokens, TOTP, password reset: Phase 2 (documents.user_id is nullable in Phase 1 per D-03)
- Multi-user isolation enforcement (per-row ownership checks, presigned URL flow): Phase 3
- Per-user 100 MB quota enforcement (UPDATE quotas ... RETURNING used_bytes): Phase 3 (quotas table exists per D-01 but has no rows and no constraint code path)
- Frontend changes: none; the Vue 3 SPA must continue to call the existing endpoint shapes
- Folders / sharing / search / PDF preview: Phase 4
- Cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV): Phase 5 (cloud_connections table exists per D-01 but has no implementation)
- Admin endpoints, audit log writes, CSRF, CSP headers, rate limiting: Phase 2
- Existing flat-file data migration script: explicitly skipped (D-04: data/ directory is deleted; test data only)
- Production deployment target / CD pipeline: deferred
Subsequent Slice Plan
Each later phase adds one vertical slice on top of this skeleton without altering its architectural decisions:
- Phase 2: Users register / log in / enroll TOTP / reset password / sign out all. Adds a NOT NULL constraint to documents.user_id, populates users + refresh_tokens, adds JWT middleware + CSRF + rate limiting + security headers, plus admin user management
- Phase 3: All documents owned by a user; presigned URL flow for downloads; atomic 100 MB quota enforced via the quotas table created in Phase 1; admin-assigned AI provider/model used for classification
- Phase 4: Folder CRUD + document move; share by handle; full-text search via PostgreSQL tsvector; in-browser PDF preview proxied through the app; admin audit log viewer
- Phase 5: StorageBackend ABC gains OneDrive / Google Drive / Nextcloud / WebDAV implementations; HKDF per-user credential encryption populates cloud_connections.credentials_enc