# DocuVault — Research Synthesis

_Last updated: 2026-05-21_

## Executive Summary

DocuVault is a brownfield migration of a functional single-user document scanner into a privacy-first, multi-user SaaS platform. The existing system already handles document upload, text extraction, and AI-based topic classification via a well-designed provider abstraction. This milestone replaces the flat-file JSON + filesystem persistence layer with PostgreSQL + MinIO, adds full multi-user authentication (JWT with httpOnly cookies, TOTP 2FA, refresh token rotation), per-user quota enforcement, folder organization, document sharing, and pluggable cloud storage backends — following the same adapter pattern already used for AI providers.

---

## Confirmed Stack

### Use

| Package | Version | Purpose |
|---|---|---|
| `pyjwt[crypto]` | ≥2.12.1 | JWT — current FastAPI docs recommendation; replaces python-jose |
| `pwdlib[argon2]` | ≥0.2.0 | Password hashing — Argon2 is memory-hard (OWASP 2025) |
| `pyotp` | ≥2.9.0 | TOTP 2FA — RFC 6238 reference |
| `cryptography` (Fernet) | ≥44.0.0 | Credential encryption — AES-128-CBC + HMAC-SHA256 |
| `sqlalchemy[asyncio]` | ≥2.0.36 | ORM — async-native; better brownfield fit than SQLModel |
| `psycopg[asyncio,binary]` | ≥3.2.0 | PostgreSQL driver — single driver for async FastAPI + sync Alembic |
| `alembic` | ≥1.14.0 | DB migrations |
| `minio` | ≥7.2.0 | Object storage — presigned URL flow (FastAPI never proxies bytes) |
| `msgraph-sdk` + `azure-identity` | ≥1.0.0 / ≥1.19.0 | OneDrive — official Microsoft SDK |
| `google-api-python-client` + `google-auth-oauthlib` | ≥2.150.0 / ≥1.2.0 | Google Drive v3 |
| `webdav4` | ≥0.9.8 | Nextcloud + generic WebDAV |

### Do NOT Use

- `python-jose` — FastAPI dropped it; use PyJWT
- `passlib[bcrypt]` for new hashes — maintenance mode; keep only for migrating existing hashes
- `tiangolo/uvicorn-gunicorn-fastapi` Docker image — deprecated; use `python:3.12-slim`
- `localStorage` for any auth token — XSS-accessible; httpOnly cookie for refresh, Pinia memory for access token
- Single platform Fernet key for all users — HKDF per-user derivation required (catastrophic blast radius otherwise)
- `SQLModel` for this migration — async story is thin; SQLAlchemy 2.0 async is better for brownfield

---

## Table-Stakes Features for v1

### Confirmed (from PROJECT.md)

- Email + password registration + JWT sessions with refresh tokens
- TOTP 2FA + backup codes *(see gap below)*
- Password reset via email
- Per-user isolated storage (100 MB free tier)
- Quota tracking, enforcement at upload, visible indicator
- Folder CRUD, move documents, "Shared with me" folder
- Share by handle, view-only default, immediate revoke
- Cloud OAuth2 connect flow + credential encryption
- Admin: user management, quota adjustment, AI provider assignment
- Audit log (append-only, metadata only) + admin viewer
- In-browser PDF preview

### Gaps — Items PROJECT.md Missed

1. **TOTP backup codes** — Every competitor ships these. Without them, a lost phone permanently locks users out. 8–10 single-use codes, stored hashed, acknowledged by user before TOTP is activated.
2. **Quota warnings at 80% and 95%** — PROJECT.md specifies rejection at 100% only. Pre-emptive warnings are table stakes (Google Drive, Dropbox both do this). In-app banner at 80% (amber) and 95% (red), plus a specific error at 100% showing current usage, rejected file size, and a link to storage settings.
3. **"Sign out all devices" / session revocation** — Users who believe their account is compromised need forced logout everywhere. Already handled by the `refresh_tokens` table — requires only an endpoint and a UI control.
4. **Breadcrumb navigation** — Folder CRUD is in PROJECT.md but not the navigation UX. Required for nested folder usability.
5. **Cloud storage connection status indicator** — PROJECT.md doesn't specify what happens when cloud storage is unreachable. Silent failure = data loss. Must show `ACTIVE | REQUIRES_REAUTH | ERROR` state and fall back to local storage with a clear message.
6. **Admin impersonation is an explicit architectural exclusion** — Must be documented as excluded, not just left unbuilt. Directly contradicts the privacy-first core value.

---

## Critical Architectural Decisions (Lock Before Building)

These cannot be safely retrofitted:

**1. JWT in httpOnly cookies**

Refresh token: `httpOnly; Secure; SameSite=Strict` cookie. Access token: Pinia memory only. Never `localStorage`. Vue Router guard silently refreshes before redirecting to login. Axios `withCredentials: true`.

**2. HKDF per-user key derivation for cloud credentials**

`HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials")`. Master key in `CLOUD_CREDS_KEY` env var only. Salt in users table. Design before writing the first line of credential storage — cannot be added later without re-encrypting everything.

**3. Presigned MinIO URL flow**

FastAPI generates signed PUT URL → browser uploads directly to MinIO → FastAPI confirms object and commits quota atomically. FastAPI handles metadata only; bytes never pass through the API layer. Object keys: `{user_id}/{document_id}/{uuid4()}{ext}`. Human-readable filename in DB only.

**4. Atomic PostgreSQL quota enforcement**

`UPDATE quotas SET used_bytes = used_bytes + $delta WHERE user_id = $uid AND (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`. If 0 rows returned, delete the MinIO object and return 413. Never perform quota arithmetic in Python between two DB statements.

**5. BackgroundTasks replacement before horizontal scaling**

FastAPI `BackgroundTasks` is per-instance — classification tasks cannot distribute across containers. Replace with Celery + Redis or pgqueuer (PostgreSQL-backed, no Redis dependency) before scaling to N instances. Decide during Phase 3 planning.

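
The per-user derivation in decision 2 can be sketched with the standard library alone. The block below hand-implements HKDF-SHA256 (RFC 5869) so that it runs without dependencies; the real service would more likely use the HKDF class from the `cryptography` package. The function names, the 32-byte output length, and the use of the user's UUID bytes as the salt are assumptions made for illustration. The result is base64url-encoded because that is the key format Fernet accepts.

```python
import base64
import hashlib
import hmac
import uuid


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    # Extract: concentrate the input key material into a pseudorandom key.
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_fernet_key(master_key: bytes, user_id: uuid.UUID) -> bytes:
    """Derive a per-user key in the url-safe base64 form that Fernet expects.

    Assumption: the salt is the 16 raw bytes of the user's UUID.
    """
    raw = hkdf_sha256(master_key, salt=user_id.bytes, info=b"cloud-credentials")
    return base64.urlsafe_b64encode(raw)
```

Under these assumptions, leaking one derived key exposes only that user's credentials, and the same master key and user ID always reproduce the same key, so nothing derived needs to be stored.
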

**Additional locked decisions:**

- Refresh tokens are opaque UUIDs stored hashed in DB (not JWTs); access tokens are short-lived JWTs (15 min).
- `refresh_tokens` table has `family_id` — on reuse of a rotated token, revoke entire family and emit security alert.
- Audit log uses `BIGSERIAL` PK; app DB user has INSERT + SELECT only (no UPDATE/DELETE).
- Admin endpoints for cloud connections return only `provider, display_name, connected_at, status` — never `credentials_enc`.
- Every document/folder endpoint asserts `resource.user_id == current_user.id` via centralized `assert_document_access()`.

---

## 5-Phase Migration Sequence

### Phase 1 — Infrastructure Foundation

Wire PostgreSQL + MinIO into Docker Compose. Create `db/models.py` with full schema. Alembic initial migration. Async session dependency. No API changes — flat-file code still runs.

Gate: all services boot cleanly; migrations apply; no behavior change.

### Phase 2 — Users and Authentication

Users, refresh_tokens, quotas tables. Auth endpoints (register, login, refresh, TOTP, password reset, forced logout). TOTP with backup codes. Password reset does NOT auto-login (routes through TOTP gate). `get_current_user` + `get_current_admin` FastAPI dependencies. Admin user management endpoints. Vue auth store (Pinia memory + httpOnly cookie), Router guard, Axios interceptors.

Gate: admin JWT returns 403 on document endpoints; backup codes issued and acknowledged at enrollment.

### Phase 3 — Document Migration to PostgreSQL + MinIO

Dual-write window: new uploads write to both stores. Migration script copies historical flat-file data to PostgreSQL + MinIO. Count reconciliation assertion (go/no-go gate). Flip read source to PostgreSQL. Remove JSON write path. Presigned URL flow for all uploads/downloads. `asyncio.to_thread()` wrapping all MinIO SDK calls.

Gate: concurrent upload test at 99% quota — only one succeeds.

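
Phase 3 calls for `asyncio.to_thread()` because the MinIO Python SDK is synchronous, so calling it directly inside an `async def` handler would stall the event loop. A minimal sketch of the wrapping pattern follows. `FakeObjectStore` and `confirm_upload` are hypothetical stand-ins so the example runs without a MinIO server; with the real SDK the wrapped call would be something like `Minio.stat_object(bucket, key)`.

```python
import asyncio
import time
from typing import Optional


class FakeObjectStore:
    """Hypothetical stand-in for a synchronous object-storage client."""

    def __init__(self, sizes: dict) -> None:
        self._sizes = sizes

    def stat_object(self, bucket: str, key: str) -> int:
        time.sleep(0.05)  # simulate blocking network I/O
        return self._sizes[key]  # raises KeyError if the object is missing


async def confirm_upload(store: FakeObjectStore, bucket: str, key: str) -> Optional[int]:
    """Return the stored object's size, or None if it was never uploaded."""
    try:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(store.stat_object, bucket, key)
    except KeyError:
        return None


async def main() -> list:
    store = FakeObjectStore({"u1/d1/a.pdf": 1024, "u1/d2/b.pdf": 2048})
    keys = ["u1/d1/a.pdf", "u1/d2/b.pdf", "u1/d3/missing.pdf"]
    # The three checks overlap in worker threads instead of running back to back.
    return await asyncio.gather(*(confirm_upload(store, "documents", k) for k in keys))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Exceptions raised in the worker thread propagate through the `await`, which is why the missing-object case can be handled with an ordinary `try`/`except` in the coroutine.
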

### Phase 4 — Multi-User Isolation, Quotas, Folders, Sharing

All queries gain `WHERE user_id = current_user.id`. Quota bar (80%/95% warnings). Folder CRUD + breadcrumbs. Document move + sort. Share by handle + "Shared with me" folder. Audit log wired to all events. Admin audit viewer. In-browser PDF preview.

Gate: negative-access test (admin cannot retrieve any document content); quota reconciliation drift <1%.

### Phase 5 — Cloud Storage Backends

`StorageBackend` ABC + factory (mirrors `ai/` pattern). `MinIOBackend`, `OneDriveBackend`, `GoogleDriveBackend`, `NextcloudBackend`, `WebDAVBackend`. OAuth2 connect/disconnect flows. Connection status UX. HKDF key derivation for all credentials. `delete_user_files()` on account deletion.

Gate: mock `invalid_grant` → REQUIRES_REAUTH (not 500); account deletion asserts `delete_user_files()` per connection.

---

## Top 5 Pitfalls by Risk

| # | Pitfall | Severity | Fix |
|---|---|---|---|
| 1 | JWT in localStorage — XSS bypasses TOTP entirely | CRITICAL | httpOnly cookie for refresh, Pinia memory for access token |
| 2 | Quota race condition — concurrent uploads bypass limit | DATA INTEGRITY | Atomic PostgreSQL `UPDATE ... RETURNING` |
| 3 | TOTP bypass via password reset — full 2FA bypass via email compromise | SECURITY | Reset issues `password_reset_pending` state, not a full session |
| 4 | Single Fernet key for all cloud credentials — catastrophic on key leak | CATASTROPHIC | HKDF per-user derivation before first credential is stored |
| 5 | Path traversal in MinIO keys — cross-user data access | SECURITY | UUID-only MinIO keys; human filename in DB only; never reconstruct key from request parameters |

---

## Confidence Assessment

| Area | Confidence | Notes |
|---|---|---|
| Stack | MEDIUM-HIGH | Core libraries confirmed from FastAPI official release notes (PyJWT, pwdlib, SQLAlchemy 2.0, psycopg v3). Cloud SDK minor versions — verify on PyPI before pinning. |
| Features | MEDIUM | Based on Google Drive, Dropbox, Box, Paperless-ngx knowledge through Aug 2025. |
| Architecture | HIGH | FastAPI DI pattern from official docs; S3 presigned URLs and atomic PostgreSQL quota update are industry standards. |
| Pitfalls | HIGH | OWASP cheat sheets; RFC 9700 refresh token rotation; GDPR Article 17 stable regulatory text. |

**Overall: MEDIUM-HIGH**

---

## Gaps to Resolve During Planning

- Verify cloud SDK minor versions on PyPI before pinning
- Confirm PyOTP `valid_window` default in current docs (recommend `valid_window=1` for ±30s clock drift)
- Decide Celery + Redis vs pgqueuer during Phase 3 (depends on Redis availability in deployment target)
- Audit existing codebase for any existing bcrypt hashes before removing `passlib`
- Validate MinIO Docker Compose public endpoint in Phase 3 acceptance testing (presigned URLs must use host-accessible address, not internal Docker network name)
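
To make the `valid_window` item above concrete: a window of 1 means a submitted code is accepted if it matches the current 30-second step or either neighbouring step. The sketch below re-implements TOTP (RFC 6238, SHA-1, 6 digits) with the standard library purely to illustrate that behaviour. It is not PyOTP's code, and `totp_at` and `verify` are hypothetical names; with PyOTP the corresponding call would be `pyotp.TOTP(secret).verify(code, valid_window=1)`.

```python
import hashlib
import hmac
import struct


def totp_at(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute the TOTP code for the time step containing `unix_time`."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226, section 5.3)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def verify(secret: bytes, code: str, now: int, valid_window: int = 0, step: int = 30) -> bool:
    """Accept `code` if it matches any step within +/- `valid_window` of `now`."""
    for drift in range(-valid_window, valid_window + 1):
        t = now + drift * step
        if t < 0:
            continue  # no time step exists before the epoch
        candidate = totp_at(secret, t, step)
        # Constant-time comparison avoids leaking how many digits matched.
        if hmac.compare_digest(candidate.encode(), code.encode()):
            return True
    return False


secret = b"12345678901234567890"  # RFC 6238 test secret, not for real use
code = totp_at(secret, 59)  # generated in the step that ends at t=59
print(verify(secret, code, now=60, valid_window=0))  # server is one step ahead
print(verify(secret, code, now=60, valid_window=1))
```

With the RFC test secret, the code generated at t=59 is rejected at t=60 when the window is 0 and accepted when it is 1. The trade-off is that each code stays valid for up to three steps, so a wider window should be weighed against replay protection.
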