# DocuVault — v1 Requirements

_Last updated: 2026-05-21_

## v1 Requirements

### Authentication (AUTH)

- [x] **AUTH-01**: User can register with email and password (Argon2 hashing; strength enforced: ≥12 chars, uppercase, lowercase, number, special char; HaveIBeenPwned breach check)
- [x] **AUTH-02**: User can log in and maintain a session (JWT access token in Pinia memory only — never localStorage; refresh token in `httpOnly; Secure; SameSite=Strict` cookie; 15-min access / 30-day refresh)
- [ ] **AUTH-03**: User can enroll a TOTP authenticator app (RFC 6238; 8–10 single-use backup codes issued and explicitly acknowledged before TOTP is marked active)
- [x] **AUTH-04**: User can complete login using TOTP code or a one-time backup code (backup code invalidated on use)
- [ ] **AUTH-05**: User can reset password via email (signed token, 1-hour expiry; reset does not auto-login — user must pass TOTP gate on next login)
- [ ] **AUTH-06**: User can sign out all active sessions (revokes all refresh tokens in DB; "sign out all devices" control in account settings)
- [ ] **AUTH-07**: Refresh token rotation with family revocation — reuse of a rotated token revokes the entire family and emits a security alert to the user
- [ ] **AUTH-08**: TOTP codes are single-use (mark used in DB within the validity window; prevent replay attacks)

### Security (SEC) — Cross-Cutting

- [x] **SEC-01**: All state-changing endpoints are protected against CSRF (SameSite=Strict cookie + origin validation)
- [x] **SEC-02**: Auth endpoints (login, register, password reset, TOTP verify) are rate-limited (per-IP and per-account)
- [x] **SEC-03**: All DB queries use parameterized statements / ORM (zero raw string interpolation into queries)
- [x] **SEC-04**: All file/document access resolved through DB lookup — object keys are never reconstructed from request parameters (prevents path traversal and cross-user access)
- [x] **SEC-05**: Content-Security-Policy, X-Frame-Options, and X-Content-Type-Options headers set on all responses
- [ ] **SEC-06**: Constant-time comparison used for all token and code verification (prevents timing attacks)
- [ ] **SEC-07**: Admin role verified on every admin endpoint request; admin cannot access document content, extracted text, or cloud credentials in any response
- [ ] **SEC-08**: Cloud credential ciphertext (`credentials_enc`) excluded from all API serializers by default — admin and user responses return only `provider, display_name, connected_at, status`
- [ ] **SEC-09**: Account deletion triggers `delete_user_files()` on every active cloud connection before removing DB records (prevents orphaned cloud data and satisfies GDPR Article 17)

### Users & Admin (ADMIN)

- [x] **ADMIN-01**: Admin can create user accounts (email, temporary password that must be changed on first login)
- [x] **ADMIN-02**: Admin can deactivate a user account (blocks all logins and API access; data preserved)
- [x] **ADMIN-03**: Admin can initiate password reset for a user (sends reset email; does not grant admin access to the account)
- [x] **ADMIN-04**: Admin can view and adjust individual user storage quotas (warns if new limit is below current usage)
- [x] **ADMIN-05**: Admin can assign AI provider and model per user (users cannot modify their own AI configuration)
- [ ] **ADMIN-06**: Admin can view audit log filtered by date range, user, and action type (metadata only — no document content, filenames, or extracted text)
- [x] **ADMIN-07**: Admin impersonation ("log in as user") is explicitly excluded by architecture — no endpoint or UI pathway exists

### Storage & Infrastructure (STORE)

- [ ] **STORE-01**: Platform storage layer migrated from flat-file JSON + local filesystem to PostgreSQL (metadata) + MinIO (objects); existing documents preserved via dual-write migration script
- [ ] **STORE-02**: Each user's MinIO objects use `{user_id}/{document_id}/{uuid4()}{ext}` keys — human-readable filenames stored in DB only
- [ ] **STORE-03**: Each user has a 100 MB storage quota enforced atomically at upload using `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`
- [ ] **STORE-04**: User sees quota usage bar in sidebar (X MB of Y MB) with amber warning at 80% and red warning at 95%
- [ ] **STORE-05**: Upload rejected at quota limit with a specific error showing current usage, rejected file size, and a link to storage settings
- [ ] **STORE-06**: Document delete atomically decrements quota usage
- [ ] **STORE-07**: Backend is stateless — no per-instance file locks; multiple instances can run behind a load balancer
- [ ] **STORE-08**: FastAPI `BackgroundTasks` replaced with Celery + Redis or pgqueuer before horizontal scaling is enabled

### Folders & Organization (FOLD)

- [ ] **FOLD-01**: User can create, rename, and delete folders (delete confirms content count before proceeding)
- [ ] **FOLD-02**: User can move documents between folders
- [ ] **FOLD-03**: Breadcrumb navigation renders current folder path; each segment is clickable to navigate up
- [ ] **FOLD-04**: Document list supports sort by name, date uploaded, and file size
- [ ] **FOLD-05**: Full-text search across user's documents (PostgreSQL `tsvector` index on extracted text)

### Document Sharing (SHARE)

- [ ] **SHARE-01**: User can share a document with another user by their unique handle (at-handle or user ID)
- [ ] **SHARE-02**: Shared documents appear in a "Shared with me" virtual folder for the recipient (no storage quota counted against recipient)
- [ ] **SHARE-03**: Shared access is view-only by default; owner controls permission level
- [ ] **SHARE-04**: Owner can revoke share access; revocation is immediate
- [ ] **SHARE-05**: Documents shared with others display a "shared" indicator in the owner's list view

### Cloud Storage (CLOUD)

- [ ] **CLOUD-01**: User can connect OneDrive (Microsoft Graph), Google Drive (v3 API), Nextcloud, or generic WebDAV as a personal storage backend
- [ ] **CLOUD-02**: Cloud OAuth credentials encrypted using HKDF per-user key derivation (`HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials")`); master key in `CLOUD_CREDS_KEY` env var; never stored in DB
- [ ] **CLOUD-03**: Local MinIO storage and connected cloud backends coexist; user can select their default storage destination
- [ ] **CLOUD-04**: Each cloud connection displays status: `ACTIVE | REQUIRES_REAUTH | ERROR`
- [ ] **CLOUD-05**: On OAuth revocation (`invalid_grant`), connection status transitions to `REQUIRES_REAUTH` — the error is surfaced to the user, not retried silently
- [ ] **CLOUD-06**: User can disconnect a cloud backend; credentials are permanently deleted from the DB
- [ ] **CLOUD-07**: Storage backend abstracted via `StorageBackend` ABC + factory in `storage/` module (mirrors existing `ai/` provider pattern)

### Documents & AI (DOC)

- [ ] **DOC-01**: User can view document metadata and extracted text for any document in their library
- [ ] **DOC-02**: In-browser PDF preview (PDF.js); document bytes proxied through the app — no presigned URLs exposed to the browser (privacy model)
- [x] **DOC-03**: AI provider and model assigned by admin per user; user cannot change AI configuration
- [x] **DOC-04**: System default topics + per-user topic overrides preserved from existing implementation
- [x] **DOC-05**: AI classification uses the user's assigned provider and model (from DB, not from user-supplied settings)

---

## v2 Requirements (Deferred)

- Subscription billing and payment processing (quota model designed to plug in)
- SSO: Microsoft, Google, Apple (auth layer designed for extension)
- Keycloak / SAML / OAuth2 enterprise federation
- Group admin roles (groups table seeded in schema, unpopulated)
- Share permission levels beyond view-only (edit, comment)
- Document version history
- Share expiry dates
- Real-time collaboration or comments
- Mobile app
- GDPR data export (Article 20) — async background job, deferred to v2
- Email notifications for sharing events
- Public link sharing (unauthenticated)

---

## Out of Scope

- Admin impersonation / "log in as user" — violates privacy-first core value; explicit architectural exclusion
- Document editing or annotation — not planned
- Document viewer for non-PDF types beyond metadata (DOCX, image renders) — v2
- AI-generated document summaries beyond topic classification — v2
- Webhooks or API access for third parties — not planned for v1

---

## Traceability

_Filled by roadmapper — 2026-05-21._

| REQ-ID | Phase | Notes |
|---|---|---|
| STORE-01 | 1 | Dual-write migration script; schema and Alembic wiring |
| STORE-02 | 1 | Object key schema enforced in model layer |
| STORE-07 | 1 | Stateless backend; no per-instance file locks |
| AUTH-01 | 2 | Registration with Argon2 + HaveIBeenPwned check |
| AUTH-02 | 2 | JWT session; httpOnly refresh cookie; Pinia memory access token |
| AUTH-03 | 2 | TOTP enrollment with backup code acknowledgement flow |
| AUTH-04 | 2 | Login via TOTP code or single-use backup code |
| AUTH-05 | 2 | Password reset email; routes back to TOTP gate |
| AUTH-06 | 2 | Sign out all devices; revokes all refresh tokens |
| AUTH-07 | 2 | Refresh token family revocation on reuse; security alert |
| AUTH-08 | 2 | TOTP single-use enforcement within validity window |
| SEC-01 | 2 | CSRF protection on all state-changing endpoints |
| SEC-02 | 2 | Rate limiting on auth endpoints (per-IP and per-account) |
| SEC-03 | 2 | Parameterized queries / ORM enforced from first migration |
| SEC-05 | 2 | Security response headers on all responses |
| SEC-06 | 2 | Constant-time comparison for token/code verification |
| SEC-07 | 2 | Admin role dependency; admin blocked from document content |
| ADMIN-01 | 2 | Admin creates user with temporary password |
| ADMIN-02 | 2 | Admin deactivates user account |
| ADMIN-03 | 2 | Admin initiates password reset for user |
| ADMIN-04 | 2 | Admin views and adjusts user storage quotas |
| ADMIN-05 | 2 | Admin assigns AI provider and model per user |
| ADMIN-07 | 2 | Explicit architectural exclusion of admin impersonation |
| STORE-03 | 3 | Atomic quota enforcement at upload |
| STORE-04 | 3 | Quota usage bar with 80%/95% warnings |
| STORE-05 | 3 | Upload rejection at quota limit with detailed error |
| STORE-06 | 3 | Atomic quota decrement on document delete |
| STORE-08 | 3 | BackgroundTasks replaced with Celery+Redis or pgqueuer |
| SEC-04 | 3 | DB-lookup-only file access; no key reconstruction from params |
| DOC-03 | 3 | AI provider/model from DB per user; not user-supplied |
| DOC-04 | 3 | System default topics + per-user topic overrides preserved |
| DOC-05 | 3 | Classification uses user's assigned provider and model |
| FOLD-01 | 4 | Folder CRUD with content-count confirmation on delete |
| FOLD-02 | 4 | Document move between folders |
| FOLD-03 | 4 | Breadcrumb navigation with clickable path segments |
| FOLD-04 | 4 | Document list sort by name, date, and file size |
| FOLD-05 | 4 | Full-text search via PostgreSQL tsvector index |
| SHARE-01 | 4 | Share document by user handle |
| SHARE-02 | 4 | "Shared with me" virtual folder; no quota charged to recipient |
| SHARE-03 | 4 | View-only default sharing; owner controls permission level |
| SHARE-04 | 4 | Immediate share revocation |
| SHARE-05 | 4 | Shared indicator on documents in owner's list view |
| SEC-08 | 4 | credentials_enc excluded from all serializers |
| SEC-09 | 4 | Account deletion triggers delete_user_files() per cloud connection |
| ADMIN-06 | 4 | Admin audit log viewer filtered by date, user, action |
| DOC-01 | 4 | View document metadata and extracted text |
| DOC-02 | 4 | In-browser PDF preview via PDF.js; bytes proxied through app |
| CLOUD-01 | 5 | Connect OneDrive, Google Drive, Nextcloud, WebDAV |
| CLOUD-02 | 5 | HKDF per-user key derivation for credential encryption |
| CLOUD-03 | 5 | Local and cloud storage coexist; user selects default |
| CLOUD-04 | 5 | Connection status display: ACTIVE / REQUIRES_REAUTH / ERROR |
| CLOUD-05 | 5 | invalid_grant transitions to REQUIRES_REAUTH; surfaced to user |
| CLOUD-06 | 5 | Disconnect cloud backend; credentials permanently deleted |
| CLOUD-07 | 5 | StorageBackend ABC + factory in storage/ module |
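
As an illustration of AUTH-07 (refresh token rotation with family revocation), the following is a minimal in-memory sketch and not the project's implementation. The class `RefreshTokenStore`, its fields, and the choice to store SHA-256 digests of tokens are all assumptions made for the example; a real deployment would keep this state in the database and deliver the alert to the user.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class RefreshTokenStore:
    """Hypothetical in-memory stand-in for a refresh-token table."""

    # token digest -> [family_id, already_rotated]
    tokens: Dict[str, list] = field(default_factory=dict)
    revoked_families: Set[str] = field(default_factory=set)
    alerts: List[str] = field(default_factory=list)  # family IDs needing a user alert

    @staticmethod
    def _digest(token: str) -> str:
        # Store only a digest so a leaked table does not leak usable tokens.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, family_id: Optional[str] = None) -> str:
        """Issue a token, starting a new family unless one is given."""
        token = secrets.token_urlsafe(32)
        family = family_id or secrets.token_hex(8)
        self.tokens[self._digest(token)] = [family, False]
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a token for its successor; return None if rejected."""
        record = self.tokens.get(self._digest(token))
        if record is None or record[0] in self.revoked_families:
            return None
        family, already_rotated = record
        if already_rotated:
            # Reuse of a rotated token: revoke the whole family and raise an alert.
            self.revoked_families.add(family)
            self.alerts.append(family)
            return None
        record[1] = True
        return self.issue(family)
```

In this sketch, presenting an old token after it has been rotated invalidates the newest token in the same family as well, which is the behavior the requirement describes.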
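
AUTH-08 (single-use TOTP codes) and SEC-06 (constant-time comparison) can be pictured together with a short sketch. It computes RFC 6238 codes with the standard library, compares them using `hmac.compare_digest`, and remembers which time steps have been consumed. The `TotpVerifier` class, the 30-second step, the one-step drift window, and the in-memory `used_steps` set (standing in for a DB table) are assumptions for illustration only.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226) with HMAC-SHA1; TOTP uses the time step as the counter."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


class TotpVerifier:
    """Hypothetical verifier that accepts each time step at most once."""

    def __init__(self, secret: bytes, step: int = 30):
        self.secret = secret
        self.step = step
        self.used_steps = set()  # assumption: a DB table in a real system

    def verify(self, code: str, now: float) -> bool:
        current = int(now) // self.step
        # Assumed validity window: previous, current, and next time step.
        for candidate in (current - 1, current, current + 1):
            if candidate < 0:
                continue
            expected = hotp(self.secret, candidate).encode("ascii")
            # compare_digest avoids leaking how many leading characters matched.
            if hmac.compare_digest(expected, code.encode("utf-8")):
                if candidate in self.used_steps:
                    return False  # replay of an already accepted code
                self.used_steps.add(candidate)
                return True
        return False
```

A second submission of the same code within the window is rejected because its time step is already recorded, which is the replay protection the requirement asks for.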
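
The guarded `UPDATE` in STORE-03 and the decrement in STORE-06 can be tried out with a small sketch. It uses the standard-library `sqlite3` module in place of PostgreSQL and checks `rowcount` instead of `RETURNING`, so it shows the idea and not the production query. The `user_id` column, the `WHERE user_id = ?` filter, and the helper names are assumptions added so the statement targets a single user's row.

```python
import sqlite3

LIMIT_BYTES = 100 * 1024 * 1024  # 100 MB quota from STORE-03


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory quotas table with one demo user (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotas ("
        "user_id TEXT PRIMARY KEY, "
        "used_bytes INTEGER NOT NULL, "
        "limit_bytes INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO quotas VALUES ('u1', 0, ?)", (LIMIT_BYTES,))
    conn.commit()
    return conn


def reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Add `delta` bytes to usage only if the result stays within the limit."""
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND (used_bytes + ?) <= limit_bytes",
        (delta, user_id, delta),
    )
    conn.commit()
    # One row updated means the check and the increment happened in one statement.
    return cur.rowcount == 1


def release_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> None:
    """Subtract `delta` bytes on document delete, never going below zero."""
    conn.execute(
        "UPDATE quotas SET used_bytes = MAX(used_bytes - ?, 0) WHERE user_id = ?",
        (delta, user_id),
    )
    conn.commit()
```

Because the limit check lives in the `WHERE` clause of the same statement that increments usage, two concurrent uploads cannot both pass a separate read-then-write check. A `False` return is the point where the STORE-05 rejection error would be raised.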
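
The per-user key derivation in CLOUD-02 can be sketched with HKDF (RFC 5869) built from the standard-library `hmac` and `hashlib` modules. The choice of SHA-256, the 32-byte output length, the UTF-8 encoding of the user ID, and the function names are assumptions; the requirement fixes only the `salt` and `info` inputs. A production system would more likely call a vetted cryptography library than hand-roll this.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with HMAC-SHA256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: HMAC keyed with the salt over the input key material.
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(prk, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    """Derive the per-user credential key (user ID encoding is an assumption)."""
    return hkdf_sha256(
        master_key,
        salt=user_id.encode("utf-8"),
        info=b"cloud-credentials",
    )
```

Here `master_key` would be the value read from the `CLOUD_CREDS_KEY` environment variable. Each user gets a distinct key that is recomputed on demand, so nothing beyond the ciphertext needs to be stored in the DB.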