DocuVault — v1 Requirements
Last updated: 2026-05-21
v1 Requirements
Authentication (AUTH)
- AUTH-01: User can register with email and password (Argon2 hashing; strength enforced: ≥12 chars, uppercase, lowercase, number, special char; HaveIBeenPwned breach check)
- AUTH-02: User can log in and maintain a session (JWT access token in Pinia memory only, never localStorage; refresh token in `httpOnly; Secure; SameSite=Strict` cookie; 15-min access / 30-day refresh)
- AUTH-03: User can enroll a TOTP authenticator app (RFC 6238; 8–10 single-use backup codes issued and explicitly acknowledged before TOTP is marked active)
- AUTH-04: User can complete login using TOTP code or a one-time backup code (backup code invalidated on use)
- AUTH-05: User can reset password via email (signed token, 1-hour expiry; reset does not auto-login — user must pass TOTP gate on next login)
- AUTH-06: User can sign out all active sessions (revokes all refresh tokens in DB; "sign out all devices" control in account settings)
- AUTH-07: Refresh token rotation with family revocation — reuse of a rotated token revokes the entire family and emits a security alert to the user
- AUTH-08: TOTP codes are single-use (mark used in DB within the validity window; prevent replay attacks)
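The rotation rule in AUTH-07 can be sketched with an in-memory store. This is a minimal illustration under stated assumptions, not the project's implementation: `TokenStore`, `RefreshToken`, and the `alerts` list are hypothetical names standing in for the refresh-token table and the security-alert channel.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class RefreshToken:
    family_id: str
    rotated: bool = False  # True once the token has been exchanged for a successor
    revoked: bool = False


@dataclass
class TokenStore:
    """In-memory stand-in for the refresh-token table (hypothetical)."""

    tokens: dict[str, RefreshToken] = field(default_factory=dict)
    alerts: list[str] = field(default_factory=list)  # family IDs that raised an alert

    def issue(self, family_id: str | None = None) -> str:
        """Create a token; a login starts a new family, a rotation reuses one."""
        value = secrets.token_urlsafe(32)
        self.tokens[value] = RefreshToken(family_id or secrets.token_hex(8))
        return value

    def rotate(self, presented: str) -> str | None:
        """Exchange a token for its successor, or return None if it is rejected."""
        record = self.tokens.get(presented)
        if record is None or record.revoked:
            return None
        if record.rotated:
            # Reuse of an already-rotated token: revoke every token in the family.
            for token in self.tokens.values():
                if token.family_id == record.family_id:
                    token.revoked = True
            self.alerts.append(record.family_id)
            return None
        record.rotated = True
        return self.issue(record.family_id)
```

After a reuse is detected, even the newest token in the family stops working, so both the legitimate user and the party holding the stolen token must log in again.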
Security (SEC) — Cross-Cutting
- SEC-01: All state-changing endpoints are protected against CSRF (SameSite=Strict cookie + origin validation)
- SEC-02: Auth endpoints (login, register, password reset, TOTP verify) are rate-limited (per-IP and per-account)
- SEC-03: All DB queries use parameterized statements / ORM (zero raw string interpolation into queries)
- SEC-04: All file/document access resolved through DB lookup — object keys are never reconstructed from request parameters (prevents path traversal and cross-user access)
- SEC-05: Content-Security-Policy, X-Frame-Options, and X-Content-Type-Options headers set on all responses
- SEC-06: Constant-time comparison used for all token and code verification (prevents timing attacks)
- SEC-07: Admin role verified on every admin endpoint request; admin cannot access document content, extracted text, or cloud credentials in any response
- SEC-08: Cloud credential ciphertext (`credentials_enc`) excluded from all API serializers by default; admin and user responses return only `provider`, `display_name`, `connected_at`, `status`
- SEC-09: Account deletion triggers `delete_user_files()` on every active cloud connection before removing DB records (prevents orphaned cloud data and satisfies GDPR Article 17)
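One way to read SEC-08's "excluded by default" is an allowlist serializer: a response is built only from the four named fields, so a new sensitive column cannot leak by accident. The sketch below assumes a connection row is available as a mapping; `serialize_connection` and `PUBLIC_CONNECTION_FIELDS` are hypothetical names, and the sample row values are made up for illustration.

```python
from __future__ import annotations

from typing import Any, Mapping

# Allowlist taken from SEC-08; any column not listed here is dropped.
PUBLIC_CONNECTION_FIELDS = ("provider", "display_name", "connected_at", "status")


def serialize_connection(row: Mapping[str, Any]) -> dict[str, Any]:
    """Build an API response from a cloud-connection row (hypothetical helper)."""
    return {name: row[name] for name in PUBLIC_CONNECTION_FIELDS}


# Illustrative row; the values are invented for the example.
row = {
    "provider": "nextcloud",
    "display_name": "Home server",
    "connected_at": "2026-05-21T10:00:00Z",
    "status": "ACTIVE",
    "credentials_enc": b"\x00ciphertext",
}
print(serialize_connection(row))
```

An allowlist fails closed, whereas a denylist that strips `credentials_enc` would expose any sensitive column added later.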
Users & Admin (ADMIN)
- ADMIN-01: Admin can create user accounts (email, temporary password that must be changed on first login)
- ADMIN-02: Admin can deactivate a user account (blocks all logins and API access; data preserved)
- ADMIN-03: Admin can initiate password reset for a user (sends reset email; does not grant admin access to the account)
- ADMIN-04: Admin can view and adjust individual user storage quotas (warns if new limit is below current usage)
- ADMIN-05: Admin can assign AI provider and model per user (users cannot modify their own AI configuration)
- ADMIN-06: Admin can view audit log filtered by date range, user, and action type (metadata only — no document content, filenames, or extracted text)
- ADMIN-07: Admin impersonation ("log in as user") is explicitly excluded by architecture — no endpoint or UI pathway exists
Storage & Infrastructure (STORE)
- STORE-01: Platform storage layer migrated from flat-file JSON + local filesystem to PostgreSQL (metadata) + MinIO (objects); existing documents preserved via dual-write migration script
- STORE-02: Each user's MinIO objects use `{user_id}/{document_id}/{uuid4()}{ext}` keys; human-readable filenames stored in DB only
- STORE-03: Each user has a 100 MB storage quota enforced atomically at upload using `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`
- STORE-04: User sees quota usage bar in sidebar (X MB of Y MB) with amber warning at 80% and red warning at 95%
- STORE-05: Upload rejected at quota limit with a specific error showing current usage, rejected file size, and a link to storage settings
- STORE-06: Document delete atomically decrements quota usage
- STORE-07: Backend is stateless — no per-instance file locks; multiple instances can run behind a load balancer
- STORE-08: FastAPI `BackgroundTasks` replaced with Celery + Redis or pgqueuer before horizontal scaling is enabled
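The conditional update in STORE-03 can be tried out with the standard library. The sketch below uses SQLite as a stand-in for PostgreSQL and checks `rowcount` instead of `RETURNING`. The `user_id` column, the table layout, and the `reserve_quota` helper are assumptions made for the example and are not taken from the project schema.

```python
import sqlite3

MB = 1024 * 1024


def reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Add delta bytes to the user's usage only if the result fits the limit."""
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= limit_bytes",
        (delta, user_id, delta),
    )
    conn.commit()
    # One changed row means the reservation succeeded; zero means over quota.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotas ("
    "user_id TEXT PRIMARY KEY, used_bytes INTEGER, limit_bytes INTEGER)"
)
conn.execute("INSERT INTO quotas VALUES ('u1', 0, ?)", (100 * MB,))

print(reserve_quota(conn, "u1", 60 * MB))  # fits within the 100 MB limit
print(reserve_quota(conn, "u1", 50 * MB))  # rejected: 110 MB would exceed it
```

Because the check and the increment happen in one statement, two concurrent uploads cannot both pass a separate "read usage, then write" check and overshoot the limit.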
Folders & Organization (FOLD)
- FOLD-01: User can create, rename, and delete folders (delete confirms content count before proceeding)
- FOLD-02: User can move documents between folders
- FOLD-03: Breadcrumb navigation renders current folder path; each segment is clickable to navigate up
- FOLD-04: Document list supports sort by name, date uploaded, and file size
- FOLD-05: Full-text search across user's documents (PostgreSQL `tsvector` index on extracted text)
Document Sharing (SHARE)
- SHARE-01: User can share a document with another user by their unique handle (at-handle or user ID)
- SHARE-02: Shared documents appear in a "Shared with me" virtual folder for the recipient (no storage quota counted against recipient)
- SHARE-03: Shared access is view-only by default; owner controls permission level
- SHARE-04: Owner can revoke share access; revocation is immediate
- SHARE-05: Documents shared with others display a "shared" indicator in the owner's list view
Cloud Storage (CLOUD)
- CLOUD-01: User can connect OneDrive (Microsoft Graph), Google Drive (v3 API), Nextcloud, or generic WebDAV as a personal storage backend
- CLOUD-02: Cloud OAuth credentials encrypted using HKDF per-user key derivation (`HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials")`); master key in `CLOUD_CREDS_KEY` env var; never stored in DB
- CLOUD-03: Local MinIO storage and connected cloud backends coexist; user can select their default storage destination
- CLOUD-04: Each cloud connection displays status: `ACTIVE | REQUIRES_REAUTH | ERROR`
- CLOUD-05: On OAuth revocation (`invalid_grant`), connection status transitions to `REQUIRES_REAUTH`; the error is surfaced to the user, not retried silently
- CLOUD-06: User can disconnect a cloud backend; credentials are permanently deleted from the DB
- CLOUD-07: Storage backend abstracted via `StorageBackend` ABC + factory in `storage/` module (mirrors existing `ai/` provider pattern)
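The per-user key derivation in CLOUD-02 can be illustrated with a standard-library HKDF (RFC 5869, SHA-256). This is a sketch under assumptions: it treats `user_id` as a UUID whose 16 raw bytes serve as the salt, and `hkdf_sha256` and `derive_user_key` are hypothetical names. A production system would more likely call a vetted cryptography library than hand-roll the primitive.

```python
import hashlib
import hmac
import uuid

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF per RFC 5869: extract a pseudorandom key, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key length too large for HKDF-SHA256")
    # Extract step: PRK = HMAC(salt, input keying material).
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: uuid.UUID) -> bytes:
    """Derive a 32-byte credential key for one user (assumes UUID user IDs)."""
    return hkdf_sha256(master_key, salt=user_id.bytes, info=b"cloud-credentials")
```

Since the derived key depends on both the master key and the user ID, no per-user key needs to be stored; it is recomputed when credentials are encrypted or decrypted.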
Documents & AI (DOC)
- DOC-01: User can view document metadata and extracted text for any document in their library
- DOC-02: In-browser PDF preview (PDF.js); document bytes proxied through the app — no presigned URLs exposed to the browser (privacy model)
- DOC-03: AI provider and model assigned by admin per user; user cannot change AI configuration
- DOC-04: System default topics + per-user topic overrides preserved from existing implementation
- DOC-05: AI classification uses the user's assigned provider and model (from DB, not from user-supplied settings)
v2 Requirements (Deferred)
- Subscription billing and payment processing (quota model designed to plug in)
- SSO: Microsoft, Google, Apple (auth layer designed for extension)
- Keycloak / SAML / OAuth2 enterprise federation
- Group admin roles (groups table seeded in schema, unpopulated)
- Share permission levels beyond view-only (edit, comment)
- Document version history
- Share expiry dates
- Real-time collaboration or comments
- Mobile app
- GDPR data export (Article 20) — async background job, deferred to v2
- Email notifications for sharing events
- Public link sharing (unauthenticated)
Out of Scope
- Admin impersonation / "log in as user" — violates privacy-first core value; explicit architectural exclusion
- Document editing or annotation — not planned
- Document viewer for non-PDF types beyond metadata (DOCX, image renders) — v2
- AI-generated document summaries beyond topic classification — v2
- Webhooks or API access for third parties — not planned for v1
Traceability
Filled by roadmapper — 2026-05-21.
| REQ-ID | Phase | Notes |
|---|---|---|
| STORE-01 | 1 | Dual-write migration script; schema and Alembic wiring |
| STORE-02 | 1 | Object key schema enforced in model layer |
| STORE-07 | 1 | Stateless backend; no per-instance file locks |
| AUTH-01 | 2 | Registration with Argon2 + HaveIBeenPwned check |
| AUTH-02 | 2 | JWT session; httpOnly refresh cookie; Pinia memory access token |
| AUTH-03 | 2 | TOTP enrollment with backup code acknowledgement flow |
| AUTH-04 | 2 | Login via TOTP code or single-use backup code |
| AUTH-05 | 2 | Password reset email; routes back to TOTP gate |
| AUTH-06 | 2 | Sign out all devices; revokes all refresh tokens |
| AUTH-07 | 2 | Refresh token family revocation on reuse; security alert |
| AUTH-08 | 2 | TOTP single-use enforcement within validity window |
| SEC-01 | 2 | CSRF protection on all state-changing endpoints |
| SEC-02 | 2 | Rate limiting on auth endpoints (per-IP and per-account) |
| SEC-03 | 2 | Parameterized queries / ORM enforced from first migration |
| SEC-05 | 2 | Security response headers on all responses |
| SEC-06 | 2 | Constant-time comparison for token/code verification |
| SEC-07 | 2 | Admin role dependency; admin blocked from document content |
| ADMIN-01 | 2 | Admin creates user with temporary password |
| ADMIN-02 | 2 | Admin deactivates user account |
| ADMIN-03 | 2 | Admin initiates password reset for user |
| ADMIN-04 | 2 | Admin views and adjusts user storage quotas |
| ADMIN-05 | 2 | Admin assigns AI provider and model per user |
| ADMIN-07 | 2 | Explicit architectural exclusion of admin impersonation |
| STORE-03 | 3 | Atomic quota enforcement at upload |
| STORE-04 | 3 | Quota usage bar with 80%/95% warnings |
| STORE-05 | 3 | Upload rejection at quota limit with detailed error |
| STORE-06 | 3 | Atomic quota decrement on document delete |
| STORE-08 | 3 | BackgroundTasks replaced with Celery+Redis or pgqueuer |
| SEC-04 | 3 | DB-lookup-only file access; no key reconstruction from params |
| DOC-03 | 3 | AI provider/model from DB per user; not user-supplied |
| DOC-04 | 3 | System default topics + per-user topic overrides preserved |
| DOC-05 | 3 | Classification uses user's assigned provider and model |
| FOLD-01 | 4 | Folder CRUD with content-count confirmation on delete |
| FOLD-02 | 4 | Document move between folders |
| FOLD-03 | 4 | Breadcrumb navigation with clickable path segments |
| FOLD-04 | 4 | Document list sort by name, date, and file size |
| FOLD-05 | 4 | Full-text search via PostgreSQL tsvector index |
| SHARE-01 | 4 | Share document by user handle |
| SHARE-02 | 4 | "Shared with me" virtual folder; no quota charged to recipient |
| SHARE-03 | 4 | View-only default sharing; owner controls permission level |
| SHARE-04 | 4 | Immediate share revocation |
| SHARE-05 | 4 | Shared indicator on documents in owner's list view |
| SEC-08 | 4 | credentials_enc excluded from all serializers |
| SEC-09 | 4 | Account deletion triggers delete_user_files() per cloud connection |
| ADMIN-06 | 4 | Admin audit log viewer filtered by date, user, action |
| DOC-01 | 4 | View document metadata and extracted text |
| DOC-02 | 4 | In-browser PDF preview via PDF.js; bytes proxied through app |
| CLOUD-01 | 5 | Connect OneDrive, Google Drive, Nextcloud, WebDAV |
| CLOUD-02 | 5 | HKDF per-user key derivation for credential encryption |
| CLOUD-03 | 5 | Local and cloud storage coexist; user selects default |
| CLOUD-04 | 5 | Connection status display: ACTIVE / REQUIRES_REAUTH / ERROR |
| CLOUD-05 | 5 | invalid_grant transitions to REQUIRES_REAUTH; surfaced to user |
| CLOUD-06 | 5 | Disconnect cloud backend; credentials permanently deleted |
| CLOUD-07 | 5 | StorageBackend ABC + factory in storage/ module |