# DocuVault — v1 Requirements
_Last updated: 2026-05-21_
## v1 Requirements
### Authentication (AUTH)
- [ ] **AUTH-01**: User can register with email and password (Argon2 hashing; strength enforced: ≥12 chars, uppercase, lowercase, number, special char; HaveIBeenPwned breach check)
- [ ] **AUTH-02**: User can log in and maintain a session (JWT access token in Pinia memory only — never localStorage; refresh token in `httpOnly; Secure; SameSite=Strict` cookie; 15-min access / 30-day refresh)
- [ ] **AUTH-03**: User can enroll a TOTP authenticator app (RFC 6238; 8–10 single-use backup codes issued and explicitly acknowledged before TOTP is marked active)
- [ ] **AUTH-04**: User can complete login using TOTP code or a one-time backup code (backup code invalidated on use)
- [ ] **AUTH-05**: User can reset password via email (signed token, 1-hour expiry; reset does not auto-login — user must pass TOTP gate on next login)
- [ ] **AUTH-06**: User can sign out all active sessions (revokes all refresh tokens in DB; "sign out all devices" control in account settings)
- [ ] **AUTH-07**: Refresh token rotation with family revocation — reuse of a rotated token revokes the entire family and emits a security alert to the user
- [ ] **AUTH-08**: TOTP codes are single-use (mark used in DB within the validity window; prevent replay attacks)
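
The rotation rule in AUTH-07 can be sketched as follows. This is a minimal in-memory illustration, not the project's implementation: `RefreshTokenStore`, its fields, and the `alerts` list are hypothetical stand-ins for the refresh-token table and the security-alert channel.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefreshTokenStore:
    """Hypothetical in-memory stand-in for the refresh-token table."""

    tokens: dict = field(default_factory=dict)        # token -> {"family", "rotated"}
    revoked_families: set = field(default_factory=set)
    alerts: list = field(default_factory=list)        # stand-in for user security alerts

    def issue(self, family_id: Optional[str] = None) -> str:
        """Issue a token; a missing family_id starts a new family (fresh login)."""
        family_id = family_id or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self.tokens[token] = {"family": family_id, "rotated": False}
        return token

    def rotate(self, token: str) -> Optional[str]:
        """Exchange a refresh token for a new one, or return None if rejected."""
        record = self.tokens.get(token)
        if record is None or record["family"] in self.revoked_families:
            return None
        if record["rotated"]:
            # Reuse of an already-rotated token: revoke the whole family and alert.
            self.revoked_families.add(record["family"])
            self.alerts.append(record["family"])
            return None
        record["rotated"] = True
        return self.issue(record["family"])
```

Once a rotated token is presented a second time, every token in that family stops working, including the newest one, so a stolen token cannot outlive the legitimate session.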
### Security (SEC) — Cross-Cutting
- [ ] **SEC-01**: All state-changing endpoints are protected against CSRF (SameSite=Strict cookie + origin validation)
- [ ] **SEC-02**: Auth endpoints (login, register, password reset, TOTP verify) are rate-limited (per-IP and per-account)
- [ ] **SEC-03**: All DB queries use parameterized statements / ORM (zero raw string interpolation into queries)
- [ ] **SEC-04**: All file/document access resolved through DB lookup — object keys are never reconstructed from request parameters (prevents path traversal and cross-user access)
- [ ] **SEC-05**: Content-Security-Policy, X-Frame-Options, and X-Content-Type-Options headers set on all responses
- [ ] **SEC-06**: Constant-time comparison used for all token and code verification (prevents timing attacks)
- [ ] **SEC-07**: Admin role verified on every admin endpoint request; admin cannot access document content, extracted text, or cloud credentials in any response
- [ ] **SEC-08**: Cloud credential ciphertext (`credentials_enc`) excluded from all API serializers by default — admin and user responses return only `provider, display_name, connected_at, status`
- [ ] **SEC-09**: Account deletion triggers `delete_user_files()` on every active cloud connection before removing DB records (prevents orphaned cloud data and satisfies GDPR Article 17)
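
For SEC-06, a minimal sketch of constant-time verification using the standard library's `hmac.compare_digest`. The function name `codes_match` is hypothetical; the same comparison would apply to backup codes, reset tokens, and TOTP codes.

```python
import hmac


def codes_match(presented: str, expected: str) -> bool:
    """Compare two secrets without short-circuiting on the first differing byte."""
    # Encoding to bytes avoids compare_digest's ASCII-only restriction on str inputs.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` on strings may return as soon as it finds a mismatch, which can leak how many leading characters were correct. `compare_digest` is designed to avoid that dependence on content.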
### Users & Admin (ADMIN)
- [ ] **ADMIN-01**: Admin can create user accounts (email, temporary password that must be changed on first login)
- [ ] **ADMIN-02**: Admin can deactivate a user account (blocks all logins and API access; data preserved)
- [ ] **ADMIN-03**: Admin can initiate password reset for a user (sends reset email; does not grant admin access to the account)
- [ ] **ADMIN-04**: Admin can view and adjust individual user storage quotas (warns if new limit is below current usage)
- [ ] **ADMIN-05**: Admin can assign AI provider and model per user (users cannot modify their own AI configuration)
- [ ] **ADMIN-06**: Admin can view audit log filtered by date range, user, and action type (metadata only — no document content, filenames, or extracted text)
- [ ] **ADMIN-07**: Admin impersonation ("log in as user") is explicitly excluded by architecture — no endpoint or UI pathway exists
### Storage & Infrastructure (STORE)
- [ ] **STORE-01**: Platform storage layer migrated from flat-file JSON + local filesystem to PostgreSQL (metadata) + MinIO (objects); existing documents preserved via dual-write migration script
- [ ] **STORE-02**: Each user's MinIO objects use `{user_id}/{document_id}/{uuid4()}{ext}` keys — human-readable filenames stored in DB only
- [ ] **STORE-03**: Each user has a 100 MB storage quota enforced atomically at upload using `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE user_id = $user_id AND used_bytes + $delta <= limit_bytes RETURNING used_bytes` (no row returned means the upload is rejected)
- [ ] **STORE-04**: User sees quota usage bar in sidebar (X MB of Y MB) with amber warning at 80% and red warning at 95%
- [ ] **STORE-05**: Upload rejected at quota limit with a specific error showing current usage, rejected file size, and a link to storage settings
- [ ] **STORE-06**: Document delete atomically decrements quota usage
- [ ] **STORE-07**: Backend is stateless — no per-instance file locks; multiple instances can run behind a load balancer
- [ ] **STORE-08**: FastAPI `BackgroundTasks` replaced with Celery + Redis or pgqueuer before horizontal scaling is enabled
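
The atomic reservation in STORE-03 and the decrement in STORE-06 can be sketched as below. This sketch makes several assumptions: it uses SQLite from the standard library as a stand-in for PostgreSQL, checks `rowcount` instead of `RETURNING`, and scopes the update to one row with a `user_id` column. The function names, the table layout, and the clamp at zero on release are all hypothetical.

```python
import sqlite3

MB = 1024 * 1024


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory quotas table with one user at the 100 MB default."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotas ("
        "user_id TEXT PRIMARY KEY, "
        "used_bytes INTEGER NOT NULL, "
        "limit_bytes INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO quotas VALUES (?, 0, ?)", ("user-1", 100 * MB))
    conn.commit()
    return conn


def reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Reserve delta bytes in one conditional UPDATE; False means over quota."""
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= limit_bytes",
        (delta, user_id, delta),
    )
    conn.commit()
    # The check and the increment happen in the same statement, so two
    # concurrent uploads cannot both pass a stale "is there room?" read.
    return cur.rowcount == 1


def release_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> None:
    """Give bytes back on document delete (clamped at zero as an assumption)."""
    conn.execute(
        "UPDATE quotas SET used_bytes = MAX(used_bytes - ?, 0) WHERE user_id = ?",
        (delta, user_id),
    )
    conn.commit()


def used_bytes(conn: sqlite3.Connection, user_id: str) -> int:
    row = conn.execute(
        "SELECT used_bytes FROM quotas WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0]
```

A caller would reserve quota before writing the object and release it again if the write fails, so that usage never drifts above what is actually stored.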
### Folders & Organization (FOLD)
- [ ] **FOLD-01**: User can create, rename, and delete folders (delete confirms content count before proceeding)
- [ ] **FOLD-02**: User can move documents between folders
- [ ] **FOLD-03**: Breadcrumb navigation renders current folder path; each segment is clickable to navigate up
- [ ] **FOLD-04**: Document list supports sort by name, date uploaded, and file size
- [ ] **FOLD-05**: Full-text search across user's documents (PostgreSQL `tsvector` index on extracted text)
### Document Sharing (SHARE)
- [ ] **SHARE-01**: User can share a document with another user by their unique handle (at-handle or user ID)
- [ ] **SHARE-02**: Shared documents appear in a "Shared with me" virtual folder for the recipient (no storage quota counted against recipient)
- [ ] **SHARE-03**: Shared access is view-only by default; owner controls permission level
- [ ] **SHARE-04**: Owner can revoke share access; revocation is immediate
- [ ] **SHARE-05**: Documents shared with others display a "shared" indicator in the owner's list view
### Cloud Storage (CLOUD)
- [ ] **CLOUD-01**: User can connect OneDrive (Microsoft Graph), Google Drive (v3 API), Nextcloud, or generic WebDAV as a personal storage backend
- [ ] **CLOUD-02**: Cloud OAuth credentials encrypted using HKDF per-user key derivation (`HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials")`); master key in `CLOUD_CREDS_KEY` env var; never stored in DB
- [ ] **CLOUD-03**: Local MinIO storage and connected cloud backends coexist; user can select their default storage destination
- [ ] **CLOUD-04**: Each cloud connection displays status: `ACTIVE | REQUIRES_REAUTH | ERROR`
- [ ] **CLOUD-05**: On OAuth revocation (`invalid_grant`), connection status transitions to `REQUIRES_REAUTH` — the error is surfaced to the user, not retried silently
- [ ] **CLOUD-06**: User can disconnect a cloud backend; credentials are permanently deleted from the DB
- [ ] **CLOUD-07**: Storage backend abstracted via `StorageBackend` ABC + factory in `storage/` module (mirrors existing `ai/` provider pattern)
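
The per-user key derivation in CLOUD-02 can be illustrated with a standard-library HKDF-SHA256 following RFC 5869. This is a sketch under assumptions: the hash choice (SHA-256), the 32-byte output length, and the function names are not stated in the requirement, and a production build would more likely use a vetted implementation such as the `cryptography` package's HKDF rather than hand-rolled code.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key length too large for HKDF-SHA256")
    # Extract: concentrate the input keying material into a pseudorandom key.
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks T(1), T(2), ... until enough output is produced.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id_bytes: bytes) -> bytes:
    """Derive the per-user credential key; master_key would come from CLOUD_CREDS_KEY."""
    return hkdf_sha256(master_key, salt=user_id_bytes, info=b"cloud-credentials")
```

Because the user ID feeds into the derivation, each user's credentials are encrypted under a different key, and nothing but the ciphertext needs to be stored in the DB.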
### Documents & AI (DOC)
- [ ] **DOC-01**: User can view document metadata and extracted text for any document in their library
- [ ] **DOC-02**: In-browser PDF preview (PDF.js); document bytes proxied through the app — no presigned URLs exposed to the browser (privacy model)
- [ ] **DOC-03**: AI provider and model assigned by admin per user; user cannot change AI configuration
- [ ] **DOC-04**: System default topics + per-user topic overrides preserved from existing implementation
- [ ] **DOC-05**: AI classification uses the user's assigned provider and model (from DB, not from user-supplied settings)
---
## v2 Requirements (Deferred)
- Subscription billing and payment processing (quota model designed to plug in)
- SSO: Microsoft, Google, Apple (auth layer designed for extension)
- Keycloak / SAML / OAuth2 enterprise federation
- Group admin roles (groups table seeded in schema, unpopulated)
- Share permission levels beyond view-only (edit, comment)
- Document version history
- Share expiry dates
- Real-time collaboration or comments
- Mobile app
- GDPR data export (Article 20) — async background job, deferred to v2
- Email notifications for sharing events
- Public link sharing (unauthenticated)
---
## Out of Scope
- Admin impersonation / "log in as user" — violates privacy-first core value; explicit architectural exclusion
- Document editing or annotation — not planned
- Document viewer for non-PDF types beyond metadata (DOCX, image renders) — v2
- AI-generated document summaries beyond topic classification — v2
- Webhooks or API access for third parties — not planned for v1
---
## Traceability
_Filled by roadmapper — 2026-05-21._
| REQ-ID | Phase | Notes |
|---|---|---|
| STORE-01 | 1 | Dual-write migration script; schema and Alembic wiring |
| STORE-02 | 1 | Object key schema enforced in model layer |
| STORE-07 | 1 | Stateless backend; no per-instance file locks |
| AUTH-01 | 2 | Registration with Argon2 + HaveIBeenPwned check |
| AUTH-02 | 2 | JWT session; httpOnly refresh cookie; Pinia memory access token |
| AUTH-03 | 2 | TOTP enrollment with backup code acknowledgement flow |
| AUTH-04 | 2 | Login via TOTP code or single-use backup code |
| AUTH-05 | 2 | Password reset email; routes back to TOTP gate |
| AUTH-06 | 2 | Sign out all devices; revokes all refresh tokens |
| AUTH-07 | 2 | Refresh token family revocation on reuse; security alert |
| AUTH-08 | 2 | TOTP single-use enforcement within validity window |
| SEC-01 | 2 | CSRF protection on all state-changing endpoints |
| SEC-02 | 2 | Rate limiting on auth endpoints (per-IP and per-account) |
| SEC-03 | 2 | Parameterized queries / ORM enforced from first migration |
| SEC-05 | 2 | Security response headers on all responses |
| SEC-06 | 2 | Constant-time comparison for token/code verification |
| SEC-07 | 2 | Admin role dependency; admin blocked from document content |
| ADMIN-01 | 2 | Admin creates user with temporary password |
| ADMIN-02 | 2 | Admin deactivates user account |
| ADMIN-03 | 2 | Admin initiates password reset for user |
| ADMIN-04 | 2 | Admin views and adjusts user storage quotas |
| ADMIN-05 | 2 | Admin assigns AI provider and model per user |
| ADMIN-07 | 2 | Explicit architectural exclusion of admin impersonation |
| STORE-03 | 3 | Atomic quota enforcement at upload |
| STORE-04 | 3 | Quota usage bar with 80%/95% warnings |
| STORE-05 | 3 | Upload rejection at quota limit with detailed error |
| STORE-06 | 3 | Atomic quota decrement on document delete |
| STORE-08 | 3 | BackgroundTasks replaced with Celery+Redis or pgqueuer |
| SEC-04 | 3 | DB-lookup-only file access; no key reconstruction from params |
| DOC-03 | 3 | AI provider/model from DB per user; not user-supplied |
| DOC-04 | 3 | System default topics + per-user topic overrides preserved |
| DOC-05 | 3 | Classification uses user's assigned provider and model |
| FOLD-01 | 4 | Folder CRUD with content-count confirmation on delete |
| FOLD-02 | 4 | Document move between folders |
| FOLD-03 | 4 | Breadcrumb navigation with clickable path segments |
| FOLD-04 | 4 | Document list sort by name, date, and file size |
| FOLD-05 | 4 | Full-text search via PostgreSQL tsvector index |
| SHARE-01 | 4 | Share document by user handle |
| SHARE-02 | 4 | "Shared with me" virtual folder; no quota charged to recipient |
| SHARE-03 | 4 | View-only default sharing; owner controls permission level |
| SHARE-04 | 4 | Immediate share revocation |
| SHARE-05 | 4 | Shared indicator on documents in owner's list view |
| SEC-08 | 4 | credentials_enc excluded from all serializers |
| SEC-09 | 4 | Account deletion triggers delete_user_files() per cloud connection |
| ADMIN-06 | 4 | Admin audit log viewer filtered by date, user, action |
| DOC-01 | 4 | View document metadata and extracted text |
| DOC-02 | 4 | In-browser PDF preview via PDF.js; bytes proxied through app |
| CLOUD-01 | 5 | Connect OneDrive, Google Drive, Nextcloud, WebDAV |
| CLOUD-02 | 5 | HKDF per-user key derivation for credential encryption |
| CLOUD-03 | 5 | Local and cloud storage coexist; user selects default |
| CLOUD-04 | 5 | Connection status display: ACTIVE / REQUIRES_REAUTH / ERROR |
| CLOUD-05 | 5 | invalid_grant transitions to REQUIRES_REAUTH; surfaced to user |
| CLOUD-06 | 5 | Disconnect cloud backend; credentials permanently deleted |
| CLOUD-07 | 5 | StorageBackend ABC + factory in storage/ module |