# DocuVault — v1 Requirements
_Last updated: 2026-05-21_
## v1 Requirements
### Authentication (AUTH)
- [ ] **AUTH-01**: User can register with email and password (Argon2 hashing; strength enforced: ≥12 chars, uppercase, lowercase, number, special char; HaveIBeenPwned breach check)
- [ ] **AUTH-02**: User can log in and maintain a session (JWT access token in Pinia memory only — never localStorage; refresh token in `httpOnly; Secure; SameSite=Strict` cookie; 15-min access / 30-day refresh)
- [ ] **AUTH-03**: User can enroll a TOTP authenticator app (RFC 6238; 8–10 single-use backup codes issued and explicitly acknowledged before TOTP is marked active)
- [ ] **AUTH-04**: User can complete login using TOTP code or a one-time backup code (backup code invalidated on use)
- [ ] **AUTH-05**: User can reset password via email (signed token, 1-hour expiry; reset does not auto-login — user must pass TOTP gate on next login)
- [ ] **AUTH-06**: User can sign out all active sessions (revokes all refresh tokens in DB; "sign out all devices" control in account settings)
- [ ] **AUTH-07**: Refresh token rotation with family revocation — reuse of a rotated token revokes the entire family and emits a security alert to the user
- [ ] **AUTH-08**: TOTP codes are single-use (mark used in DB within the validity window; prevent replay attacks)
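
One possible reading of AUTH-07 is sketched below in Python, assuming an in-memory dictionary in place of the refresh-token table. `RefreshTokenStore`, its fields, and the `alerts` list are hypothetical names chosen for illustration; a real implementation would persist token hashes in PostgreSQL and deliver the alert to the user.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RefreshTokenStore:
    """In-memory stand-in for the refresh-token table (hypothetical schema)."""

    # token hash -> (family_id, is_current)
    _tokens: dict = field(default_factory=dict)
    _revoked_families: set = field(default_factory=set)
    # family ids for which a security alert would be sent (assumption)
    alerts: list = field(default_factory=list)

    @staticmethod
    def _digest(token: str) -> str:
        # Store only a hash so a DB leak does not expose usable tokens.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, family_id: Optional[str] = None) -> str:
        """Issue a token; a missing family_id starts a new family (fresh login)."""
        family_id = family_id or secrets.token_hex(8)
        token = secrets.token_urlsafe(32)
        self._tokens[self._digest(token)] = (family_id, True)
        return token

    def rotate(self, presented: str) -> Optional[str]:
        """Exchange a current token for a new one, or return None on rejection."""
        key = self._digest(presented)
        record = self._tokens.get(key)
        if record is None:
            return None  # unknown token
        family_id, is_current = record
        if family_id in self._revoked_families:
            return None  # family already revoked
        if not is_current:
            # Reuse of a rotated token: revoke the whole family and alert.
            self._revoked_families.add(family_id)
            self.alerts.append(family_id)
            return None
        # Normal rotation: retire the presented token, issue its successor.
        self._tokens[key] = (family_id, False)
        return self.issue(family_id)
```

In this sketch, replaying an already-rotated token invalidates the newest token in the same family as well, which forces both the legitimate user and a potential attacker back through the full login.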
### Security (SEC) — Cross-Cutting
- [ ] **SEC-01**: All state-changing endpoints are protected against CSRF (SameSite=Strict cookie + origin validation)
- [ ] **SEC-02**: Auth endpoints (login, register, password reset, TOTP verify) are rate-limited (per-IP and per-account)
- [ ] **SEC-03**: All DB queries use parameterized statements / ORM (zero raw string interpolation into queries)
- [ ] **SEC-04**: All file/document access resolved through DB lookup — object keys are never reconstructed from request parameters (prevents path traversal and cross-user access)
- [ ] **SEC-05**: Content-Security-Policy, X-Frame-Options, and X-Content-Type-Options headers set on all responses
- [ ] **SEC-06**: Constant-time comparison used for all token and code verification (prevents timing attacks)
- [ ] **SEC-07**: Admin role verified on every admin endpoint request; admin cannot access document content, extracted text, or cloud credentials in any response
- [ ] **SEC-08**: Cloud credential ciphertext (`credentials_enc`) excluded from all API serializers by default — admin and user responses return only `provider, display_name, connected_at, status`
- [ ] **SEC-09**: Account deletion triggers `delete_user_files()` on every active cloud connection before removing DB records (prevents orphaned cloud data and satisfies GDPR Article 17)
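
SEC-06 could be met along the following lines, using the standard library's `hmac.compare_digest`. The helper names `hash_code` and `verify_code`, and the choice to store a SHA-256 digest of each backup code, are assumptions made for this sketch.

```python
import hashlib
import hmac


def hash_code(code: str) -> bytes:
    """Digest stored in place of the raw code (assumed storage format)."""
    return hashlib.sha256(code.encode("utf-8")).digest()


def verify_code(submitted: str, stored_hash: bytes) -> bool:
    """Compare in constant time so response timing does not leak
    how many leading bytes of the submitted value were correct."""
    return hmac.compare_digest(hash_code(submitted), stored_hash)
```

A plain `==` on strings or bytes can return as soon as the first differing byte is found, which is the behaviour this requirement rules out.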
### Users & Admin (ADMIN)
- [ ] **ADMIN-01**: Admin can create user accounts (email, temporary password that must be changed on first login)
- [ ] **ADMIN-02**: Admin can deactivate a user account (blocks all logins and API access; data preserved)
- [ ] **ADMIN-03**: Admin can initiate password reset for a user (sends reset email; does not grant admin access to the account)
- [ ] **ADMIN-04**: Admin can view and adjust individual user storage quotas (warns if new limit is below current usage)
- [ ] **ADMIN-05**: Admin can assign AI provider and model per user (users cannot modify their own AI configuration)
- [ ] **ADMIN-06**: Admin can view audit log filtered by date range, user, and action type (metadata only — no document content, filenames, or extracted text)
- [ ] **ADMIN-07**: Admin impersonation ("log in as user") is explicitly excluded by architecture — no endpoint or UI pathway exists
### Storage & Infrastructure (STORE)
- [ ] **STORE-01**: Platform storage layer migrated from flat-file JSON + local filesystem to PostgreSQL (metadata) + MinIO (objects); existing documents preserved via dual-write migration script
- [ ] **STORE-02**: Each user's MinIO objects use `{user_id}/{document_id}/{uuid4()}{ext}` keys — human-readable filenames stored in DB only
- [ ] **STORE-03**: Each user has a 100 MB storage quota enforced atomically at upload using `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`
- [ ] **STORE-04**: User sees quota usage bar in sidebar (X MB of Y MB) with amber warning at 80% and red warning at 95%
- [ ] **STORE-05**: Upload rejected at quota limit with a specific error showing current usage, rejected file size, and a link to storage settings
- [ ] **STORE-06**: Document delete atomically decrements quota usage
- [ ] **STORE-07**: Backend is stateless — no per-instance file locks; multiple instances can run behind a load balancer
- [ ] **STORE-08**: FastAPI `BackgroundTasks` replaced with Celery + Redis or pgqueuer before horizontal scaling is enabled
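
The conditional update in STORE-03 and the decrement in STORE-06 can be tried out with the sketch below. It uses an in-memory SQLite database as a stand-in for PostgreSQL, checks `rowcount` rather than `RETURNING`, and adds a `user_id` filter; the table layout and the function names `reserve_quota` and `release_quota` are assumptions for illustration.

```python
import sqlite3


def reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Add delta bytes to the user's usage only if the result fits the limit.

    Check and increment happen in a single UPDATE, so two concurrent
    uploads cannot both pass the check and overshoot the quota.
    """
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= limit_bytes",
        (delta, user_id, delta),
    )
    conn.commit()
    return cur.rowcount == 1  # no row updated means the upload is rejected


def release_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> None:
    """Give bytes back after a document delete, never dropping below zero."""
    conn.execute(
        "UPDATE quotas SET used_bytes = MAX(used_bytes - ?, 0) WHERE user_id = ?",
        (delta, user_id),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE quotas (user_id TEXT PRIMARY KEY, "
        "used_bytes INTEGER NOT NULL, limit_bytes INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO quotas VALUES ('u1', 0, 100)")
    print(reserve_quota(conn, "u1", 60))  # fits within the limit
    print(reserve_quota(conn, "u1", 50))  # would exceed the limit
```

When `reserve_quota` returns `False`, the caller has what it needs to build the STORE-05 error after reading the current usage.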
### Folders & Organization (FOLD)
- [ ] **FOLD-01**: User can create, rename, and delete folders (delete confirms content count before proceeding)
- [ ] **FOLD-02**: User can move documents between folders
- [ ] **FOLD-03**: Breadcrumb navigation renders current folder path; each segment is clickable to navigate up
- [ ] **FOLD-04**: Document list supports sort by name, date uploaded, and file size
- [ ] **FOLD-05**: Full-text search across user's documents (PostgreSQL `tsvector` index on extracted text)
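
For FOLD-03, the breadcrumb path could be derived by walking parent pointers from the current folder up to the root. The `Folder` shape (with a nullable `parent_id`) and the `breadcrumb` helper are hypothetical; the actual folder schema is not specified here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Folder:
    id: int
    name: str
    parent_id: Optional[int]  # None marks a root folder (assumption)


def breadcrumb(folders: Dict[int, Folder], folder_id: int) -> List[Folder]:
    """Return the path from the root down to folder_id, root first."""
    path: List[Folder] = []
    seen = set()
    current: Optional[int] = folder_id
    while current is not None:
        if current in seen:
            # Guard against corrupted data forming a parent cycle.
            raise ValueError("folder hierarchy contains a cycle")
        seen.add(current)
        folder = folders[current]
        path.append(folder)
        current = folder.parent_id
    path.reverse()
    return path
```

Each element of the returned list maps to one clickable segment, so navigating up is a matter of opening the folder at that position.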
### Document Sharing (SHARE)
- [ ] **SHARE-01**: User can share a document with another user by their unique identifier (@handle or user ID)
- [ ] **SHARE-02**: Shared documents appear in a "Shared with me" virtual folder for the recipient (no storage quota counted against recipient)
- [ ] **SHARE-03**: Shared access is view-only by default; owner controls permission level
- [ ] **SHARE-04**: Owner can revoke share access; revocation is immediate
- [ ] **SHARE-05**: Documents shared with others display a "shared" indicator in the owner's list view
### Cloud Storage (CLOUD)
- [ ] **CLOUD-01**: User can connect OneDrive (Microsoft Graph), Google Drive (v3 API), Nextcloud, or generic WebDAV as a personal storage backend
- [ ] **CLOUD-02**: Cloud OAuth credentials encrypted using HKDF per-user key derivation (`HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials")`); master key in `CLOUD_CREDS_KEY` env var; never stored in DB
- [ ] **CLOUD-03**: Local MinIO storage and connected cloud backends coexist; user can select their default storage destination
- [ ] **CLOUD-04**: Each cloud connection displays status: `ACTIVE | REQUIRES_REAUTH | ERROR`
- [ ] **CLOUD-05**: On OAuth revocation (`invalid_grant`), connection status transitions to `REQUIRES_REAUTH` — the error is surfaced to the user, not retried silently
- [ ] **CLOUD-06**: User can disconnect a cloud backend; credentials are permanently deleted from the DB
- [ ] **CLOUD-07**: Storage backend abstracted via `StorageBackend` ABC + factory in `storage/` module (mirrors existing `ai/` provider pattern)
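
The per-user key derivation in CLOUD-02 might look like the sketch below, which implements HKDF (RFC 5869) with the standard library only. HMAC-SHA256 as the hash, the 32-byte output length, UTF-8 encoding of the user ID, and the function names are assumptions; only the `salt` and `info` roles come from the requirement.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with HMAC-SHA256: extract a pseudorandom key, then expand it."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key length too large for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default for a missing salt
    # Extract step: PRK = HMAC(salt, input keying material)
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_key: bytes, user_id: str) -> bytes:
    """Derive the key used to encrypt one user's cloud credentials."""
    return hkdf_sha256(
        master_key,
        salt=user_id.encode("utf-8"),
        info=b"cloud-credentials",
    )
```

Because the derived key depends on the user ID, ciphertext copied from one user's row cannot be decrypted with another user's derived key, and nothing beyond the ciphertext needs to be stored in the DB. In production code a maintained library implementation of HKDF would likely be preferable to a hand-written one.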
### Documents & AI (DOC)
- [ ] **DOC-01**: User can view document metadata and extracted text for any document in their library
- [ ] **DOC-02**: In-browser PDF preview (PDF.js); document bytes proxied through the app — no presigned URLs exposed to the browser (privacy model)
- [ ] **DOC-03**: AI provider and model assigned by admin per user; user cannot change AI configuration
- [ ] **DOC-04**: System default topics + per-user topic overrides preserved from existing implementation
- [ ] **DOC-05**: AI classification uses the user's assigned provider and model (from DB, not from user-supplied settings)
---
## v2 Requirements (Deferred)
- Subscription billing and payment processing (quota model designed to plug in)
- SSO: Microsoft, Google, Apple (auth layer designed for extension)
- Keycloak / SAML / OAuth2 enterprise federation
- Group admin roles (groups table seeded in schema, unpopulated)
- Share permission levels beyond view-only (edit, comment)
- Document version history
- Share expiry dates
- Real-time collaboration or comments
- Mobile app
- GDPR data export (Article 20) — async background job, deferred to v2
- Email notifications for sharing events
- Public link sharing (unauthenticated)
---
## Out of Scope
- Admin impersonation / "log in as user" — violates privacy-first core value; explicit architectural exclusion
- Document editing or annotation — not planned
- Document viewer for non-PDF types beyond metadata (DOCX, image renders) — v2
- AI-generated document summaries beyond topic classification — v2
- Webhooks or API access for third parties — not planned for v1
---
## Traceability
_Filled by roadmapper._
| REQ-ID | Phase | Notes |
|---|---|---|
| (pending) | | |