commit 7d96bcc805e8e27545d5b6db3fd9a98dceb27a87
Author: curo1305
Date:   Thu May 21 18:58:15 2026 +0200

    docs: initialize project

diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
new file mode 100644
index 0000000..42bc449
--- /dev/null
+++ b/.planning/PROJECT.md
@@ -0,0 +1,127 @@
+# DocuVault
+
+## What This Is
+
+DocuVault is a self-hosted, multi-user SaaS document management platform. Users upload documents (PDF, DOCX, images, text), which are automatically text-extracted and classified by AI into user-defined topics. Each user has isolated, quota-enforced storage, can organize documents in folders, connect external cloud storage backends (OneDrive, Google Drive, Nextcloud, etc.), and share documents with other users by handle. A privacy-first admin model gives administrators platform control without any access to user document content.
+
+## Core Value
+
+Every user's documents — and the credentials they use to store them — are inaccessible to everyone except that user, while the platform scales horizontally and supports pluggable storage backends.
+
+## Requirements
+
+### Validated
+
+*Capabilities already shipping in the codebase:*
+
+- ✓ Document upload and text extraction (PDF, DOCX, image, plain text) — existing
+- ✓ AI-based topic classification via configurable provider — existing
+- ✓ Multiple AI provider support (Anthropic, OpenAI, Ollama, LMStudio) — existing
+- ✓ Topic CRUD management — existing
+- ✓ System prompt configuration — existing
+- ✓ Docker containerization (Compose) — existing
+
+### Active
+
+**Users & Auth**
+- [ ] User can register with email and password (enforced strength: length, complexity, breach check)
+- [ ] User can log in and maintain session via JWT
+- [ ] User can enable TOTP authenticator app for 2FA
+- [ ] Admin can create, deactivate, and reset passwords for user accounts
+- [ ] Admin cannot access any user's documents or cloud storage credentials
+
+**Storage & Quotas**
+- [ ] Each user has an isolated storage area with a 100 MB free-tier quota
+- [ ] Quota usage is tracked and enforced; uploads exceeding quota are rejected with a clear error
+- [ ] Admin can adjust individual user storage quotas
+- [ ] Platform migrates from flat-file JSON + filesystem to PostgreSQL + MinIO (S3-compatible)
+
+**Folder Structure**
+- [ ] User can create, rename, and delete folders to organize documents
+- [ ] Document organization is preserved on move/rename (no auto-rearrangement by AI)
+- [ ] A "Shared with me" folder appears automatically when another user shares a document
+
+**Document Sharing**
+- [ ] User can share a document (or folder) with another user by their unique handle
+- [ ] Shared access is view-only by default; owner controls permission level
+- [ ] Revoking share removes access immediately; shared copy is not duplicated in recipient's quota
+
+**Cloud Storage Integration**
+- [ ] User can connect an external cloud storage backend (OneDrive, Google Drive, Nextcloud; extensible)
+- [ ] Local storage and cloud storage coexist; user selects their default storage destination
+- [ ] Cloud storage credentials are encrypted at rest and never readable by admins
+- [ ] Documents stored in cloud backend are accessed via the app without being re-copied to local storage
+
+**AI Configuration (Admin-controlled)**
+- [ ] Admin can assign an AI provider and model per user or per group
+- [ ] System-wide default AI provider and model set by admin
+- [ ] Users cannot change their own AI provider or model
+- [ ] Per-user topic overrides on top of system default topics
+
+**Audit Logging**
+- [ ] Audit log captures: logins, failed logins, uploads, deletes, sharing events, quota changes
+- [ ] Audit log records metadata only — no document content
+- [ ] Admin can view and filter audit logs
+
+**Scalability**
+- [ ] Backend stateless — multiple instances can run behind a load balancer
+- [ ] All state in PostgreSQL and MinIO (no local file locks, no per-instance JSON)
+
+### Out of Scope
+
+- Subscription billing / payment processing — future milestone (quotas designed to plug in)
+- SSO (Microsoft, Google, Apple) — future; auth layer designed for extension
+- Keycloak / SAML / OAuth enterprise federation — future
+- Group admin roles — future; groups table will be seeded in schema
+- Document annotation or in-app editing — not planned
+- Mobile app — not planned
+- Public document sharing (unauthenticated link) — not planned for v1
+
+## Context
+
+- **Existing codebase**: Functional single-user document scanner (FastAPI + Vue 3, Docker Compose). AI provider abstraction already in place — cloud storage will follow the same adapter pattern.
+- **Brownfield migration**: Flat-file JSON persistence and per-process file locks must be replaced with PostgreSQL + MinIO before multi-user isolation is safe.
+- **Privacy constraint**: SaaS model with strict admin/user data separation. Admin role is a platform operator, not a content viewer. Cloud credentials must be encrypted server-side; the encryption key must not be readable by admin queries.
+- **Free tier baseline**: 100 MB per user. Quota model should be designed so future subscription tiers can expand it without schema changes.
+- **Cloud storage**: Follows same provider/adapter pattern as existing AI providers. Each cloud integration is an adapter implementing a common StorageBackend interface.
+
+## Constraints
+
+- **Tech stack**: FastAPI (Python) + Vue 3 — keep existing stack, extend it
+- **Database**: PostgreSQL (replaces flat-file JSON)
+- **Object storage**: MinIO (S3-compatible, Docker-native) — replaces local filesystem for documents
+- **Auth**: bcrypt passwords, JWT sessions, TOTP 2FA (PyOTP / similar)
+- **Cloud credentials**: Encrypted at rest (Fernet symmetric encryption or PostgreSQL pgcrypto) — key in env var, never in DB
+- **Scalability target**: Horizontal (multiple backend containers) — no file-system-level coordination
+- **Deployment**: Docker Compose (must remain the primary deployment target)
+
+## Key Decisions
+
+| Decision | Rationale | Outcome |
+|---|---|---|
+| PostgreSQL + MinIO over flat files | Multi-user quotas + horizontal scaling require shared, consistent state | Replacing JSON + filesystem |
+| Cloud storage adapter pattern | Mirrors existing AI provider pattern — consistent, extensible | New `storage/` module analogous to `ai/` |
+| Privacy-first admin model | SaaS legal/trust requirement — admins must not be able to access user data | Admin queries exclude document content; cloud creds encrypted with user-scoped key |
+| Admin controls AI config, not users | Prevents cost overruns and model misuse; future group-admin delegation designed in | AI provider assignment stored per-user in DB, configurable by admin |
+| 100 MB free tier | Baseline for subscription model; quota table has a `limit_bytes` column admin can override | Quota enforced at upload time |
+| TOTP 2FA before SSO | State-of-the-art security without third-party dependency; SSO added when subscription model lands | TOTP via authenticator app (RFC 6238) |
+
+## Evolution
+
+This document evolves at phase transitions and milestone boundaries.
+
+**After each phase transition** (via `/gsd-transition`):
+1. Requirements invalidated? → Move to Out of Scope with reason
+2. Requirements validated? → Move to Validated with phase reference
+3. New requirements emerged? → Add to Active
+4. Decisions to log? → Add to Key Decisions
+5. "What This Is" still accurate? → Update if drifted
+
+**After each milestone** (via `/gsd:complete-milestone`):
+1. Full review of all sections
+2. Core Value check — still the right priority?
+3. Audit Out of Scope — reasons still valid?
+4. Update Context with current state
+
+---
+*Last updated: 2026-05-21 after initialization*
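
The added PROJECT.md describes each cloud integration as an adapter implementing a common `StorageBackend` interface, and a quota with a `limit_bytes` column that is enforced at upload time. The sketch below shows one way those two ideas might fit together. Only `StorageBackend` and `limit_bytes` come from the document; the method names (`put`, `get`, `delete`), `InMemoryBackend`, `UserQuota`, `QuotaExceeded`, and `upload_with_quota` are hypothetical, and the in-memory backend stands in for MinIO or a cloud adapter.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol

# 100 MB free tier; treating "MB" as 1024 * 1024 bytes is an assumption.
FREE_TIER_BYTES = 100 * 1024 * 1024


class QuotaExceeded(Exception):
    """Raised when an upload would push a user past their quota."""


class StorageBackend(Protocol):
    """Hypothetical common interface each storage adapter would implement."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


@dataclass
class InMemoryBackend:
    """Stand-in adapter for illustration; a real one would wrap MinIO, Nextcloud, etc."""

    _objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


@dataclass
class UserQuota:
    """Mirrors a quota row: limit_bytes is admin-adjustable, used_bytes is tracked."""

    limit_bytes: int = FREE_TIER_BYTES
    used_bytes: int = 0


def upload_with_quota(
    backend: StorageBackend, quota: UserQuota, key: str, data: bytes
) -> None:
    """Reject the upload before writing if it would exceed the user's quota."""
    new_total = quota.used_bytes + len(data)
    if new_total > quota.limit_bytes:
        raise QuotaExceeded(
            f"upload of {len(data)} bytes exceeds quota "
            f"({quota.used_bytes}/{quota.limit_bytes} bytes used)"
        )
    backend.put(key, data)
    quota.used_bytes = new_total
```

This sketch ignores overwrites of an existing key and keeps the counter in process memory. With multiple stateless backend instances, the check and the counter update would presumably need to happen atomically in PostgreSQL rather than in Python objects.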