DocuVault

What This Is

DocuVault is a self-hosted, multi-user SaaS document management platform. Users upload documents (PDF, DOCX, images, text), which are automatically text-extracted and classified by AI into user-defined topics. Each user has isolated, quota-enforced storage, can organize documents in folders, connect external cloud storage backends (OneDrive, Google Drive, Nextcloud, etc.), and share documents with other users by handle. A privacy-first admin model gives administrators platform control without any access to user document content.

Core Value

Every user's documents — and the credentials they use to store them — are inaccessible to everyone except that user, while the platform scales horizontally and supports pluggable storage backends.

Requirements

Validated

Capabilities already shipping in the codebase:

  • ✓ Document upload and text extraction (PDF, DOCX, image, plain text) — existing
  • ✓ AI-based topic classification via configurable provider — existing
  • ✓ Multiple AI provider support (Anthropic, OpenAI, Ollama, LMStudio) — existing
  • ✓ Topic CRUD management — existing
  • ✓ System prompt configuration — existing
  • ✓ Docker containerization (Compose) — existing

Active

Users & Auth

  • User can register with email and password (enforced strength: length, complexity, breach check)
  • User can log in and maintain session via JWT
  • User can enable TOTP authenticator app for 2FA
  • Admin can create, deactivate, and reset passwords for user accounts
  • Admin cannot access any user's documents or cloud storage credentials

Storage & Quotas

  • Each user has an isolated storage area with a 100 MB free-tier quota
  • Quota usage is tracked and enforced; uploads exceeding quota are rejected with a clear error
  • Admin can adjust individual user storage quotas
  • Platform migrates from flat-file JSON + filesystem to PostgreSQL + MinIO (S3-compatible)
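
A minimal sketch of quota enforcement at upload time follows. The `limit_bytes` name comes from the Key Decisions table; the `Quota` class, the `QuotaExceeded` exception, and the reading of 100 MB as 100 × 1024 × 1024 bytes are assumptions made for illustration.

```python
from dataclasses import dataclass

# 100 MB free tier, interpreted here as 100 MiB (assumption).
FREE_TIER_BYTES = 100 * 1024 * 1024


class QuotaExceeded(Exception):
    """Raised when an upload would push a user past their limit."""


@dataclass
class Quota:
    limit_bytes: int = FREE_TIER_BYTES  # admin-adjustable per user
    used_bytes: int = 0

    def reserve(self, upload_bytes: int) -> None:
        """Account for an upload, or reject it before any bytes are stored."""
        if upload_bytes < 0:
            raise ValueError("upload size cannot be negative")
        remaining = self.limit_bytes - self.used_bytes
        if upload_bytes > remaining:
            raise QuotaExceeded(
                f"Upload of {upload_bytes} bytes exceeds remaining quota "
                f"of {remaining} bytes"
            )
        self.used_bytes += upload_bytes
```

In a deployment with several backend instances, the check and the increment would need to happen atomically in the database rather than in process memory; this in-memory version only shows the rule itself.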

Folder Structure

  • User can create, rename, and delete folders to organize documents
  • Document organization is preserved on move/rename (no auto-rearrangement by AI)
  • A "Shared with me" folder appears automatically when another user shares a document

Document Sharing

  • User can share a document (or folder) with another user by their unique handle
  • Shared access is view-only by default; owner controls permission level
  • Revoking a share removes access immediately; a shared document is not duplicated and does not count against the recipient's quota

Cloud Storage Integration

  • User can connect an external cloud storage backend (OneDrive, Google Drive, Nextcloud; extensible)
  • Local storage and cloud storage coexist; user selects their default storage destination
  • Cloud storage credentials are encrypted at rest and never readable by admins
  • Documents stored in cloud backend are accessed via the app without being re-copied to local storage

AI Configuration (Admin-controlled)

  • Admin can assign an AI provider and model per user or per group
  • Admin sets the system-wide default AI provider and model
  • Users cannot change their own AI provider or model
  • Per-user topic overrides are layered on top of the system default topics
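
One way the admin-controlled assignment could resolve to an effective configuration is sketched below. The precedence order (user assignment, then group assignment, then system default) is an assumption, as are the `AIConfig` and `resolve_ai_config` names; the model strings in the usage note are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AIConfig:
    provider: str
    model: str


def resolve_ai_config(
    user_id: str,
    group_id: Optional[str],
    user_assignments: Dict[str, AIConfig],
    group_assignments: Dict[str, AIConfig],
    system_default: AIConfig,
) -> AIConfig:
    """Pick the most specific admin-assigned config (assumed precedence)."""
    # 1. An explicit per-user assignment wins.
    if user_id in user_assignments:
        return user_assignments[user_id]
    # 2. Otherwise fall back to the user's group, if any.
    if group_id is not None and group_id in group_assignments:
        return group_assignments[group_id]
    # 3. Otherwise use the system-wide default.
    return system_default
```

For example, with `system_default = AIConfig("ollama", "example-default-model")` and a group assignment for `"staff"`, a user in that group without a personal assignment receives the group's configuration.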

Audit Logging

  • Audit log captures: logins, failed logins, uploads, deletes, sharing events, quota changes
  • Audit log records metadata only — no document content
  • Admin can view and filter audit logs
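
The "metadata only" rule can be made structural rather than a convention. The sketch below rejects any audit event carrying a key outside an allow-list; the class names, the action values, and the allowed keys are all hypothetical and chosen only to mirror the events listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, Iterable, List, Optional


class AuditAction(str, Enum):
    LOGIN = "login"
    LOGIN_FAILED = "login_failed"
    UPLOAD = "upload"
    DELETE = "delete"
    SHARE = "share"
    QUOTA_CHANGE = "quota_change"


# Hypothetical allow-list: identifiers and sizes only, never document content.
ALLOWED_METADATA_KEYS = {
    "document_id", "size_bytes", "target_handle",
    "old_limit_bytes", "new_limit_bytes", "ip",
}


@dataclass(frozen=True)
class AuditEvent:
    action: AuditAction
    actor_id: str
    metadata: Dict[str, object] = field(default_factory=dict)
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Refuse anything outside the allow-list so content cannot leak in.
        unexpected = set(self.metadata) - ALLOWED_METADATA_KEYS
        if unexpected:
            raise ValueError(f"metadata keys not allowed: {sorted(unexpected)}")


def filter_events(
    events: Iterable[AuditEvent],
    action: Optional[AuditAction] = None,
    actor_id: Optional[str] = None,
) -> List[AuditEvent]:
    """Return events matching every filter that was supplied."""
    return [
        e for e in events
        if (action is None or e.action == action)
        and (actor_id is None or e.actor_id == actor_id)
    ]
```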

Scalability

  • Backend is stateless, so multiple instances can run behind a load balancer
  • All state lives in PostgreSQL and MinIO (no local file locks, no per-instance JSON)

Out of Scope

  • Subscription billing / payment processing — future milestone (quotas designed to plug in)
  • SSO (Microsoft, Google, Apple) — future; auth layer designed for extension
  • Keycloak / SAML / OAuth enterprise federation — future
  • Group admin roles — future; groups table will be seeded in schema
  • Document annotation or in-app editing — not planned
  • Mobile app — not planned
  • Public document sharing (unauthenticated link) — not planned for v1

Context

  • Existing codebase: Functional single-user document scanner (FastAPI + Vue 3, Docker Compose). AI provider abstraction already in place — cloud storage will follow the same adapter pattern.
  • Brownfield migration: Flat-file JSON persistence and per-process file locks must be replaced with PostgreSQL + MinIO before multi-user isolation is safe.
  • Privacy constraint: SaaS model with strict admin/user data separation. Admin role is a platform operator, not a content viewer. Cloud credentials must be encrypted server-side; the encryption key must not be readable by admin queries.
  • Free tier baseline: 100 MB per user. Quota model should be designed so future subscription tiers can expand it without schema changes.
  • Cloud storage: Follows same provider/adapter pattern as existing AI providers. Each cloud integration is an adapter implementing a common StorageBackend interface.
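
The adapter pattern described above might look roughly like this. Only the `StorageBackend` name comes from this document; the method set (`put`, `get`, `delete`), the registry helpers, and the in-memory adapter are assumptions used to keep the sketch runnable without any cloud SDK.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class StorageBackend(ABC):
    """Common interface every storage adapter implements (assumed method set)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


# Registry mapping a backend name to its adapter class (hypothetical helper).
_BACKENDS: Dict[str, Type[StorageBackend]] = {}


def register_backend(name: str) -> Callable[[Type[StorageBackend]], Type[StorageBackend]]:
    def decorator(cls: Type[StorageBackend]) -> Type[StorageBackend]:
        _BACKENDS[name] = cls
        return cls
    return decorator


@register_backend("memory")
class InMemoryBackend(StorageBackend):
    """Stand-in adapter for tests; a real adapter would wrap MinIO, Nextcloud, etc."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the object is missing

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


def create_backend(name: str, **config: object) -> StorageBackend:
    """Instantiate a registered adapter by name."""
    try:
        cls = _BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown storage backend: {name!r}") from None
    return cls(**config)
```

Callers depend only on `StorageBackend`, so adding a new integration means registering one more adapter class without touching upload or download code.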

Constraints

  • Tech stack: FastAPI (Python) + Vue 3 — keep existing stack, extend it
  • Database: PostgreSQL (replaces flat-file JSON)
  • Object storage: MinIO (S3-compatible, Docker-native) — replaces local filesystem for documents
  • Auth: bcrypt passwords, JWT sessions, TOTP 2FA (PyOTP / similar)
  • Cloud credentials: Encrypted at rest (Fernet symmetric encryption or PostgreSQL pgcrypto) — key in env var, never in DB
  • Scalability target: Horizontal (multiple backend containers) — no file-system-level coordination
  • Deployment: Docker Compose (must remain the primary deployment target)

Key Decisions

| Decision | Rationale | Outcome |
|---|---|---|
| PostgreSQL + MinIO over flat files | Multi-user quotas + horizontal scaling require shared, consistent state | Replacing JSON + filesystem |
| Cloud storage adapter pattern | Mirrors existing AI provider pattern: consistent, extensible | New storage/ module analogous to ai/ |
| Privacy-first admin model | SaaS legal/trust requirement: admins must not be able to access user data | Admin queries exclude document content; cloud creds encrypted with user-scoped key |
| Admin controls AI config, not users | Prevents cost overruns and model misuse; future group-admin delegation designed in | AI provider assignment stored per-user in DB, configurable by admin |
| 100 MB free tier | Baseline for subscription model; quota table has a limit_bytes column admin can override | Quota enforced at upload time |
| TOTP 2FA before SSO | State-of-the-art security without third-party dependency; SSO added when subscription model lands | TOTP via authenticator app (RFC 6238) |
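
For reference, the TOTP computation defined by RFC 6238 fits in a few lines of standard-library Python. This sketch is meant to show what an authenticator app and the server both compute; it is not a proposal to replace PyOTP or a similar library. The `totp` and `verify_totp` names and the one-step drift window are assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    key = base64.b32decode(secret_b32, casefold=True)
    # The moving factor is the number of whole time steps since the epoch.
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(
    secret_b32: str,
    candidate: str,
    at: Optional[float] = None,
    window: int = 1,
    step: int = 30,
) -> bool:
    """Accept codes from the current step and `window` steps either side (assumed policy)."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + drift * step, step), candidate)
        for drift in range(-window, window + 1)
    )
```

The comparison uses `hmac.compare_digest` so that verification time does not depend on how many leading digits of a guess are correct.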

Evolution

This document evolves at phase transitions and milestone boundaries.

After each phase transition (via /gsd-transition):

  1. Requirements invalidated? → Move to Out of Scope with reason
  2. Requirements validated? → Move to Validated with phase reference
  3. New requirements emerged? → Add to Active
  4. Decisions to log? → Add to Key Decisions
  5. "What This Is" still accurate? → Update if drifted

After each milestone (via /gsd:complete-milestone):

  1. Full review of all sections
  2. Core Value check — still the right priority?
  3. Audit Out of Scope — reasons still valid?
  4. Update Context with current state

Last updated: 2026-05-21 after initialization