DocuVault — Claude Code Guide
Project Overview
DocuVault is a multi-user SaaS document management platform built on FastAPI (Python) + Vue 3. It handles document upload, text extraction (PDF/DOCX/image/text), AI-based topic classification, per-user isolated storage, folder organization, document sharing, and pluggable cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV).
Current state: Brownfield — single-user app is functional. Active milestone: migrating to multi-user, adding auth, PostgreSQL + MinIO, and cloud storage.
Stack
- Backend: Python 3.12, FastAPI 0.136+, SQLAlchemy 2.0 async, psycopg v3, Alembic, MinIO SDK
- Frontend: Vue 3 (Options API), Pinia, Vue Router 4, Vite, Tailwind CSS
- Infrastructure: Docker Compose, PostgreSQL, MinIO (S3-compatible)
- Auth: PyJWT 2.12+, pwdlib[argon2], pyotp (TOTP), cryptography (Fernet/HKDF)
Key Architectural Rules
- JWT access token lives in Pinia memory only — never localStorage or sessionStorage
- Refresh token is an httpOnly; Secure; SameSite=Strict cookie — never accessible to JavaScript
- MinIO object keys are UUID-based (`{user_id}/{document_id}/{uuid4()}{ext}`); human filenames are stored in the DB only
- Cloud credentials encrypted with per-user keys derived via HKDF; master key in env var only
- Quota enforced atomically, scoped to the uploading user's row: `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE user_id = $user_id AND (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`
- Admin endpoints never return document content, extracted text, or `credentials_enc`
- Every document/folder endpoint asserts `resource.user_id == current_user.id`
- All DB queries via ORM / parameterized statements; zero raw string interpolation
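Here is a minimal sketch of the atomic quota update. It uses stdlib `sqlite3` only so that it runs standalone, which means it needs SQLite 3.35+ for `RETURNING`. The real backend targets PostgreSQL through SQLAlchemy. The table layout and the `reserve_quota` helper are hypothetical. The limit check and the increment happen in a single statement, so two concurrent uploads cannot both pass against a stale read.

```python
import sqlite3
from typing import Optional

# Hypothetical schema mirroring the quota rule; production uses PostgreSQL.
SCHEMA = """
CREATE TABLE quotas (
    user_id     TEXT PRIMARY KEY,
    used_bytes  INTEGER NOT NULL DEFAULT 0,
    limit_bytes INTEGER NOT NULL
)
"""

# Check and increment in ONE statement: no read-then-write race in Python.
RESERVE_SQL = """
UPDATE quotas
   SET used_bytes = used_bytes + :delta
 WHERE user_id = :user_id
   AND (used_bytes + :delta) <= limit_bytes
RETURNING used_bytes
"""


def reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> Optional[int]:
    """Return the new used_bytes, or None if the upload would exceed the limit."""
    # Bound parameters only; user input is never interpolated into the SQL text.
    rows = conn.execute(RESERVE_SQL, {"delta": delta, "user_id": user_id}).fetchall()
    conn.commit()
    return rows[0][0] if rows else None
```

An empty result means either that the limit would be exceeded or that the user has no quota row. In both cases the row is left unchanged and the upload should be refused.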
GSD Workflow
This project uses the GSD (Get Shit Done) planning workflow. Planning artifacts live in .planning/.
Key files
| File | Purpose |
|---|---|
| `.planning/ROADMAP.md` | 5-phase plan with success criteria |
| `.planning/REQUIREMENTS.md` | 54 v1 requirements with REQ-IDs |
| `.planning/STATE.md` | Current phase and completion status |
| `.planning/PROJECT.md` | Project context and key decisions |
| `.planning/research/SUMMARY.md` | Domain research synthesis |
| `.planning/codebase/` | Codebase map (architecture, stack, concerns) |
Commands
- `/gsd:discuss-phase N`: gather context before planning a phase
- `/gsd:plan-phase N`: create execution plan for a phase
- `/gsd:execute-phase N`: execute the plan
- `/gsd:verify-work N`: verify phase deliverables against requirements
- `/gsd:progress`: check status and advance workflow

Current phase: Not started; run `/gsd:discuss-phase 1` to begin
Development Setup
```bash
# Start all services
docker compose up

# Backend only (local dev)
cd backend && uvicorn main:app --reload

# Frontend only (local dev)
cd frontend && npm run dev

# Run backend tests
cd backend && pytest -v
```
Testing Protocol (Non-Negotiable)
Every feature, function, and bug fix requires tests. No phase or plan may advance until all tests pass.
Rules
- Coverage: Every new function, endpoint, and UI component must have at least one test: unit for isolated logic, integration for DB/service boundaries, E2E for critical user flows
- Gate: `pytest -v` (backend) and the frontend test suite must pass with zero failures before marking a plan complete or advancing to the next phase
- Bug fixes: Must fix the root cause, not work around it. Maximum 50 lines of changed code per fix. If a fix requires more, it is scope creep and must be broken into a separate plan
- No workarounds: `# type: ignore`, `noqa`, skipping a test, or adding a `try/except` that silently swallows an error are prohibited as bug fixes
- Regression: Any time a bug is fixed, a test must be added that would have caught it
Test types per layer
| Layer | Required test type |
|---|---|
| Service / business logic | Unit tests with mocked dependencies |
| DB queries / ORM | Integration tests against real PostgreSQL (not SQLite for quota/UUID tests) |
| API endpoints | httpx.AsyncClient integration tests with real DB fixtures |
| Auth flows | Full round-trip tests (register → login → TOTP → refresh → revoke) |
| Security invariants | Dedicated negative tests (wrong owner → 403/404, admin → 403, replay → 401) |
| Frontend | Vitest unit tests for stores/composables; Playwright or Cypress for critical flows |
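The first row of this table (unit tests with mocked dependencies) could look like the sketch below. It tests a hypothetical `DocumentService.delete`, with `unittest.mock.AsyncMock` standing in for the storage client, and it also serves as a wrong-owner negative test. `Document`, `DocumentService` and `NotFound` are invented for the example. pytest collects `unittest.TestCase` classes, so the sketch runs under the `pytest -v` gate unchanged.

```python
import asyncio
import unittest
from dataclasses import dataclass
from unittest.mock import AsyncMock


class NotFound(Exception):
    """Hypothetical error the API layer would map to HTTP 404."""


@dataclass
class Document:
    id: str
    user_id: str
    object_key: str


class DocumentService:
    """Hypothetical service; storage is injected so tests can mock it."""

    def __init__(self, storage):
        self.storage = storage

    async def delete(self, doc: Document, current_user_id: str) -> None:
        # Ownership check: answer "not found" so other users' IDs are not revealed.
        if doc.user_id != current_user_id:
            raise NotFound(doc.id)
        await self.storage.delete_object(doc.object_key)


class DeleteDocumentTests(unittest.TestCase):
    def setUp(self):
        self.storage = AsyncMock()  # stands in for the MinIO client wrapper
        self.service = DocumentService(self.storage)
        self.doc = Document(id="d1", user_id="alice", object_key="alice/d1/obj.pdf")

    def test_owner_can_delete(self):
        asyncio.run(self.service.delete(self.doc, "alice"))
        self.storage.delete_object.assert_awaited_once_with("alice/d1/obj.pdf")

    def test_wrong_owner_is_rejected_and_storage_untouched(self):
        with self.assertRaises(NotFound):
            asyncio.run(self.service.delete(self.doc, "mallory"))
        self.storage.delete_object.assert_not_awaited()
```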
Security Protocol (Non-Negotiable)
A dedicated security agent runs after every plan execution and before any phase is marked complete. This agent has full read/write/edit access to the entire codebase and is the final gate before advancement.
Security agent mandate
The security agent must check — and fix — every class of vulnerability listed below. It may not flag and defer; it must resolve or escalate blocking issues.
OWASP Top 10 + auth-specific
| Threat | Required mitigation |
|---|---|
| SQL injection | All queries via ORM or parameterized statements — zero raw string interpolation |
| XSS | CSP headers, httpOnly cookies, no innerHTML with user data, Vue template auto-escaping never bypassed |
| CSRF | SameSite=Strict cookie + Origin/Referer header validation on all state-changing endpoints |
| Broken auth | Short-lived JWT (≤15 min), refresh rotation, family revocation on reuse, constant-time comparison |
| IDOR / broken access control | Every resource endpoint asserts resource.user_id == current_user.id; admin blocked from document content |
| Security misconfiguration | No debug mode in production, no stack traces in API responses, no default credentials |
| Sensitive data exposure | Passwords hashed Argon2id, PII fields encrypted at rest, credentials_enc never in API responses |
| Insecure deserialization | No `pickle`, no `eval`, no dynamic `__import__`; all user-supplied data validated via Pydantic |
| Vulnerable dependencies | `pip-audit` / `npm audit` run; critical/high CVEs blocked |
| Insufficient logging | All auth events, quota violations, and admin actions written to audit log without document content |
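The Origin/Referer half of the CSRF row can be sketched as a plain function, which makes it easy to unit-test before wiring it into a FastAPI dependency or middleware. The sketch assumes ASGI-style lowercase header names and an allowlist of lowercase origins. `csrf_origin_ok` and the example origin are hypothetical.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit

# Methods that change state and therefore need the Origin/Referer check.
STATE_CHANGING = {"POST", "PUT", "PATCH", "DELETE"}


def _origin_of(url: str) -> Optional[str]:
    """Reduce a URL to scheme://host[:port]; None if it is not an absolute http(s) URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return f"{parts.scheme}://{parts.netloc}".lower()


def csrf_origin_ok(method: str, headers: Mapping[str, str], allowed: set[str]) -> bool:
    """Return True if the request may proceed under the Origin/Referer rule."""
    if method.upper() not in STATE_CHANGING:
        return True
    # Prefer Origin, fall back to Referer, and reject when both are absent.
    source = headers.get("origin") or headers.get("referer")
    if not source:
        return False
    origin = _origin_of(source)
    return origin is not None and origin in allowed
```

The sketch rejects a request that carries neither header rather than allowing it. That is the stricter of the two common choices.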
Advanced threats
- Path traversal: All file path construction uses `os.path.basename`/`pathlib`; never joins user-supplied strings directly
- SSRF: All outbound HTTP (HIBP, cloud OAuth) via an allowlisted client; user-supplied URLs for WebDAV/Nextcloud must pass a hostname allowlist
- Timing attacks: `hmac.compare_digest`/`secrets.compare_digest` for all token, TOTP, and backup-code comparison; no `==`
- Race conditions / TOCTOU: Quota enforcement via a single atomic `UPDATE … RETURNING`; never read-then-write in Python
- Mass assignment: Pydantic models explicitly declare every accepted field; no `**kwargs` passthrough from request body to ORM
- Privilege escalation: `get_regular_user` and `get_current_admin` deps checked on every endpoint; no role elevation path exists
- Token replay: JTI stored in DB; used TOTP codes invalidated within the 90 s window; refresh token family revocation on reuse
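Path-traversal safety and the UUID object-key rule come together in key construction. Below is a minimal sketch built around a hypothetical `build_object_key` helper. Only the caller's IDs, a fresh `uuid4()` and a sanitized extension reach the key, so nothing the user typed can add a path segment. The 10-character extension cap and the ASCII-alphanumeric filter are assumptions.

```python
import uuid
from pathlib import PurePosixPath, PureWindowsPath


def build_object_key(user_id: uuid.UUID, document_id: uuid.UUID, filename: str) -> str:
    """Build a MinIO key of the form {user_id}/{document_id}/{uuid4()}{ext}."""
    # PureWindowsPath splits on both "\\" and "/", so directory parts in
    # either style are discarded before the last component is taken.
    name = PureWindowsPath(filename).name
    ext = PurePosixPath(name).suffix.lower()
    body = ext[1:]
    # Keep only short ASCII-alphanumeric extensions; anything else is dropped.
    if not (body.isascii() and body.isalnum() and len(ext) <= 10):
        ext = ""
    # The human filename itself is never part of the key; it belongs in the DB.
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"
```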
Zero-day / defense-in-depth
- Minimal attack surface: Every endpoint that is not needed is absent; no commented-out code, no `TODO: remove` endpoints left alive
- Principle of least privilege: `docuvault_app` DB role has DML only; `docuvault_migrate` has DDL; MinIO bucket policy denies public access
- Secrets in env only: No credentials, API keys, or signing secrets in code, commits, or `.env` files checked in; `.gitignore` enforces this
- Dependency pinning: `requirements.txt` and `package-lock.json` pin exact versions; no floating `>=` for security-critical packages (PyJWT, pwdlib, cryptography)
- Container hardening: Non-root user in Dockerfile, read-only filesystem where possible, no `--privileged` containers
- Header hardening: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin` on every response
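The header-hardening rule can be sketched as a pure ASGI middleware, so it needs no third-party imports. The class name is hypothetical. Because it has the standard `(app)` constructor, it should be registrable with FastAPI's `app.add_middleware(SecurityHeadersMiddleware)`. It replaces any copy of these headers the app already set, so an individual route cannot weaken them. The Content-Security-Policy header from Security Requirements would be added the same way, but its value depends on the frontend and is not shown.

```python
from typing import Any, Awaitable, Callable, MutableMapping

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

# Header names must be lowercase bytes in ASGI.
SECURITY_HEADERS = [
    (b"x-content-type-options", b"nosniff"),
    (b"x-frame-options", b"DENY"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that appends the hardening headers to every HTTP response."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message: Message) -> None:
            if message["type"] == "http.response.start":
                # Drop any copies the app already set, then add ours exactly once.
                names = {name for name, _ in SECURITY_HEADERS}
                kept = [(k, v) for k, v in message.get("headers", []) if k.lower() not in names]
                message["headers"] = kept + SECURITY_HEADERS
            await send(message)

        await self.app(scope, receive, send_with_headers)
```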
Database user table encryption
Sensitive user PII (email, display name) must be encrypted at the application layer before storage:
- Encryption: AES-256-GCM via the `cryptography` library, with a fresh 96-bit nonce for every encryption (never reused under the same key), master key from env var
- Key derivation: HKDF-SHA256 with `purpose=b"user-pii"` salt; same pattern as cloud credentials
- Admin queries: never return plaintext PII for users other than the requesting user
- Indexing: email lookup uses a deterministic HMAC-SHA256 index (`email_hmac` column); the encrypted column is never used for WHERE clauses
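Here is a minimal sketch of the `email_hmac` index using stdlib `hmac`. It rests on two assumptions. First, the index key is separate from the AES key (for example, derived via HKDF under its own label). Second, emails are trimmed and lowercased before hashing, which makes lookups case-insensitive. The `users` table name in the lookup is hypothetical.

```python
import hashlib
import hmac

# Hypothetical lookup: only email_hmac is compared, always via a bound parameter.
LOOKUP_SQL = "SELECT id FROM users WHERE email_hmac = :email_hmac"


def email_index(email: str, index_key: bytes) -> str:
    """Deterministic HMAC-SHA256 of the normalized email, stored in email_hmac."""
    # Normalization is an assumption: trim and lowercase so that lookups
    # match regardless of how the user typed the address.
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(index_key, normalized, hashlib.sha256).hexdigest()
```

The same input always yields the same index value, which is what makes it usable in a WHERE clause. Without the key, however, the value cannot be computed, so it does not expose the address.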
Login token hardening (state of the art)
- Algorithm: ES256 (ECDSA P-256) — asymmetric; the private key signs, the public key verifies; a leaked public key cannot forge tokens
- Access token TTL: 15 minutes maximum
- Refresh token: 30-day httpOnly Strict cookie; rotated on every use; reuse of a rotated token revokes entire family and fires a security alert email
- JTI claim: Every token has a unique `jti`; revoked JTIs stored in Redis with TTL matching the token lifetime
- Token binding: Access token embeds a `fgp` (fingerprint) claim = HMAC of `User-Agent + Accept-Language`; backend validates on every request
- Rotation on privilege change: Password change, TOTP enroll/revoke, and account deactivation immediately revoke all active sessions
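Here is a sketch of the `fgp` binding check using stdlib `hmac`. It assumes the token has already been verified and decoded into a `claims` dict, for example by PyJWT. The newline separator between the two header values and the helper names are assumptions. The comparison uses `hmac.compare_digest`, as the timing-attack rule requires.

```python
import hashlib
import hmac


def fingerprint(user_agent: str, accept_language: str, key: bytes) -> str:
    """HMAC-SHA256 over User-Agent and Accept-Language, used as the fgp claim."""
    # A separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
    message = f"{user_agent}\n{accept_language}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def fingerprint_matches(claims: dict, user_agent: str, accept_language: str, key: bytes) -> bool:
    """Check a decoded token's fgp claim against the current request headers."""
    presented = claims.get("fgp")
    if not isinstance(presented, str):
        return False
    expected = fingerprint(user_agent, accept_language, key)
    # Compare bytes in constant time; encoding also handles non-ASCII input safely.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("ascii"))
```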
Security gate checklist (must all pass before phase advances)
- `bandit -r backend/`: zero HIGH severity findings
- `pip-audit`: zero critical/high CVEs
- `npm audit --audit-level=high`: zero high/critical vulnerabilities
- All security-invariant tests pass (wrong owner, admin block, token replay, CSRF)
- No new `# noqa: S` suppressions without a documented justification comment
- Admin endpoints verified to never return `password_hash`, `credentials_enc`, or document content
- No hardcoded secrets detected by `git secrets`/`trufflehog`
Security Requirements (Non-Negotiable)
- Rate limiting on all auth endpoints (login, register, password reset, TOTP)
- Constant-time comparison for all token/code verification
- CSRF protection on all state-changing endpoints
- Content-Security-Policy headers on all responses
- HaveIBeenPwned API check on registration and password change
- TOTP replay prevention (mark used codes in DB within validity window)
- Refresh token family revocation on token reuse detection
- Admin impersonation is an explicit architectural exclusion — no endpoint or code path may exist
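The TOTP replay requirement can be sketched with the standard library alone. The sketch uses RFC 6238 codes over 30 s steps, a ±1 step window (the 90 s window from Advanced threats) and a record of consumed steps. The stack uses pyotp for codes; this sketch inlines the algorithm only so that the replay bookkeeping is visible. `used_steps` stands in for per-user DB rows. The secret is raw bytes, so a stored base32 secret would need to be decoded first.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

STEP = 30    # seconds per time step (RFC 6238 default)
DIGITS = 6
WINDOW = 1   # accept steps current-1 .. current+1, i.e. a 90 s window


def totp_at(secret: bytes, step: int, digits: int = DIGITS) -> str:
    """HOTP (RFC 4226) over a time-step counter, which is what TOTP is."""
    digest = hmac.new(secret, struct.pack(">Q", step), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, used_steps: set[int],
                now: Optional[float] = None) -> bool:
    """Accept each time step at most once; a replayed code is rejected."""
    current = int((time.time() if now is None else now) // STEP)
    matched = None
    for step in range(current - WINDOW, current + WINDOW + 1):
        # Constant-time comparison; every candidate step is checked.
        if hmac.compare_digest(totp_at(secret, step).encode(), submitted.encode()):
            matched = step
    if matched is None or matched in used_steps:
        return False
    used_steps.add(matched)  # in DocuVault: persist (user_id, step) in the DB
    return True
```

A recorded step that has fallen out of the window can never match again, so those rows can be pruned.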