curo1305 87a32b7ee8 feat(phase-4): complete UX redesign — FileManagerView, FolderTreeItem, test suite, and all Phase 4 fixes
Adds the unified file manager view (Windows Explorer-style), collapsible
folder tree sidebar item, full vitest test suite (55 tests, 4 files), and
commits all Phase 4 backend/frontend fixes that were staged but uncommitted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-28 17:10:52 +02:00


# DocuVault — Claude Code Guide
## Project Overview
DocuVault is a multi-user SaaS document management platform built on FastAPI (Python) + Vue 3. It handles document upload, text extraction (PDF/DOCX/image/text), AI-based topic classification, per-user isolated storage, folder organization, document sharing, and pluggable cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV).
**Current state:** Brownfield — single-user app is functional. Active milestone: migrating to multi-user, adding auth, PostgreSQL + MinIO, and cloud storage.
## Stack
- **Backend:** Python 3.12, FastAPI 0.136+, SQLAlchemy 2.0 async, psycopg v3, Alembic, MinIO SDK
- **Frontend:** Vue 3 (Options API), Pinia, Vue Router 4, Vite, Tailwind CSS
- **Infrastructure:** Docker Compose, PostgreSQL, MinIO (S3-compatible)
- **Auth:** PyJWT 2.12+, pwdlib[argon2], pyotp (TOTP), cryptography (Fernet/HKDF)
## Key Architectural Rules
- JWT access token lives in **Pinia memory only** — never localStorage or sessionStorage
- Refresh token is an **httpOnly; Secure; SameSite=Strict cookie** — never accessible to JavaScript
- MinIO object keys are **UUID-based** (`{user_id}/{document_id}/{uuid4()}{ext}`) — human filenames in DB only
- Cloud credentials encrypted with **HKDF per-user key derivation** — master key in env var only
- Quota enforced atomically: **`UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes`**
- Admin endpoints **never return** document content, extracted text, or `credentials_enc`
- Every document/folder endpoint asserts `resource.user_id == current_user.id`
- All DB queries via ORM / parameterized statements — zero raw string interpolation
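
As an illustration of the UUID-based object key rule above, here is a minimal sketch of a key builder. The function name `build_object_key` and the extension filtering are assumptions made for this example, not part of the existing codebase.

```python
import uuid
from pathlib import PurePosixPath


def build_object_key(user_id: uuid.UUID, document_id: uuid.UUID, filename: str) -> str:
    """Return a key shaped like {user_id}/{document_id}/{uuid4()}{ext}."""
    # Only the extension of the uploaded name is reused; the human-readable
    # filename itself is expected to live in the database, not in the key.
    ext = PurePosixPath(filename).suffix.lower()
    # Assumption: discard anything that is not a plain alphanumeric extension.
    if not ext[1:].isalnum():
        ext = ""
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"
```

Because the final component is a fresh `uuid4()`, two uploads of the same filename never collide, and the key reveals nothing about the original name beyond its extension.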
## GSD Workflow
This project uses the GSD (Get Shit Done) planning workflow. Planning artifacts live in `.planning/`.
### Key files
| File | Purpose |
|---|---|
| `.planning/ROADMAP.md` | 5-phase plan with success criteria |
| `.planning/REQUIREMENTS.md` | 54 v1 requirements with REQ-IDs |
| `.planning/STATE.md` | Current phase and completion status |
| `.planning/PROJECT.md` | Project context and key decisions |
| `.planning/research/SUMMARY.md` | Domain research synthesis |
| `.planning/codebase/` | Codebase map (architecture, stack, concerns) |
### Commands
```
/gsd:discuss-phase N — gather context before planning a phase
/gsd:plan-phase N — create execution plan for a phase
/gsd:execute-phase N — execute the plan
/gsd:verify-work N — verify phase deliverables against requirements
/gsd:progress — check status and advance workflow
```
### Current phase: Not started — run `/gsd:discuss-phase 1` to begin
## Development Setup
```bash
# Start all services
docker compose up
# Backend only (local dev)
cd backend && uvicorn main:app --reload
# Frontend only (local dev)
cd frontend && npm run dev
# Run backend tests
cd backend && pytest -v
```
## Testing Protocol (Non-Negotiable)
Every feature, function, and bug fix requires tests. No phase or plan may advance until all tests pass.
### Rules
- **Coverage**: Every new function, endpoint, and UI component must have at least one test — unit for isolated logic, integration for DB/service boundaries, E2E for critical user flows
- **Gate**: `pytest -v` (backend) and frontend test suite must pass with zero failures before marking a plan complete or advancing to the next phase
- **Bug fixes**: Must fix the root cause, not work around it. Maximum 50 lines of changed code per fix. If a fix requires more, it is scope creep and must be broken into a separate plan
- **No workarounds**: `# type: ignore`, `noqa`, skipping a test, or adding a `try/except` that silently swallows an error are prohibited as bug fixes
- **Regression**: Any time a bug is fixed, a test must be added that would have caught it
### Test types per layer
| Layer | Required test type |
|---|---|
| Service / business logic | Unit tests with mocked dependencies |
| DB queries / ORM | Integration tests against real PostgreSQL (not SQLite for quota/UUID tests) |
| API endpoints | `httpx.AsyncClient` integration tests with real DB fixtures |
| Auth flows | Full round-trip tests (register → login → TOTP → refresh → revoke) |
| Security invariants | Dedicated negative tests (wrong owner → 403/404, admin → 403, replay → 401) |
| Frontend | Vitest unit tests for stores/composables; Playwright or Cypress for critical flows |
---
## Security Protocol (Non-Negotiable)
A dedicated **security agent** runs after every plan execution and before any phase is marked complete. This agent has full read/write/edit access to the entire codebase and is the final gate before advancement.
### Security agent mandate
The security agent must check — and fix — every class of vulnerability listed below. It may not flag and defer; it must resolve or escalate blocking issues.
#### OWASP Top 10 + auth-specific
| Threat | Required mitigation |
|---|---|
| SQL injection | All queries via ORM or parameterized statements — zero raw string interpolation |
| XSS | CSP headers, `httpOnly` cookies, no `innerHTML` with user data, Vue template auto-escaping never bypassed |
| CSRF | `SameSite=Strict` cookie + `Origin`/`Referer` header validation on all state-changing endpoints |
| Broken auth | Short-lived JWT (≤15 min), refresh rotation, family revocation on reuse, constant-time comparison |
| IDOR / broken access control | Every resource endpoint asserts `resource.user_id == current_user.id`; admin blocked from document content |
| Security misconfiguration | No debug mode in production, no stack traces in API responses, no default credentials |
| Sensitive data exposure | Passwords hashed Argon2id, PII fields encrypted at rest, `credentials_enc` never in API responses |
| Insecure deserialization | No `pickle`, no `eval`, no dynamic `__import__`; all user-supplied data validated via Pydantic |
| Vulnerable dependencies | `pip-audit` / `npm audit` run; critical/high CVEs blocked |
| Insufficient logging | All auth events, quota violations, and admin actions written to audit log without document content |
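
The IDOR row above can be sketched as a single ownership guard that every resource endpoint calls before touching the resource. The names `Document`, `ResourceNotFound`, and `require_owner` are hypothetical; raising a not-found error rather than a forbidden error is one of the two options the testing table allows (403/404) and is chosen here only as an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


class ResourceNotFound(Exception):
    """Hypothetical error that the API layer would translate into a 404."""


@dataclass(frozen=True)
class Document:
    id: UUID
    user_id: UUID


def require_owner(resource: Optional[Document], current_user_id: UUID) -> Document:
    """Return the resource only if it exists and belongs to the caller."""
    # Missing and foreign resources raise the same error, so a caller cannot
    # probe which document IDs exist for other users.
    if resource is None or resource.user_id != current_user_id:
        raise ResourceNotFound("document not found")
    return resource
```

Treating "missing" and "not yours" identically keeps the negative tests simple: a wrong-owner request and a nonexistent ID should be indistinguishable from the outside.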
#### Advanced threats
- **Path traversal**: All file path construction uses `os.path.basename` / `pathlib` — never joins user-supplied strings directly
- **SSRF**: All outbound HTTP (HIBP, cloud OAuth) via an allowlisted client; user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist
- **Timing attacks**: `hmac.compare_digest` / `secrets.compare_digest` for all token, TOTP, and backup-code comparison — no `==`
- **Race conditions / TOCTOU**: Quota enforcement via single atomic `UPDATE … RETURNING` — never read-then-write in Python
- **Mass assignment**: Pydantic models explicitly declare every accepted field; no `**kwargs` passthrough from request body to ORM
- **Privilege escalation**: `get_regular_user` and `get_current_admin` deps checked on every endpoint; no role elevation path exists
- **Token replay**: JTI stored in DB; used TOTP codes invalidated within the 90 s window; refresh token family revocation on reuse
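
Two of the items above, path traversal and timing attacks, lend themselves to a short sketch. The helper names `safe_join` and `tokens_match` are assumptions for illustration; only `pathlib` and `hmac.compare_digest` come from the rules themselves.

```python
import hmac
from pathlib import Path


def safe_join(base_dir: Path, user_supplied_name: str) -> Path:
    """Build a path under base_dir from an untrusted filename."""
    # Keep only the final component, so "../../etc/passwd" becomes "passwd".
    name = Path(user_supplied_name.replace("\\", "/")).name
    if name in ("", ".", ".."):
        raise ValueError("invalid filename")
    root = base_dir.resolve()
    candidate = (root / name).resolve()
    # Second line of defense: the resolved path must still sit under the root.
    if not candidate.is_relative_to(root):
        raise ValueError("path escapes base directory")
    return candidate


def tokens_match(presented: str, expected: str) -> bool:
    """Compare secrets without leaking where the first mismatch occurs."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

Encoding both values to bytes before `hmac.compare_digest` avoids the `TypeError` that the function raises for non-ASCII `str` arguments.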
#### Zero-day / defense-in-depth
- **Minimal attack surface**: Every endpoint that is not needed is absent — no commented-out code, no `TODO: remove` endpoints left alive
- **Principle of least privilege**: `docuvault_app` DB role has DML only; `docuvault_migrate` has DDL; MinIO bucket policy denies public access
- **Secrets in env only**: No credentials, API keys, or signing secrets in code, commits, or `.env` files checked in; `.gitignore` enforces this
- **Dependency pinning**: `requirements.txt` and `package-lock.json` pin exact versions; no floating `>=` for security-critical packages (PyJWT, pwdlib, cryptography)
- **Container hardening**: Non-root user in Dockerfile, read-only filesystem where possible, no `--privileged` containers
- **Header hardening**: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin` on every response
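
The header hardening item can be expressed as one small function that a response middleware would call. This is a framework-independent sketch; the name `with_security_headers` is hypothetical, and wiring it into the FastAPI app is left out.

```python
# Header values taken verbatim from the hardening rule above.
SECURITY_HEADERS = {
    "X-Content-Type-Options": "nosniff",
    "X-Frame-Options": "DENY",
    "Referrer-Policy": "strict-origin-when-cross-origin",
}


def with_security_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the response headers with the hardening set applied."""
    merged = dict(headers)
    # The hardening values win over anything a handler set earlier.
    merged.update(SECURITY_HEADERS)
    return merged
```

Applying the headers in one place, rather than per endpoint, is what makes the "on every response" requirement checkable in a single test.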
### Database user table encryption
Sensitive user PII (email, display name) must be encrypted at the application layer before storage:
- Encryption: AES-256-GCM via `cryptography` library, per-row nonce, master key from env var
- Key derivation: HKDF-SHA256 with `purpose=b"user-pii"` as the context (`info`) input, following the same pattern as cloud credentials
- Admin queries: never return plaintext PII for users other than the requesting user
- Indexing: email lookup uses a deterministic HMAC-SHA256 index (`email_hmac` column) — the encrypted column is never used for WHERE clauses
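
The `email_hmac` lookup column described above could be computed roughly as follows. The function name `email_index`, the dedicated `index_key` parameter, and the strip-and-lowercase normalization are assumptions for this sketch.

```python
import hashlib
import hmac


def email_index(email: str, index_key: bytes) -> str:
    """Return a deterministic HMAC-SHA256 hex digest for equality lookups."""
    # Assumption: normalize first so "Alice@Example.com " and
    # "alice@example.com" map to the same index value.
    normalized = email.strip().lower()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

A login query would then filter on `email_hmac = email_index(submitted_email, key)` and decrypt the encrypted column only for the matching row.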
### Login token hardening (state of the art)
- **Algorithm**: ES256 (ECDSA P-256), an asymmetric scheme: the private key signs and the public key verifies, so a leaked public key cannot be used to forge tokens
- **Access token TTL**: 15 minutes maximum
- **Refresh token**: 30-day `httpOnly; Secure; SameSite=Strict` cookie; rotated on every use; reuse of a rotated token revokes the entire family and fires a security alert email
- **JTI claim**: Every token has a unique `jti`; revoked JTIs stored in Redis with TTL matching the token lifetime
- **Token binding**: Access token embeds a `fgp` (fingerprint) claim = HMAC of `User-Agent + Accept-Language`; backend validates on every request
- **Rotation on privilege change**: Password change, TOTP enroll/revoke, and account deactivation immediately revoke all active sessions
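
A minimal sketch of the `fgp` token-binding claim, using only the standard library. The helper names, the SHA-256 digest, the server-side `key`, and the newline separator between the two header values are all assumptions; the rule above only specifies an HMAC of `User-Agent + Accept-Language`.

```python
import hashlib
import hmac


def compute_fgp(user_agent: str, accept_language: str, key: bytes) -> str:
    """Derive the fingerprint value embedded in the access token."""
    # Assumption: a separator keeps ("ab", "c") and ("a", "bc") distinct.
    message = f"{user_agent}\n{accept_language}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def fgp_matches(claim: str, user_agent: str, accept_language: str, key: bytes) -> bool:
    """Check the token's fgp claim against the headers of the current request."""
    expected = compute_fgp(user_agent, accept_language, key)
    # Constant-time comparison, as required for all token checks.
    return hmac.compare_digest(claim.encode("utf-8"), expected.encode("utf-8"))
```

At login the backend would store `compute_fgp(...)` in the token; on each request it would call `fgp_matches(...)` with the incoming headers and reject the token on a mismatch.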
### Security gate checklist (must all pass before phase advances)
- [ ] `bandit -r backend/` — zero HIGH severity findings
- [ ] `pip-audit` — zero critical/high CVEs
- [ ] `npm audit --audit-level=high` — zero high/critical vulnerabilities
- [ ] All security-invariant tests pass (wrong owner, admin block, token replay, CSRF)
- [ ] No new `# noqa: S` suppressions without a documented justification comment
- [ ] Admin endpoints verified to never return `password_hash`, `credentials_enc`, or document content
- [ ] No hardcoded secrets detected by `git secrets` / `trufflehog`
---
## Security Requirements (Non-Negotiable)
- Rate limiting on all auth endpoints (login, register, password reset, TOTP)
- Constant-time comparison for all token/code verification
- CSRF protection on all state-changing endpoints
- Content-Security-Policy headers on all responses
- HaveIBeenPwned API check on registration and password change
- TOTP replay prevention (mark used codes in DB within validity window)
- Refresh token family revocation on token reuse detection
- Admin impersonation is an explicit architectural exclusion — no endpoint or code path may exist
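
Refresh token family revocation, listed above, can be sketched with an in-memory store. Everything here (`RefreshTokenStore`, `TokenReuseDetected`, `InvalidToken`) is hypothetical and stands in for a database table; a real implementation would need to persist token hashes and perform the rotation inside a transaction.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


class TokenReuseDetected(Exception):
    """Raised when an already-rotated refresh token is presented again."""


class InvalidToken(Exception):
    """Raised for unknown tokens or tokens from a revoked family."""


@dataclass
class RefreshTokenStore:
    """In-memory stand-in for the refresh token table (illustration only)."""

    family_of: dict = field(default_factory=dict)  # token -> family id
    used: set = field(default_factory=set)  # tokens that were already rotated
    revoked_families: set = field(default_factory=set)

    def issue(self, family_id: Optional[str] = None) -> str:
        """Create a token, starting a new family unless one is given."""
        token = secrets.token_urlsafe(32)
        self.family_of[token] = family_id or secrets.token_hex(8)
        return token

    def rotate(self, token: str) -> str:
        """Exchange a valid token for its successor in the same family."""
        family = self.family_of.get(token)
        if family is None or family in self.revoked_families:
            raise InvalidToken("unknown or revoked refresh token")
        if token in self.used:
            # Reuse of a rotated token: assume theft, revoke the whole family.
            self.revoked_families.add(family)
            raise TokenReuseDetected(family)
        self.used.add(token)
        return self.issue(family)
```

After a reuse is detected, even the newest token in that family stops working, which forces both the legitimate user and any attacker to log in again.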