# DocuVault — Claude Code Guide

## Project Overview

DocuVault is a multi-user SaaS document management platform built on FastAPI (Python) + Vue 3. It handles document upload, text extraction (PDF/DOCX/image/text), AI-based topic classification, per-user isolated storage, folder organization, document sharing, and pluggable cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV).

**Current state:** Brownfield; the single-user app is functional. Active milestone: migrating to multi-user, adding auth, moving to PostgreSQL + MinIO, and adding cloud storage.

## Stack

- **Backend:** Python 3.12, FastAPI 0.136+, SQLAlchemy 2.0 async, psycopg v3, Alembic, MinIO SDK
- **Frontend:** Vue 3 (Options API), Pinia, Vue Router 4, Vite, Tailwind CSS
- **Infrastructure:** Docker Compose, PostgreSQL, MinIO (S3-compatible)
- **Auth:** PyJWT 2.12+, pwdlib[argon2], pyotp (TOTP), cryptography (Fernet/HKDF)

## Key Architectural Rules

- JWT access token lives in **Pinia memory only** — never localStorage or sessionStorage
- Refresh token is an **httpOnly; Secure; SameSite=Strict cookie** — never accessible to JavaScript
- MinIO object keys are **UUID-based** (`{user_id}/{document_id}/{uuid4()}{ext}`) — human filenames in DB only
- Cloud credentials encrypted with a **per-user key derived via HKDF** — master key in env var only
- Quota enforced atomically in a single statement: **`UPDATE quotas SET used_bytes = used_bytes + :delta WHERE user_id = :user_id AND (used_bytes + :delta) <= limit_bytes RETURNING used_bytes`**; no returned row means the upload would exceed the quota
- Admin endpoints **never return** document content, extracted text, or `credentials_enc`
- Every document/folder endpoint asserts `resource.user_id == current_user.id`
- All DB queries via ORM / parameterized statements — zero raw string interpolation
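
A minimal sketch of building the object key described in the rules above, assuming the IDs are `uuid.UUID` values and that only a short ASCII alphanumeric extension from the client filename is worth keeping (the 10-character cap is an arbitrary choice for illustration). `build_object_key` is a hypothetical helper, not an existing function in the codebase:

```python
import uuid
from pathlib import PurePosixPath


def build_object_key(user_id: uuid.UUID, document_id: uuid.UUID, filename: str) -> str:
    """Return a MinIO key of the form {user_id}/{document_id}/{uuid4()}{ext}.

    Only a sanitized extension survives from the client filename; the
    human-readable name is stored in the database, never in the key.
    """
    # .suffix looks only at the final path component, so directory parts such
    # as "../../" never reach the key. Lower-case for consistent lookups.
    ext = PurePosixPath(filename).suffix.lower()
    # Hypothetical policy: keep only short ASCII alphanumeric extensions.
    body = ext[1:]
    if not (body.isascii() and body.isalnum() and len(body) <= 10):
        ext = ""
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"
```

Because the key never contains the client filename, a name like `../../etc/passwd` cannot steer where the object lands; at worst it contributes an empty extension.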

## GSD Workflow

This project uses the GSD (Get Shit Done) planning workflow. Planning artifacts live in `.planning/`.

### Key files

| File | Purpose |
|---|---|
| `.planning/ROADMAP.md` | 5-phase plan with success criteria |
| `.planning/REQUIREMENTS.md` | 54 v1 requirements with REQ-IDs |
| `.planning/STATE.md` | Current phase and completion status |
| `.planning/PROJECT.md` | Project context and key decisions |
| `.planning/research/SUMMARY.md` | Domain research synthesis |
| `.planning/codebase/` | Codebase map (architecture, stack, concerns) |

### Commands

```
/gsd:discuss-phase N — gather context before planning a phase
/gsd:plan-phase N — create execution plan for a phase
/gsd:execute-phase N — execute the plan
/gsd:verify-work N — verify phase deliverables against requirements
/gsd:progress — check status and advance workflow
```

### Current phase: Not started — run `/gsd:discuss-phase 1` to begin

## Development Setup

```bash
# Each command below is run from the repository root

# Start all services
docker compose up

# Backend only (local dev)
cd backend && uvicorn main:app --reload

# Frontend only (local dev)
cd frontend && npm run dev

# Run backend tests
cd backend && pytest -v
```

## Testing Protocol (Non-Negotiable)

Every feature, function, and bug fix requires tests. No phase or plan may advance until all tests pass.

### Rules

- **Coverage**: Every new function, endpoint, and UI component must have at least one test: unit tests for isolated logic, integration tests for DB/service boundaries, E2E tests for critical user flows
- **Gate**: `pytest -v` (backend) and the frontend test suite must pass with zero failures before a plan is marked complete or the project advances to the next phase
- **Bug fixes**: Fix the root cause, not the symptom. A single fix may change at most 50 lines of code; a fix that needs more is scope creep and must be split into a separate plan
- **No workarounds**: `# type: ignore`, `noqa`, skipped tests, and `try/except` blocks that silently swallow errors are prohibited as bug fixes
- **Regression**: Every bug fix must add a test that would have caught the bug

### Test types per layer

| Layer | Required test type |
|---|---|
| Service / business logic | Unit tests with mocked dependencies |
| DB queries / ORM | Integration tests against real PostgreSQL (not SQLite for quota/UUID tests) |
| API endpoints | `httpx.AsyncClient` integration tests with real DB fixtures |
| Auth flows | Full round-trip tests (register → login → TOTP → refresh → revoke) |
| Security invariants | Dedicated negative tests (wrong owner → 403/404, admin → 403, replay → 401) |
| Frontend | Vitest unit tests for stores/composables; Playwright or Cypress for critical flows |
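
A sketch of the kind of negative security-invariant test the table calls for, written as plain functions so it runs without a database. `Document`, `ResourceNotFound`, and `ensure_owner` are hypothetical stand-ins for the real ORM model and access check. Reporting a wrong owner as 404 rather than 403 (one of the two options the table allows) avoids confirming that another user's document ID exists. In a pytest suite, `pytest.raises(ResourceNotFound)` would replace the `try`/`else` pattern:

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


class ResourceNotFound(Exception):
    """Hypothetical domain error that the API layer maps to HTTP 404."""


@dataclass(frozen=True)
class Document:
    id: UUID
    user_id: UUID


def ensure_owner(resource: Document, current_user_id: UUID) -> Document:
    # Same error as a missing row, so other users' IDs cannot be probed.
    if resource.user_id != current_user_id:
        raise ResourceNotFound(str(resource.id))
    return resource


def test_owner_can_access() -> None:
    owner = uuid4()
    doc = Document(id=uuid4(), user_id=owner)
    assert ensure_owner(doc, owner) is doc


def test_wrong_owner_gets_not_found() -> None:
    doc = Document(id=uuid4(), user_id=uuid4())
    try:
        ensure_owner(doc, current_user_id=uuid4())
    except ResourceNotFound:
        pass  # expected: access denied as "not found"
    else:
        raise AssertionError("a non-owner was allowed to access the document")
```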

---

## Security Protocol (Non-Negotiable)

A dedicated **security agent** runs after every plan execution and before any phase is marked complete. This agent has full read/write/edit access to the entire codebase and is the final gate before advancement.

### Security agent mandate

The security agent must check for, and fix, every class of vulnerability listed below. It may not merely flag an issue and defer it: it must either resolve the issue or escalate it as blocking.

#### OWASP Top 10 + auth-specific

| Threat | Required mitigation |
|---|---|
| SQL injection | All queries via ORM or parameterized statements — zero raw string interpolation |
| XSS | CSP headers, `httpOnly` cookies, no `innerHTML` or `v-html` with user data, Vue template auto-escaping never bypassed |
| CSRF | `SameSite=Strict` cookie + `Origin`/`Referer` header validation on all state-changing endpoints |
| Broken auth | Short-lived JWT (≤15 min), refresh rotation, family revocation on reuse, constant-time comparison |
| IDOR / broken access control | Every resource endpoint asserts `resource.user_id == current_user.id`; admin blocked from document content |
| Security misconfiguration | No debug mode in production, no stack traces in API responses, no default credentials |
| Sensitive data exposure | Passwords hashed with Argon2id, PII fields encrypted at rest, `credentials_enc` never in API responses |
| Insecure deserialization | No `pickle`, no `eval`, no dynamic `__import__`; all user-supplied data validated via Pydantic |
| Vulnerable dependencies | `pip-audit` / `npm audit` run; critical/high CVEs blocked |
| Insufficient logging | All auth events, quota violations, and admin actions written to audit log without document content |
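
The CSRF row's `Origin`/`Referer` check can be sketched as a pure function, assuming a configured set of allowed origins in lower-case `scheme://host[:port]` form (`docuvault.example` in the usage is a placeholder). It fails closed when a state-changing request carries neither header. That is a design choice that can break non-browser clients. `origin_allowed` is a hypothetical name:

```python
from collections.abc import Mapping
from urllib.parse import urlsplit

SAFE_METHODS = frozenset({"GET", "HEAD", "OPTIONS"})


def origin_allowed(method: str, headers: Mapping[str, str], allowed_origins: frozenset[str]) -> bool:
    """Origin/Referer check for state-changing requests (CSRF defense in depth)."""
    if method.upper() in SAFE_METHODS:
        return True  # only state-changing methods are checked
    # HTTP header names are case-insensitive.
    lowered = {name.lower(): value for name, value in headers.items()}
    source = lowered.get("origin") or lowered.get("referer")
    if not source:
        return False  # fail closed: no provenance information at all
    parts = urlsplit(source)
    if not parts.scheme or not parts.netloc:
        return False  # e.g. the opaque origin "null"
    return f"{parts.scheme}://{parts.netloc}".lower() in allowed_origins
```

In FastAPI this would typically run in a dependency or middleware that responds with 403 when it returns `False`, alongside the `SameSite=Strict` cookie rather than instead of it.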

#### Advanced threats

- **Path traversal**: All file path construction uses `os.path.basename` / `pathlib`; user-supplied strings are never joined into a path directly
- **SSRF**: All outbound HTTP (HIBP, cloud OAuth) goes through an allowlisted client; user-supplied URLs for WebDAV/Nextcloud must pass a hostname allowlist
- **Timing attacks**: `hmac.compare_digest` / `secrets.compare_digest` for all token, TOTP, and backup-code comparison — no `==`
- **Race conditions / TOCTOU**: Quota enforcement via single atomic `UPDATE … RETURNING` — never read-then-write in Python
- **Mass assignment**: Pydantic models explicitly declare every accepted field; no `**kwargs` passthrough from request body to ORM
- **Privilege escalation**: `get_regular_user` and `get_current_admin` dependencies enforced on every endpoint; no role elevation path exists
- **Token replay**: JTI stored in DB; a used TOTP code is rejected for the rest of its 90 s validity window; refresh token family revocation on reuse
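
A runnable sketch of the single-statement quota check. It uses the standard-library `sqlite3` module only so the example runs anywhere; the real target is PostgreSQL with `RETURNING used_bytes`, and per the testing rules quota behavior must be tested against PostgreSQL, not SQLite. Here `cursor.rowcount` stands in for "did a row come back". The `quotas` columns follow the statement in the architectural rules; `try_reserve_bytes` is a hypothetical name:

```python
import sqlite3


def try_reserve_bytes(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Add `delta` bytes to the user's usage only if the result stays within the limit.

    The limit check and the increment are one UPDATE statement, so there is no
    read-then-write window for a concurrent upload to slip through.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE quotas SET used_bytes = used_bytes + :delta "
            "WHERE user_id = :user_id AND used_bytes + :delta <= limit_bytes",
            {"delta": delta, "user_id": user_id},
        )
    # PostgreSQL would use RETURNING used_bytes; here an updated row count of 1
    # plays the same role. Zero rows means over quota (or an unknown user).
    return cur.rowcount == 1
```

Because the comparison and the increment happen inside one `UPDATE`, two concurrent uploads cannot both pass a stale read of `used_bytes`; the one that loses the race simply matches zero rows.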

#### Zero-day / defense-in-depth

- **Minimal attack surface**: Every endpoint that is not needed is absent — no commented-out code, no `TODO: remove` endpoints left alive
- **Principle of least privilege**: `docuvault_app` DB role has DML only; `docuvault_migrate` has DDL; MinIO bucket policy denies public access
- **Secrets in env only**: No credentials, API keys, or signing secrets in code or commits, and no `.env` files checked in; `.gitignore` excludes them
- **Dependency pinning**: `requirements.txt` and `package-lock.json` pin exact versions; no floating `>=` for security-critical packages (PyJWT, pwdlib, cryptography)
- **Container hardening**: Non-root user in Dockerfile, read-only filesystem where possible, no `--privileged` containers
- **Header hardening**: `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Referrer-Policy: strict-origin-when-cross-origin` on every response
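
One way to apply the header-hardening rule to every response is a small pure ASGI middleware, sketched with only the standard library. The class name is hypothetical; with FastAPI it could be registered via `app.add_middleware(SecurityHeadersMiddleware)`. It replaces any copies of these three headers that the app already set rather than duplicating them. The Content-Security-Policy header required elsewhere in this guide is left out because its policy value is project-specific:

```python
from collections.abc import Awaitable, Callable, MutableMapping
from typing import Any

Scope = MutableMapping[str, Any]
Message = MutableMapping[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]

# ASGI header names are lower-case bytes.
SECURITY_HEADERS: tuple[tuple[bytes, bytes], ...] = (
    (b"x-content-type-options", b"nosniff"),
    (b"x-frame-options", b"DENY"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
)
_HEADER_NAMES = frozenset(name for name, _ in SECURITY_HEADERS)


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that adds hardening headers to every HTTP response."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] != "http":  # leave websocket/lifespan traffic untouched
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message: Message) -> None:
            if message["type"] == "http.response.start":
                # Drop any copies set by the app, then append ours exactly once.
                headers = [
                    (name, value)
                    for name, value in message.get("headers", [])
                    if name.lower() not in _HEADER_NAMES
                ]
                headers.extend(SECURITY_HEADERS)
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```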

### Database user table encryption

Sensitive user PII (email, display name) must be encrypted at the application layer before storage:

- Encryption: AES-256-GCM via the `cryptography` library, a unique nonce for every encryption stored alongside the row, master key from env var
- Key derivation: HKDF-SHA256 with the purpose label `b"user-pii"` passed as the HKDF `info` parameter for domain separation, same pattern as cloud credentials
- Admin queries: never return plaintext PII for users other than the requesting user
- Indexing: email lookups use a deterministic HMAC-SHA256 index (`email_hmac` column); the encrypted column is never used in WHERE clauses
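
A sketch of the deterministic `email_hmac` index using only `hmac` and `hashlib`. Two assumptions are made here: emails are normalized (NFKC, trimmed, case-folded) before hashing so lookups are case-insensitive, and `index_key` is a dedicated key (for example one derived via HKDF with its own purpose label) kept separate from the AES-GCM encryption key. `email_hmac` as a function name is hypothetical:

```python
import hashlib
import hmac
import unicodedata


def email_hmac(email: str, index_key: bytes) -> str:
    """Deterministic blind index for equality lookups on an encrypted email column."""
    # Normalize so "Alice@Example.com " and "alice@example.com" share one index value.
    normalized = unicodedata.normalize("NFKC", email).strip().casefold()
    return hmac.new(index_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```

A login lookup would then filter on `email_hmac == email_hmac(submitted_email, index_key)` and decrypt the stored email only for the matching row. Case-folding the local part is a product decision: RFC 5321 treats the local part as case-sensitive, although most providers do not.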

### Login token hardening (state of the art)

- **Algorithm**: ES256 (ECDSA P-256), which is asymmetric: the private key signs and the public key only verifies, so a leaked public key cannot be used to forge tokens
- **Access token TTL**: 15 minutes maximum
- **Refresh token**: 30-day httpOnly Strict cookie; rotated on every use; reuse of a rotated token revokes the entire family and fires a security alert email
- **JTI claim**: Every token has a unique `jti`; revoked JTIs stored in Redis with a TTL equal to the token's remaining lifetime
- **Token binding**: Access token embeds a `fgp` (fingerprint) claim = HMAC of `User-Agent + Accept-Language`; backend validates it on every request
- **Rotation on privilege change**: Password change, TOTP enroll/revoke, and account deactivation immediately revoke all active sessions

### Security gate checklist (must all pass before phase advances)

- [ ] `bandit -r backend/` — zero HIGH severity findings
- [ ] `pip-audit` — zero critical/high CVEs
- [ ] `npm audit --audit-level=high` — zero high/critical vulnerabilities
- [ ] All security-invariant tests pass (wrong owner, admin block, token replay, CSRF)
- [ ] No new `# noqa: S` suppressions without a documented justification comment
- [ ] Admin endpoints verified to never return `password_hash`, `credentials_enc`, or document content
- [ ] No hardcoded secrets detected by `git secrets` / `trufflehog`

---

## Security Requirements (Non-Negotiable)

- Rate limiting on all auth endpoints (login, register, password reset, TOTP)
- Constant-time comparison for all token/code verification
- CSRF protection on all state-changing endpoints
- Content-Security-Policy headers on all responses
- HaveIBeenPwned API check on registration and password change
- TOTP replay prevention (mark used codes in the DB for the rest of their validity window)
- Refresh token family revocation on token reuse detection
- Admin impersonation is an explicit architectural exclusion: no endpoint or code path for it may exist