curo1305 eaa3399ec0 docs: add shared module map to CLAUDE.md, SECURITY.md, planning artifacts
- CLAUDE.md: add Code Standards section with backend and frontend shared
  module maps, component architecture rules, duplication checklist, and
  no-dead-code enforcement rule
- SECURITY.md: Phase 02 + 03 security audit results (all threats CLOSED)
- .planning: update milestone audit, config, and add plan/UAT files for
  phases 01, 02-06, and 06.2-05

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-02 16:10:59 +02:00


DocuVault — Claude Code Guide

Project Overview

DocuVault is a multi-user SaaS document management platform built on FastAPI (Python) + Vue 3. It handles document upload, text extraction (PDF/DOCX/image/text), AI-based topic classification, per-user isolated storage, folder organization, document sharing, and pluggable cloud storage backends (OneDrive, Google Drive, Nextcloud, WebDAV).

Current state: Brownfield — single-user app is functional. Active milestone: migrating to multi-user, adding auth, PostgreSQL + MinIO, and cloud storage.

Stack

  • Backend: Python 3.12, FastAPI 0.136+, SQLAlchemy 2.0 async, psycopg v3, Alembic, MinIO SDK
  • Frontend: Vue 3 (Options API), Pinia, Vue Router 4, Vite, Tailwind CSS
  • Infrastructure: Docker Compose, PostgreSQL, MinIO (S3-compatible)
  • Auth: PyJWT 2.12+, pwdlib[argon2], pyotp (TOTP), cryptography (Fernet/HKDF)

Key Architectural Rules

  • JWT access token lives in Pinia memory only — never localStorage or sessionStorage
  • Refresh token is an httpOnly; Secure; SameSite=Strict cookie — never accessible to JavaScript
  • MinIO object keys are UUID-based ({user_id}/{document_id}/{uuid4()}{ext}) — human filenames in DB only
  • Cloud credentials are encrypted with a per-user key derived via HKDF from a master key; the master key lives only in an env var
  • Quota enforced atomically: UPDATE quotas SET used_bytes = used_bytes + $delta WHERE user_id = $user_id AND (used_bytes + $delta) <= limit_bytes RETURNING used_bytes; if no row is returned, the upload is rejected
  • Admin endpoints never return document content, extracted text, or credentials_enc
  • Every document/folder endpoint asserts resource.user_id == current_user.id
  • All DB queries via ORM / parameterized statements — zero raw string interpolation

Code Standards (Non-Negotiable)

Core principle

Things that look the same to the user are the same in code. Local file navigation and cloud file navigation share one component. Sidebar folder trees and cloud trees share one component. Format helpers exist once. If you are about to write the same logic a second time, extract it first.

Backend: shared module map

Before adding a helper, check if it belongs in an existing shared module:

| Module | What lives here |
| --- | --- |
| backend/deps/utils.py | get_client_ip(request), parse_uuid(value): request-parsing helpers used across all routers |
| backend/storage/exceptions.py | CloudConnectionError: the single canonical definition; all files import it from here |
| backend/ai/utils.py | strip_code_fences, parse_classification, parse_suggestions: AI response parsing shared by all providers |
| backend/services/auth.py | validate_password_strength(password): raises ValueError; routers catch it and re-raise as HTTPException |

Rules:

  • No router may define _ip(), _get_ip(), or any other local variant of get_client_ip. Import from deps.utils.
  • No router may define its own CloudConnectionError. Import from storage.exceptions.
  • No AI provider may define its own _strip_code_fences or _parse_*. Import from ai.utils.
  • No API file may define _validate_password_strength. Import from services.auth.
  • Service layer raises ValueError (or domain exceptions), never HTTPException. Only the router layer raises HTTPException.
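
Here is a minimal sketch of what the ai.utils helpers strip_code_fences and parse_classification might look like. The module map only names these functions, so the reply shape ({"topic": ..., "confidence": ...}) and the tuple return type are assumptions. Malformed replies raise ValueError, which matches the rule that the service layer never raises HTTPException.

```python
import json


def strip_code_fences(text: str) -> str:
    """Remove one surrounding Markdown code fence, which many LLMs add around JSON."""
    s = text.strip()
    if s.startswith("```"):
        # Drop the opening fence together with its optional language tag.
        newline = s.find("\n")
        s = s[newline + 1:] if newline != -1 else s[3:]
        s = s.rstrip()
        if s.endswith("```"):
            s = s[:-3]
    return s.strip()


def parse_classification(raw: str) -> tuple[str, float]:
    """Parse an assumed {"topic": str, "confidence": float} reply; raise ValueError if malformed."""
    try:
        data = json.loads(strip_code_fences(raw))
    except json.JSONDecodeError as exc:
        raise ValueError(f"AI reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("AI reply must be a JSON object")
    topic = data.get("topic")
    confidence = data.get("confidence", 0.0)
    if not isinstance(topic, str) or not topic.strip():
        raise ValueError("AI reply is missing a topic")
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a number between 0 and 1")
    return topic.strip(), float(confidence)
```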

Frontend: shared module map

| Module | What lives here |
| --- | --- |
| src/utils/formatters.js | formatDate, formatSize, providerColor, providerBg, providerLabel |
| src/components/ui/TreeItem.vue | Generic expand/collapse tree node; all sidebar tree items wrap this |
| src/components/storage/StorageBrowser.vue | Unified file browser grid, used by both FileManagerView and CloudFolderView |

Rules:

  • No component may define its own formatDate or formatSize. Always import from utils/formatters.js.
  • No component may define its own providerColor or providerBg. Always import from utils/formatters.js.
  • No new tree sidebar component may implement its own expand/collapse state. It must wrap TreeItem.vue.
  • StorageBrowser.vue is the single file browser. Do not create a parallel file grid anywhere.
  • FileManagerView and CloudFolderView are thin data-providers: they feed props into StorageBrowser and handle emitted events. They contain no layout or grid logic of their own.

Component architecture

```text
View (thin data-provider)
  └── Smart component (StorageBrowser, AdminUsersTab, etc.)
        └── Dumb/presentational components (DocumentCard, FolderTreeItem, etc.)
```
  • Views own stores and route params. They pass data down as props and handle emitted events.
  • Smart components own layout, interactions, and internal state. They emit events upward; they do not call stores directly (exception: read-only lookups like topic color).
  • Presentational components receive everything as props and emit actions.
  • Props received from a parent are never bound with v-model directly; the child uses :model-value + @update:modelValue and emits the change upward.

No dead code

  • Files with no active route and no active import are deleted immediately — not commented out, not kept "just in case".
  • HomeView.vue and FolderView.vue are deleted. Do not recreate them.
  • Any file that becomes unreferenced after a refactor must be deleted in the same commit.

Duplication checklist (run before writing new code)

  1. Does a shared utility already exist for this logic? (Check the module map above.)
  2. Does this component already exist? (Search components/ before creating.)
  3. Is this logic already in a Pinia store? (Check stores/ before duplicating in a view.)
  4. If none of the above: create the shared module first, then use it everywhere that needs it.

GSD Workflow

This project uses the GSD (Get Shit Done) planning workflow. Planning artifacts live in .planning/.

Key files

| File | Purpose |
| --- | --- |
| .planning/ROADMAP.md | 5-phase plan with success criteria |
| .planning/REQUIREMENTS.md | 54 v1 requirements with REQ-IDs |
| .planning/STATE.md | Current phase and completion status |
| .planning/PROJECT.md | Project context and key decisions |
| .planning/research/SUMMARY.md | Domain research synthesis |
| .planning/codebase/ | Codebase map (architecture, stack, concerns) |

Commands

```text
/gsd:discuss-phase N   # gather context before planning a phase
/gsd:plan-phase N      # create execution plan for a phase
/gsd:execute-phase N   # execute the plan
/gsd:verify-work N     # verify phase deliverables against requirements
/gsd:progress          # check status and advance workflow
```

Current phase: Not started — run /gsd:discuss-phase 1 to begin

Development Setup

```bash
# Start all services
docker compose up

# Backend only (local dev)
cd backend && uvicorn main:app --reload

# Frontend only (local dev)
cd frontend && npm run dev

# Run backend tests
cd backend && pytest -v
```

Testing Protocol (Non-Negotiable)

Every feature, function, and bug fix requires tests. No phase or plan may advance until all tests pass.

Rules

  • Coverage: Every new function, endpoint, and UI component must have at least one test — unit for isolated logic, integration for DB/service boundaries, E2E for critical user flows
  • Gate: pytest -v (backend) and frontend test suite must pass with zero failures before marking a plan complete or advancing to the next phase
  • Bug fixes: Must fix the root cause, not work around it. Maximum 50 lines of changed code per fix. If a fix requires more, it is scope creep and must be split into a separate plan
  • No workarounds: # type: ignore, noqa, skipping a test, or adding a try/except that silently swallows an error are prohibited as bug fixes
  • Regression: Any time a bug is fixed, a test must be added that would have caught it

Test types per layer

| Layer | Required test type |
| --- | --- |
| Service / business logic | Unit tests with mocked dependencies |
| DB queries / ORM | Integration tests against real PostgreSQL (not SQLite for quota/UUID tests) |
| API endpoints | httpx.AsyncClient integration tests with real DB fixtures |
| Auth flows | Full round-trip tests (register → login → TOTP → refresh → revoke) |
| Security invariants | Dedicated negative tests (wrong owner → 403/404, admin → 403, replay → 401) |
| Frontend | Vitest unit tests for stores/composables; Playwright or Cypress for critical flows |
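
A security-invariant test can start as small as the sketch below. Document, NotFoundError and get_owned_document are hypothetical names. In the project the check would sit behind an endpoint, and the test would go through httpx.AsyncClient with real DB fixtures, as the table requires. The sketch answers "not found" for a foreign document rather than "forbidden", which is one of the two outcomes the table allows. That way callers cannot probe which document IDs exist.

```python
import uuid
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for the ORM model; only the fields the check needs."""
    id: uuid.UUID
    user_id: uuid.UUID
    filename: str


class NotFoundError(Exception):
    """Domain exception; the router layer would translate it into HTTP 404."""


def get_owned_document(store: dict[uuid.UUID, Document], doc_id: uuid.UUID,
                       current_user_id: uuid.UUID) -> Document:
    doc = store.get(doc_id)
    # Missing and foreign documents take the same path, so the response
    # does not reveal whether another user's document exists.
    if doc is None or doc.user_id != current_user_id:
        raise NotFoundError(str(doc_id))
    return doc


def test_wrong_owner_gets_not_found() -> None:
    owner, intruder = uuid.uuid4(), uuid.uuid4()
    doc = Document(id=uuid.uuid4(), user_id=owner, filename="invoice.pdf")
    store = {doc.id: doc}

    assert get_owned_document(store, doc.id, owner) is doc
    try:
        get_owned_document(store, doc.id, intruder)
    except NotFoundError:
        pass
    else:
        raise AssertionError("a different user must not be able to load the document")
```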

Security Protocol (Non-Negotiable)

A dedicated security agent runs after every plan execution and before any phase is marked complete. This agent has full read/write/edit access to the entire codebase and is the final gate before advancement.

Security agent mandate

The security agent must check — and fix — every class of vulnerability listed below. It may not flag and defer; it must resolve or escalate blocking issues.

OWASP Top 10 + auth-specific

| Threat | Required mitigation |
| --- | --- |
| SQL injection | All queries via ORM or parameterized statements; zero raw string interpolation |
| XSS | CSP headers, httpOnly cookies, no innerHTML with user data, Vue template auto-escaping never bypassed |
| CSRF | SameSite=Strict cookie + Origin/Referer header validation on all state-changing endpoints |
| Broken auth | Short-lived JWT (≤15 min), refresh rotation, family revocation on reuse, constant-time comparison |
| IDOR / broken access control | Every resource endpoint asserts resource.user_id == current_user.id; admin blocked from document content |
| Security misconfiguration | No debug mode in production, no stack traces in API responses, no default credentials |
| Sensitive data exposure | Passwords hashed with Argon2id, PII fields encrypted at rest, credentials_enc never in API responses |
| Insecure deserialization | No pickle, no eval, no dynamic __import__; all user-supplied data validated via Pydantic |
| Vulnerable dependencies | pip-audit / npm audit run; critical/high CVEs blocked |
| Insufficient logging | All auth events, quota violations, and admin actions written to the audit log, without document content |

Advanced threats

  • Path traversal: All file path construction uses os.path.basename / pathlib — never joins user-supplied strings directly
  • SSRF: All outbound HTTP (HIBP, cloud OAuth) via an allowlisted client; user-supplied URLs for WebDAV/Nextcloud must pass hostname allowlist
  • Timing attacks: hmac.compare_digest / secrets.compare_digest for all token, TOTP, and backup-code comparison — no ==
  • Race conditions / TOCTOU: Quota enforcement via single atomic UPDATE … RETURNING — never read-then-write in Python
  • Mass assignment: Pydantic models explicitly declare every accepted field; no **kwargs passthrough from request body to ORM
  • Privilege escalation: get_regular_user and get_current_admin deps checked on every endpoint; no role elevation path exists
  • Token replay: JTI stored in DB; used TOTP codes invalidated within the 90 s window; refresh token family revocation on reuse

Zero-day / defense-in-depth

  • Minimal attack surface: Every endpoint that is not needed is absent — no commented-out code, no TODO: remove endpoints left alive
  • Principle of least privilege: docuvault_app DB role has DML only; docuvault_migrate has DDL; MinIO bucket policy denies public access
  • Secrets in env only: No credentials, API keys, or signing secrets in code or commits; .env files are never checked in, and .gitignore excludes them
  • Dependency pinning: requirements.txt and package-lock.json pin exact versions; no floating >= for security-critical packages (PyJWT, pwdlib, cryptography)
  • Container hardening: Non-root user in Dockerfile, read-only filesystem where possible, no --privileged containers
  • Header hardening: X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Referrer-Policy: strict-origin-when-cross-origin on every response
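
The header-hardening rule can be met with a small pure-ASGI middleware, sketched below. This assumes no existing library is used for the job. The middleware appends the three headers listed above only when the application has not already set them. Content-Security-Policy is left out because this guide does not fix its value. With FastAPI, it would be registered via app.add_middleware(SecurityHeadersMiddleware).

```python
# Header values from the header-hardening rule; ASGI headers are (bytes, bytes) pairs.
SECURITY_HEADERS = [
    (b"x-content-type-options", b"nosniff"),
    (b"x-frame-options", b"DENY"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    """Pure ASGI middleware that adds the hardening headers to every HTTP response."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic pass through untouched.
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message):
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                present = {name.lower() for name, _ in headers}
                # Do not override a header the endpoint set deliberately.
                headers += [(n, v) for n, v in SECURITY_HEADERS if n not in present]
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```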

Database user table encryption

Sensitive user PII (email, display name) must be encrypted at the application layer before storage:

  • Encryption: AES-256-GCM via cryptography library, per-row nonce, master key from env var
  • Key derivation: HKDF-SHA256 with purpose=b"user-pii" salt — same pattern as cloud credentials
  • Admin queries: never return plaintext PII for users other than the requesting user
  • Indexing: email lookup uses a deterministic HMAC-SHA256 index (email_hmac column) — the encrypted column is never used for WHERE clauses
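
The email_hmac index is what makes lookups possible without decrypting every row. A minimal sketch follows. The normalisation (NFKC, trimmed, lower-cased) and the separate index_key are assumptions. In practice the key would itself be derived with HKDF under its own purpose string, so that it differs from the encryption key.

```python
import hashlib
import hmac
import unicodedata


def normalize_email(email: str) -> str:
    """Assumed canonical form before indexing: NFKC, surrounding whitespace trimmed, lower-cased."""
    return unicodedata.normalize("NFKC", email).strip().lower()


def email_index(email: str, index_key: bytes) -> str:
    """Deterministic HMAC-SHA256 value for the email_hmac column (hex, 64 chars)."""
    return hmac.new(index_key, normalize_email(email).encode("utf-8"), hashlib.sha256).hexdigest()
```

A lookup then filters on the index column only, for example select(User).where(User.email_hmac == email_index(submitted_email, index_key)) with a hypothetical User model.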

Login token hardening (state of the art)

  • Algorithm: ES256 (ECDSA P-256), asymmetric: only the private key can sign; verifiers hold just the public key, which cannot be used to forge tokens
  • Access token TTL: 15 minutes maximum
  • Refresh token: 30-day httpOnly Strict cookie; rotated on every use; reuse of a rotated token revokes entire family and fires a security alert email
  • JTI claim: Every token has a unique jti; revoked JTIs stored in Redis with TTL matching the token lifetime
  • Token binding: Access token embeds a fgp (fingerprint) claim = HMAC of User-Agent + Accept-Language; backend validates on every request
  • Rotation on privilege change: Password change, TOTP enroll/revoke, and account deactivation immediately revoke all active sessions
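
The fgp claim can be sketched with the standard library alone. The names compute_fgp and fgp_matches, and the fingerprint key, are hypothetical. The claims dict is assumed to be the payload PyJWT returned after the ES256 signature was verified. A NUL separator keeps the two header values from running together.

```python
import hashlib
import hmac


def compute_fgp(key: bytes, user_agent: str, accept_language: str) -> str:
    """HMAC-SHA256 over User-Agent and Accept-Language, stored as the fgp claim."""
    # The separator stops ("ab", "c") and ("a", "bc") from producing the same input.
    material = f"{user_agent}\x00{accept_language}".encode("utf-8")
    return hmac.new(key, material, hashlib.sha256).hexdigest()


def fgp_matches(claims: dict, key: bytes, user_agent: str, accept_language: str) -> bool:
    """Compare the token's fgp claim with the current request in constant time."""
    expected = compute_fgp(key, user_agent, accept_language)
    presented = claims.get("fgp")
    if not isinstance(presented, str) or not presented.isascii():
        return False  # compare_digest only accepts ASCII str
    return hmac.compare_digest(presented, expected)
```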

Security gate checklist (must all pass before phase advances)

  • bandit -r backend/ — zero HIGH severity findings
  • pip-audit: zero critical/high CVEs
  • npm audit --audit-level=high — zero high/critical vulnerabilities
  • All security-invariant tests pass (wrong owner, admin block, token replay, CSRF)
  • No new # noqa: S suppressions without a documented justification comment
  • Admin endpoints verified to never return password_hash, credentials_enc, or document content
  • No hardcoded secrets detected by git secrets / trufflehog

Security Requirements (Non-Negotiable)

  • Rate limiting on all auth endpoints (login, register, password reset, TOTP)
  • Constant-time comparison for all token/code verification
  • CSRF protection on all state-changing endpoints
  • Content-Security-Policy headers on all responses
  • HaveIBeenPwned API check on registration and password change
  • TOTP replay prevention (mark used codes in DB within validity window)
  • Refresh token family revocation on token reuse detection
  • Admin impersonation is an explicit architectural exclusion — no endpoint or code path may exist
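
The guide does not name a library for rate limiting, so the following is only a minimal sketch. It is an in-memory sliding window keyed by client IP and endpoint. The class and parameter names are hypothetical. Each worker process keeps its own counts, so a multi-worker deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """In-memory sliding-window limiter keyed by (client_ip, endpoint); single process only."""

    def __init__(self, max_attempts: int, window_seconds: float, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits: dict[tuple[str, str], deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str, endpoint: str) -> bool:
        now = self.clock()
        hits = self._hits[(client_ip, endpoint)]
        # Drop attempts that have slid out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_attempts:
            return False  # rejected attempts are not recorded
        hits.append(now)
        return True
```

In use, the IP would come from get_client_ip(request) in deps.utils, and a False result would be answered with HTTP 429.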