curo1305 0f760c379d fix: remove obsolete /data/documents and /config dirs from Dockerfiles
doc-service and ai-service no longer use local filesystem directories —
all file and config I/O goes through storage-service. Update README and
CLAUDE.md to reflect 6-service architecture, new volumes, and add
storage-service step to the "Adding a new resource" checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 13:45:12 +02:00

# CLAUDE.md
This file provides permanent, authoritative guidance to Claude Code for every session. It covers project-wide concerns only. Service-specific details live in sub-files — read them only when working in that service:
- `backend/CLAUDE.md` — auth/users/admin/settings/plugins endpoints; DB models; JWT/bcrypt/sanitization security; naming conventions
- `frontend/CLAUDE.md` — routes, components, API client patterns, XSS prevention
- `features/ai-service/CLAUDE.md` — /chat, /health, /queue endpoints; queue service
- `features/doc-service/CLAUDE.md` — document/category/share endpoints; DB models; PDF limits; file watcher
- `features/storage-service/CLAUDE.md` — storage API, pluggable backend drivers (local/S3/WebDAV), migration
---
## Merge checklist
Before merging any feature branch into `main`, every test relevant to the changed area in `tests/ALL_TESTS.md` (and the relevant service-specific file) must be marked passing. The test suite covers all 20 feature areas across five service files:
- `tests/backend_tests.md` — §1–9, §18
- `tests/frontend_tests.md` — §19
- `tests/doc-service_tests.md` — §10–16
- `tests/ai-service_tests.md` — §17
- `tests/storage-service_tests.md` — §20
Do not merge until every relevant test is marked passing.
---
## CLAUDE.md self-update checkpoint
**After every change to the codebase**, before committing, check which CLAUDE.md files need updating:
- New route added → update **API Endpoints** in `backend/CLAUDE.md`, `features/doc-service/CLAUDE.md`, or `features/ai-service/CLAUDE.md`; update **Frontend Routes** in `frontend/CLAUDE.md`
- New DB model or column → update **Database Models** in `backend/CLAUDE.md` or `features/doc-service/CLAUDE.md`
- New migration → update **Migration chain** table in `backend/CLAUDE.md` or `features/doc-service/CLAUDE.md`
- New file or directory → update **File & Folder Tree** in the relevant sub-file; update the high-level tree in this root file only if a top-level directory changes
- New limit or default value changed → update **Default Values & Limits** in the relevant sub-file
- New dependency, auth mechanism, or security pattern → update **Security Standards** in the relevant sub-file
- New Docker service, volume, network, or env var → update **Docker Infrastructure** in this file
- Stack version changed → update **Stack** in this file
- New feature or endpoint added → add test rows to **both** `tests/ALL_TESTS.md` (in the relevant section) **and** the matching service-specific file (`tests/backend_tests.md`, `tests/frontend_tests.md`, `tests/doc-service_tests.md`, `tests/ai-service_tests.md`, or `tests/storage-service_tests.md`). Use the same test number and format as existing rows.
This check is mandatory — treat it the same as updating STATUS.md.
---
## Stack
| Layer | Tech |
|---|---|
| Backend | FastAPI (async), SQLAlchemy 2 (async), Alembic, PostgreSQL 16 |
| Auth | JWT RS256 via `python-jose`, bcrypt via `bcrypt` (direct, 13 rounds) |
| Frontend | React 18, TypeScript, Vite, React Router v6, TanStack Query, Axios |
| UI Library | shadcn/ui (Radix primitives + Tailwind CSS v3) |
| Styling | Tailwind CSS v3, CSS custom properties for theme tokens |
| Containerisation | Docker Compose (6 services, non-root users, named volumes) |
---
## Commands
All test, build, and package-manager commands run **inside Docker** — never on the host. See the memory note: "Testing inside Docker only".
### Full stack
```bash
# Dev stack (hot-reload, Vite on :5173)
cp .env.example backend/.env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
# Prod stack
docker compose up --build -d
```
For service-specific commands (migrations, lint), see `backend/CLAUDE.md` and `frontend/CLAUDE.md`.
---
## File & Folder Tree
```
/
├── CLAUDE.md ← This file — project-wide context
├── README.md ← Project overview, containers table, Current State
├── TODO.md ← Task list
├── .env.example ← Template for backend/.env
├── docker-compose.yml ← Production (6 services, named volumes)
├── docker-compose.dev.yml ← Dev overrides (hot-reload, host ports)
├── .githooks/pre-commit ← Runs scripts/security_check.py before every commit
├── scripts/security_check.py ← Static analysis: secrets, weak crypto, SQLi, JWT
├── changelog/YYYY-MM-DD_<slug>.md ← Per-date change logs
├── tests/ALL_TESTS.md ← Full test suite (all 20 areas); must pass before merging to main
├── tests/backend_tests.md ← Backend-only tests (§1–9, §18)
├── tests/frontend_tests.md ← Frontend-only tests (§19)
├── tests/doc-service_tests.md ← Doc-service tests (§10–16)
├── tests/ai-service_tests.md ← AI-service tests (§17)
├── tests/storage-service_tests.md ← Storage-service tests (§20)
├── dev-watch/ ← Dev bind-mount for file watcher testing (.gitkeep only)
├── backend/ ← FastAPI gateway (port 8000, internal); see backend/CLAUDE.md
├── features/
│ ├── ai-service/ ← AI provider intermediary (port 8010, internal); see features/ai-service/CLAUDE.md
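│ ├── doc-service/ ← PDF extraction microservice (port 8001, internal); see features/doc-service/CLAUDE.md
│ └── storage-service/ ← Storage API with pluggable backend drivers (port 8020, internal); see features/storage-service/CLAUDE.md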
└── frontend/ ← React SPA (port 5173 dev / 80 prod); see frontend/CLAUDE.md
```
---
## Architecture
### Request flow
```
Browser (:5173 dev / :80 prod)
  └── Vite dev proxy / nginx
        └── /api/* ──→ backend:8000 (FastAPI)
        ┌───────────────┼───────────────────┐
      /auth           /admin          /documents/*
      /users          /groups         /documents/categories/*
      /profile        /settings             │
      /services         │                   │
                   JSON volume        proxy (injects x-user-id,
                   (/config)          x-user-groups)
                                            │
                                      doc-service:8001
                                      ai-service:8010
                                      (classify, chat)
```
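The diagram notes that the backend proxy injects `x-user-id` and `x-user-groups` before forwarding document requests. The following is a minimal, stdlib-only sketch of that step, not the backend's actual code: the function name `build_forwarded_headers`, the comma-separated group encoding, and the choice to drop the client's `Authorization` header are all assumptions made for illustration.

```python
from typing import Iterable, Mapping

# Identity headers the gateway sets itself; client-supplied copies must not pass through.
TRUSTED_HEADERS = ("x-user-id", "x-user-groups")


def build_forwarded_headers(
    incoming: Mapping[str, str],
    user_id: int,
    groups: Iterable[str],
) -> dict[str, str]:
    """Return the headers to send downstream for an already-authenticated user."""
    # Strip spoofable identity headers and the bearer token (assumed unnecessary downstream).
    forwarded = {
        name: value
        for name, value in incoming.items()
        if name.lower() not in TRUSTED_HEADERS and name.lower() != "authorization"
    }
    forwarded["x-user-id"] = str(user_id)
    # Assumed encoding: sorted, comma-separated group names.
    forwarded["x-user-groups"] = ",".join(sorted(groups))
    return forwarded
```

Removing any client-supplied `x-user-id` before setting the trusted value is what prevents a caller from impersonating another user through the proxy.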
### Auth flow
1. `POST /api/auth/login` → RS256 JWT (8 h), stored in `localStorage`
2. Axios interceptor injects `Authorization: Bearer {token}` on every request
3. `get_current_user` dep validates token on every protected route
4. Admin routes additionally check `user.is_superuser`; return 404 (not 403) if not admin
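Steps 2 to 4 can be sketched without any framework. This is an illustration only: the `User` dataclass is a hypothetical stand-in for the real model, token verification (RS256 via `python-jose`) is left out, and the 401 for a missing token is an assumption; only the 404-instead-of-403 rule comes from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical stand-in for the backend's user model."""
    id: int
    is_superuser: bool = False


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header, or None."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def admin_route_status(user: Optional[User]) -> int:
    """Status code an admin route would answer with before running its handler."""
    if user is None:
        return 401  # assumed response for a missing or invalid token
    if not user.is_superuser:
        return 404  # hide the route's existence instead of answering 403
    return 200
```

Answering 404 rather than 403 means a non-admin cannot tell an admin route apart from a route that does not exist.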
---
## Security Standards
These standards are **non-negotiable**. Every change must comply. Implementation-specific security rules (JWT, bcrypt, input sanitization, XSS, SQLi, admin routes) are in the relevant sub-CLAUDE.md files.
### Network isolation
- `backend-net`: all containers, including frontend (which proxies `/api/*` to backend); not reachable from host in prod.
- `frontend-net`: only frontend; single host port (80 prod / 5173 dev).
- DB, backend, doc-service, ai-service, storage-service have **no** host port bindings in prod.
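One way to check the "no host port bindings" rule is to inspect the resolved Compose configuration. The sketch below assumes the input is the dictionary obtained by parsing the output of `docker compose config --format json`; the function name and the allow-list are hypothetical.

```python
from typing import Any

# Only the frontend may publish a host port in prod, per the rules above.
ALLOWED_PUBLISHERS = {"frontend"}


def services_with_host_ports(compose: dict[str, Any]) -> list[str]:
    """Return services outside the allow-list that declare a `ports` mapping."""
    offenders = []
    for name, service in compose.get("services", {}).items():
        if service.get("ports") and name not in ALLOWED_PUBLISHERS:
            offenders.append(name)
    return sorted(offenders)
```

An empty result for the prod configuration would be consistent with the isolation rules; any name in the list points at a service that is exposed to the host.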
### Storage rule (non-negotiable)
**No service may write to a filesystem path for persistent data.** All file/blob storage must go through the storage-service HTTP API (`PUT/GET/DELETE /objects/{bucket}/{key}`). Config JSON files must be stored in the `config` bucket. Uploaded files must be stored in the `documents` bucket. Violation is a security and architecture defect.
The only two persistent storage mechanisms in the project are:
1. **PostgreSQL** — structured/relational data
2. **storage-service** — all file/blob/config data (local filesystem by default; switchable to S3-compatible or WebDAV)
New services and features must follow this pattern. See `features/storage-service/CLAUDE.md` for the API reference.
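As a rough illustration of the storage rule, the sketch below prepares a `PUT /objects/{bucket}/{key}` request with the standard library. The helper names, the `Content-Type` header, and the assumption that keys may contain `/` are not taken from the storage-service API reference; check `features/storage-service/CLAUDE.md` before relying on any of them.

```python
import urllib.request
from urllib.parse import quote

STORAGE_SERVICE_URL = "http://storage-service:8020"


def object_url(bucket: str, key: str, base_url: str = STORAGE_SERVICE_URL) -> str:
    """Build the /objects/{bucket}/{key} URL, percent-encoding each part."""
    # Assumption: keys may contain "/" to form pseudo-directories.
    return f"{base_url.rstrip('/')}/objects/{quote(bucket, safe='')}/{quote(key, safe='/')}"


def put_object_request(bucket: str, key: str, data: bytes) -> urllib.request.Request:
    """Prepare (but do not send) a PUT that stores `data` under bucket/key."""
    return urllib.request.Request(
        object_url(bucket, key),
        data=data,
        method="PUT",
        headers={"Content-Type": "application/octet-stream"},  # assumed content type
    )
```

Inside the Docker network the prepared request could be sent with `urllib.request.urlopen(put_object_request("config", "settings.json", payload))`, where `settings.json` is a hypothetical key.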
### Pre-commit security hook
`.githooks/pre-commit` runs `scripts/security_check.py` before every commit. It blocks commits that contain:
1. Hardcoded credentials / private keys / AWS creds
2. `eval()`, `exec()`, `shell=True`, `pickle.loads()`, `yaml.load()` without SafeLoader
3. MD5, SHA1, DES, `random.random()` / `random.randint()` for security use
4. SQL f-strings / format strings / concatenation passed to `execute()`/`query()`
5. JWT algorithm `"none"`, `verify_exp=False`, expiry > 9999 min, hardcoded secrets
6. `debug=True`, `print()` with passwords
7. `bandit` static analysis failures
**Never** bypass with `--no-verify` unless explicitly instructed by the user.
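To make the kind of check concrete, here is a small pattern scanner covering a few of the categories listed above. It is a sketch, not the contents of `scripts/security_check.py`: the rule names, the regular expressions, and the `scan_source` function are all assumptions.

```python
import re

# A few illustrative rules; the real script's patterns are not shown in this file.
RULES = {
    "shell=True": re.compile(r"\bshell\s*=\s*True\b"),
    "eval/exec call": re.compile(r"(?<![\w.])(?:eval|exec)\s*\("),
    "verify_exp disabled": re.compile(r"verify_exp[\"']?\s*[:=]\s*False"),
    "debug=True": re.compile(r"\bdebug\s*=\s*True\b"),
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every rule hit in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

The lookbehind in the `eval/exec` rule keeps method calls such as `model.eval()` from being flagged, which is the sort of false positive a line-based scanner has to account for.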
---
## Default Values & Limits (cross-cutting)
| Parameter | Value | Location |
|-----------|-------|----------|
| Health check interval | 30 s | `service_health.py` |
| Service poll (frontend) | 30 s | `AppsPage.tsx`, `DashboardPage.tsx` |
All other per-service defaults are in the relevant sub-CLAUDE.md file.
---
## Docker Infrastructure
### Services
| Service | Image base | Internal port | User | Volumes | Network |
|---------|-----------|---------------|------|---------|---------|
| `db` | postgres:16-alpine | 5432 | 70:70 | `postgres_data` | backend-net |
| `backend` | python:3.12-slim | 8000 | 1001:1001 | — | backend-net |
| `ai-service` | python:3.12-slim | 8010 | 1001:1001 | — | backend-net |
| `doc-service` | python:3.12-slim | 8001 | 1001:1001 | `watch_data` | backend-net |
| `storage-service` | python:3.12-slim | 8020 | 1001:1001 | `storage_data` | backend-net |
| `frontend` | nginx-unprivileged:alpine | 8080 | 1001:1001 | — | backend-net, frontend-net |
### Volumes
| Volume | Mount path | Contains |
|--------|-----------|---------|
| `postgres_data` | `/var/lib/postgresql/data` | PostgreSQL data |
| `storage_data` | `/data/storage` | All file/blob storage: PDFs (`documents/`) and config JSONs (`config/`) |
| `watch_data` | `/data/watch` | Watch directory (bind-mount NAS/Nextcloud via docker-compose.override.yml) |
### Networks
| Network | Host-accessible | Members |
|---------|----------------|---------|
| `backend-net` | No (no host ports in prod) | db, backend, ai-service, doc-service, storage-service, frontend |
| `frontend-net` | Yes (port 80 → frontend:8080) | frontend |
### Environment variables (required in `backend/.env`)
```
DATABASE_URL=postgresql+asyncpg://<user>:<pass>@db:5432/destroying_sap
CORS_ORIGINS=["http://localhost:5173"]
JWT_PRIVATE_KEY=<PEM, newlines as \n>
JWT_PUBLIC_KEY=<PEM, newlines as \n>
```
Injected by docker-compose (not in `.env`):
```
DOC_SERVICE_URL=http://doc-service:8001
AI_SERVICE_URL=http://ai-service:8010
STORAGE_SERVICE_URL=http://storage-service:8020
```
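The four required variables can be validated at startup along these lines. This is a sketch under assumptions: the function name `load_settings` and the returned keys are hypothetical, and the real backend may load its settings differently.

```python
import json
from typing import Mapping

REQUIRED_VARS = ("DATABASE_URL", "CORS_ORIGINS", "JWT_PRIVATE_KEY", "JWT_PUBLIC_KEY")


def load_settings(env: Mapping[str, str]) -> dict[str, object]:
    """Validate the required variables and decode the structured ones."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing required env vars: {', '.join(missing)}")
    return {
        "database_url": env["DATABASE_URL"],
        # CORS_ORIGINS is written as a JSON array in the example above.
        "cors_origins": json.loads(env["CORS_ORIGINS"]),
        # PEM keys sit on one line with literal "\n"; restore real newlines.
        "jwt_private_key": env["JWT_PRIVATE_KEY"].replace("\\n", "\n"),
        "jwt_public_key": env["JWT_PUBLIC_KEY"].replace("\\n", "\n"),
    }
```

In practice the mapping passed in would be `os.environ`; taking it as a parameter keeps the function easy to test.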
---
## Workflows
### STATUS.md workflow
Every directory with runnable code has a `STATUS.md`. These are the canonical **resume point** for each session.
**At the start of every conversation:**
1. Read the `STATUS.md` for every directory you will touch.
2. If it does not exist for a directory you are working in, create it using the structure below.
This applies equally to subagents.
**After making changes**, update affected `STATUS.md` files:
- Add new endpoints / models / routes.
- Move completed items off the **Future work** checklist.
- Add new items to **Known limitations** or **Future work**.
- Keep the **What it is** summary accurate.
**Structure:**
```markdown
# <Service Name> — Status
## What it is
One paragraph: purpose, port, database/storage, how traffic arrives.
## Current functionality
Subsections per router / feature area. Tables for endpoints.
## Architecture
ASCII diagram of call graph / data flow.
## Known limitations / not implemented
Bullet list of known gaps.
## Future work
- [ ] Planned improvements
```
Maintained in: `backend/`, `features/ai-service/`, `features/doc-service/`, `frontend/`
---
### Changelog convention
Every time files are added or modified, append to `changelog/YYYY-MM-DD_<slug>.md`. If today's file exists, append; otherwise create new.
Each entry must include:
- A heading with date and short description
- `**Timestamp:**` in ISO-8601 format
- A **Summary** sentence
- A **Files Added / Modified / Deleted** list with one-line descriptions
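The convention can be sketched as a small helper. Several details are assumptions: the function name `append_changelog`, the exact heading format, and the reading of "today's file" as any existing file whose name starts with today's date.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def append_changelog(root: Path, slug: str, title: str, summary: str,
                     files: dict[str, str], now: Optional[datetime] = None) -> Path:
    """Append one entry to today's changelog file, creating the file if needed."""
    now = now or datetime.now(timezone.utc)
    day = now.strftime("%Y-%m-%d")
    changelog_dir = root / "changelog"
    changelog_dir.mkdir(parents=True, exist_ok=True)
    # Reuse a file already started today; otherwise name a new one after the slug.
    existing = sorted(changelog_dir.glob(f"{day}_*.md"))
    target = existing[0] if existing else changelog_dir / f"{day}_{slug}.md"
    lines = [
        f"## {day}: {title}",
        f"**Timestamp:** {now.isoformat(timespec='seconds')}",
        f"**Summary:** {summary}",
        "**Files Added / Modified / Deleted:**",
        *[f"- `{path}`: {note}" for path, note in files.items()],
        "",
    ]
    with target.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return target
```

Opening the file in append mode covers both branches of the rule: an existing file gains a new entry, and a missing one is created.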
---
### Adding a new resource (checklist)
1. Add ORM model in `backend/app/models/`, import it in `models/__init__.py`
2. Generate and apply the migration: `docker compose exec backend alembic revision --autogenerate -m "add <resource>"`, then `docker compose exec backend alembic upgrade head`
3. Add Pydantic schemas in `backend/app/schemas/`
4. Add router in `backend/app/routers/`, mount it in `main.py`
5. Add API function(s) to `frontend/src/api/client.ts`
6. Add page component in `frontend/src/pages/`, register route in `App.tsx`
7. If the resource involves file or blob data: store it via `PUT /objects/{bucket}/{key}` on `storage-service:8020`. Never write to the local filesystem. See `features/storage-service/CLAUDE.md` for the API.
8. Update `STATUS.md` for affected services
9. Add changelog entry
---
### Git convention
Always run `git push` immediately after every `git commit`.
---
### Feature branch & isolated test environment
Every non-trivial implementation (anything beyond a one-line fix or doc change) **must** follow this workflow:
#### 0 — Mandatory planning phase (REQUIRED before any code changes)
Before touching any code, present a written plan and **wait for explicit user approval**. Do not open files to edit, do not create branches, do not write code until the user says the plan is approved.
The plan must include:
- **What** is changing and **why**
- **Which files** will be created or modified (with paths)
- **Database / migration impact** (if any)
- **API contract changes** (new endpoints, changed schemas)
- **Frontend route / component changes**
- **Risks or non-obvious decisions**
Only proceed to step 1 after the user responds with explicit approval (e.g. "looks good", "go ahead", "approved").
#### 1 — Create a feature branch
After the planning phase is approved, branch off `main`. Name the branch after the title of the change — use lowercase words separated by hyphens, descriptive enough to understand at a glance what the branch does:
```bash
git checkout main && git pull
git checkout -b feat/<descriptive-title> # e.g. feat/user-profile-avatar-upload, feat/document-bulk-delete
```
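The naming rule (lowercase words separated by hyphens under `feat/`) can be expressed as a small helper. The function name is hypothetical and the handling of punctuation is an assumption.

```python
import re


def feature_branch_name(title: str) -> str:
    """Turn a change title into a feat/<descriptive-title> branch name."""
    # Keep only runs of letters and digits; everything else becomes a word break.
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        raise ValueError("title must contain at least one letter or digit")
    return "feat/" + "-".join(words)
```

For example, `feature_branch_name("User profile: avatar upload")` yields `feat/user-profile-avatar-upload`, matching the first example in the command above.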
#### 2 — Spin up an isolated Docker stack for the feature
The feature stack always uses port `5173` (same as the main dev stack). Stop the main stack before starting a feature stack, and restart it when done.
**Stop the main dev stack first:**
```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml down
```
**Create a per-feature override file** at `docker-compose.feat-<slug>.yml` (gitignored):
```yaml
# docker-compose.feat-<slug>.yml — feature test stack, never committed to main
services:
  frontend:
    container_name: frontend-<slug>
  backend:
    container_name: backend-<slug>
  doc-service:
    container_name: doc-service-<slug>
  ai-service:
    container_name: ai-service-<slug>
  db:
    container_name: db-<slug>
networks:
  backend-net:
    name: backend-net-<slug>
  frontend-net:
    name: frontend-net-<slug>
```
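Because the override file differs only in the slug, it can be generated rather than hand-written. The sketch below renders the same structure as the template above; the function name is hypothetical, and the service list is a parameter so that further services (for example `storage-service`) can be added if the base Compose file requires it.

```python
from typing import Sequence

# Services and networks named in the template above.
DEFAULT_SERVICES = ("frontend", "backend", "doc-service", "ai-service", "db")
DEFAULT_NETWORKS = ("backend-net", "frontend-net")


def render_feature_override(
    slug: str,
    services: Sequence[str] = DEFAULT_SERVICES,
    networks: Sequence[str] = DEFAULT_NETWORKS,
) -> str:
    """Return the text of docker-compose.feat-<slug>.yml for the given slug."""
    lines = [f"# docker-compose.feat-{slug}.yml: feature test stack, never committed to main"]
    lines.append("services:")
    for service in services:
        lines += [f"  {service}:", f"    container_name: {service}-{slug}"]
    lines.append("networks:")
    for network in networks:
        lines += [f"  {network}:", f"    name: {network}-{slug}"]
    return "\n".join(lines) + "\n"
```

Writing the result to disk is a one-liner, e.g. `Path(f"docker-compose.feat-{slug}.yml").write_text(render_feature_override(slug))`.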
**Start the feature stack**:
```bash
docker compose -f docker-compose.yml \
-f docker-compose.dev.yml \
-f docker-compose.feat-<slug>.yml \
--project-name <slug> up --build
```
The feature frontend is now reachable at `http://localhost:5173`.
#### 3 — Develop on the feature branch
All code changes happen on `feat/<slug>`. Commit and push normally:
```bash
git add <files>
git commit -m "feat: <description>"
git push -u origin feat/<slug>
```
#### 4 — Confirm functionality
Before merging, verify all of the following on `http://localhost:5173`:
- [ ] Login and registration work end-to-end
- [ ] The specific feature works as intended
- [ ] No regressions visible in the UI
- [ ] Backend logs show no unexpected errors: `docker compose -p <slug> logs backend`
- [ ] Migrations (if any) applied cleanly: `docker compose -p <slug> exec backend alembic upgrade head`
#### 5 — Merge to main
Once all checks pass:
```bash
git checkout main
git merge --no-ff feat/<slug> -m "Merge feat/<slug>: <description>"
git push
git branch -d feat/<slug>
git push origin --delete feat/<slug>
```
#### 6 — Tear down the feature stack and restart main dev stack
```bash
docker compose -f docker-compose.yml \
-f docker-compose.dev.yml \
-f docker-compose.feat-<slug>.yml \
--project-name <slug> down --volumes --remove-orphans
rm docker-compose.feat-<slug>.yml
# Restart the main dev stack on :5173
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build -d
```
---
### Infrastructure change protocol
After **any** change to Dockerfiles, `docker-compose*.yml`, `nginx.conf`, or setup scripts:
1. **Update `README.md`** — containers table, ports, image names, Current State section.
2. **Dev stack** — verify login and registration end-to-end:
```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
```
3. **Prod stack** — run the same checks:
```bash
docker compose up --build -d
```
4. Confirm non-root users:
```bash
docker inspect <container> --format '{{.Config.User}}'
```
5. **Tear down** after testing:
```bash
docker compose down --volumes --remove-orphans
```
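Step 4 prints the `.Config.User` string for a container; the check itself can be automated with a helper like the one below. The function name is hypothetical, and treating an empty value as root is an assumption based on Docker's default when an image sets no user.

```python
def is_non_root(user_field: str) -> bool:
    """Interpret the `.Config.User` string printed by `docker inspect`."""
    # The field may be "uid", "uid:gid", or a user name; only the user part matters here.
    user = user_field.strip().split(":", 1)[0]
    # An empty value means no user was configured, which is treated as root.
    return user not in ("", "0", "root")
```

Feeding it the output of the `docker inspect` command above for each container gives a quick pass/fail against the `User` column of the Services table.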
---
### Security hook
`.githooks/pre-commit` runs `scripts/security_check.py` in Docker. Git only picks up the hook once `core.hooksPath` points at `.githooks`, so every new clone must run:
```bash
git config core.hooksPath .githooks
```
See **Security Standards → Pre-commit security hook** for the full list of checks.
**Never** bypass with `--no-verify` unless explicitly instructed by the user.