docs(codebase): refresh codebase map after Phase 06.2 completion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curo1305
2026-06-02 15:32:06 +02:00
parent bd17b4b22f
commit 89f8d5a654
7 changed files with 1829 additions and 621 deletions
+130 -76
@@ -1,129 +1,183 @@
# STACK — document-scanner
# Technology Stack
_Last updated: 2026-05-21_
## Summary
Document Scanner is a full-stack application with a Python/FastAPI backend and a Vue 3 frontend, containerised with Docker Compose. The backend handles document ingestion, text extraction, and AI-powered topic classification; the frontend is a single-page app served by Vite. No external database is used — all state is persisted to the local filesystem.
---
**Analysis Date:** 2026-06-02
## Languages
| Language | Version | Where used |
|---|---|---|
| Python | 3.12 (pinned in `backend/Dockerfile`) | Backend API, AI providers, services |
| JavaScript (ES modules) | ES2022+ (`"type": "module"` in `frontend/package.json`) | Frontend SPA |
**Primary:**
- Python 3.12 — backend API, services, Celery tasks, storage backends
- JavaScript (ES Modules, ES2022+) — Vue 3 frontend SPA
---
**Secondary:**
- SQL — PostgreSQL schema via Alembic migrations (`backend/migrations/`)
- HTML/CSS — Vue SFC templates, Tailwind utility classes
## Runtime
**Backend:**
- CPython 3.12 (Docker image: `python:3.12-slim`)
- ASGI server: Uvicorn `>=0.29` with standard extras (websockets, httptools)
- CPython 3.12 (pinned: `FROM python:3.12-slim` in `backend/Dockerfile`)
- ASGI server: Uvicorn `>=0.29` with `[standard]` extras
- Entry point: `backend/main.py` → `uvicorn main:app`
**Frontend:**
- Node.js 20 (Docker image: `node:20-alpine`)
- Dev server: Vite 5 on port 5173
- Node.js 20 (pinned: `FROM node:20-alpine` in `frontend/Dockerfile`)
- Dev server: Vite 5 on port 5173, proxies `/api` → `http://backend:8000`
- Entry point: `frontend/index.html` → `frontend/src/main.js`
**Package Manager:**
- Backend: `pip` — lockfile: none (ranges only in `backend/requirements.txt`)
- Frontend: `npm` — lockfile: `frontend/package-lock.json` (present but not committed, generated on `npm install`)
---
- Backend: `pip` — `backend/requirements.txt`; no lockfile (floating `>=` ranges used throughout — see CONCERNS.md)
- Frontend: `npm` — lockfile: `frontend/package-lock.json`
## Frameworks
### Backend
### Backend Core
| Package | Version | Purpose |
|---|---|---|
| `fastapi` | `>=0.111` | REST API framework — `backend/main.py` |
| `fastapi` | `>=0.111` | Async REST API framework — `backend/main.py` |
| `uvicorn[standard]` | `>=0.29` | ASGI server |
| `pydantic-settings` | `>=2.2` | Settings/config validation |
| `python-multipart` | latest | Multipart file upload parsing |
| `pydantic` | `>=2.0` with `[email]` | Request/response validation |
| `pydantic-settings` | `>=2.2` | Environment-based config — `backend/config.py` |
| `python-multipart` | `>=0.0.27` | Multipart file upload parsing |
### ORM / Database
| Package | Version | Purpose |
|---|---|---|
| `sqlalchemy[asyncio]` | `>=2.0.49` | Async ORM — `backend/db/session.py`, `backend/db/models.py` |
| `psycopg[binary]` | `>=3.3.4` | psycopg v3 async PostgreSQL driver |
| `alembic` | `>=1.18.4` | Schema migrations — `backend/migrations/` |
| `aiosqlite` | `>=0.20.0` | SQLite async driver (test isolation only) |
### Background Tasks
| Package | Version | Purpose |
|---|---|---|
| `celery[redis]` | `>=5.5.0` | Async task queue — `backend/celery_app.py` |
| `redis` | `>=4.6.0` | Redis async client; Celery broker + result backend + JTI token store |
### Auth / Security
| Package | Version | Purpose |
|---|---|---|
| `PyJWT` | `>=2.8.0` | JWT access token creation and verification — `backend/services/auth.py` |
| `pwdlib[argon2]` | `>=0.2.1` | Argon2id password hashing |
| `pyotp` | `>=2.9.0` | TOTP provisioning and verification (2FA) |
| `cryptography` | `>=41.0.0` | HKDF per-user key derivation; Fernet encryption for cloud credentials |
| `slowapi` | `>=0.1.9` | Rate limiting middleware on auth endpoints |
| `httpx` | `>=0.27` | Async HTTP client (HIBP k-anonymity checks, OneDrive Graph API) |
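
The per-user key derivation listed for `cryptography` can be illustrated with a dependency-free sketch. It implements HKDF-SHA256 (RFC 5869) with the standard library's `hmac` module and encodes the result as a Fernet-compatible key (32 bytes, URL-safe base64). The function names, the `cloud-creds:` info label, and the use of the user ID as HKDF context are assumptions made for illustration; the project itself relies on the `cryptography` package, whose code is not shown here.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract step: an empty salt is replaced by a block of zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, master_key, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output has been produced.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def per_user_fernet_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a Fernet-compatible key for one user (hypothetical helper)."""
    if len(master_key) != 32:
        raise ValueError("master key must be exactly 32 bytes")
    # Assumption: the user ID is bound into the key through the HKDF info field.
    raw = hkdf_sha256(master_key, info=b"cloud-creds:" + user_id.encode("utf-8"))
    return base64.urlsafe_b64encode(raw)
```

Because the derivation is deterministic, the same master key and user ID always yield the same key, so only the master key needs to be stored.
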
### Document Processing
| Package | Version | Purpose |
|---|---|---|
| `PyMuPDF` | `>=1.26.7` | PDF text extraction — `backend/services/extractor.py` |
| `python-docx` | `>=1.1` | DOCX text extraction — `backend/services/extractor.py` |
| `pytesseract` | `>=0.3` | OCR for image files — `backend/services/extractor.py` |
| `Pillow` | `>=10.3` | Image loading for OCR pipeline |
| `aiofiles` | `>=23.2` | Async file I/O |
### AI Classification
| Package | Version | Purpose |
|---|---|---|
| `anthropic` | `>=0.26` | Anthropic Claude SDK — `backend/ai/anthropic_provider.py` |
| `openai` | `>=1.30` | OpenAI SDK; also used as shim for Ollama and LM Studio — `backend/ai/openai_provider.py` |
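
As a rough illustration of how one OpenAI SDK client can serve Ollama and LM Studio, the sketch below selects a `base_url` per provider. The helper names and the provider keys are hypothetical, and the URLs are the default local endpoints of those tools rather than values taken from this repository; inside Docker Compose the hostnames would likely differ.

```python
from typing import Optional

# Assumed defaults: the stock local endpoints of Ollama and LM Studio.
PROVIDER_BASE_URLS = {
    "openai": None,  # use the SDK's built-in endpoint
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
}


def client_kwargs(provider: str, api_key: Optional[str] = None) -> dict:
    """Build constructor arguments for an OpenAI-compatible client (hypothetical helper)."""
    if provider not in PROVIDER_BASE_URLS:
        raise ValueError(f"unknown provider: {provider}")
    if provider == "openai" and not api_key:
        raise ValueError("the hosted OpenAI API requires an API key")
    # Local servers ignore the key, but the SDK expects a non-empty value.
    kwargs = {"api_key": api_key or "not-needed"}
    base_url = PROVIDER_BASE_URLS[provider]
    if base_url is not None:
        kwargs["base_url"] = base_url
    return kwargs


def make_client(provider: str, api_key: Optional[str] = None):
    """Create an async client; the SDK is imported lazily so the helper above has no dependencies."""
    from openai import AsyncOpenAI

    return AsyncOpenAI(**client_kwargs(provider, api_key))
```

Keeping the provider table separate from client construction means a new OpenAI-compatible server only needs one more dictionary entry.
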
### Cloud Storage SDKs
| Package | Version | Purpose |
|---|---|---|
| `minio` | `>=7.2.20` | MinIO/S3 object storage SDK — `backend/storage/minio_backend.py` |
| `google-auth-oauthlib` | `>=1.3.1` | Google OAuth2 flow — `backend/storage/google_drive_backend.py` |
| `google-api-python-client` | `>=2.196.0` | Google Drive v3 API — `backend/storage/google_drive_backend.py` |
| `msal` | `>=1.36.0` | Microsoft Auth Library for OneDrive — `backend/storage/onedrive_backend.py` |
| `webdavclient3` | `>=3.14.7` | Generic WebDAV + Nextcloud — `backend/storage/webdav_backend.py` |
| `cachetools` | `>=5.3.0` | Cloud connection caching — `backend/services/cloud_cache.py` |
### Frontend
| Package | Version | Purpose |
|---|---|---|
| `vue` | `^3.4.0` | UI framework — `frontend/src/App.vue` and all components |
| `vue-router` | `^4.3.0` | Client-side routing — `frontend/src/router/index.js` |
| `pinia` | `^2.1.0` | State management — `frontend/src/stores/` |
| `vue` | `^3.4.0` | UI framework (Options API) — `frontend/src/` |
| `vue-router` | `^4.3.0` | Client-side routing — `frontend/src/router/` |
| `pinia` | `^2.1.0` | State management (JWT access token stored in memory only) — `frontend/src/stores/` |
| `qrcode` | `^1.5.4` | TOTP QR code generation for 2FA enrollment UI |
| `tailwindcss` | `^3.4.0` | Utility-first CSS — `frontend/tailwind.config.js` |
### Build / Dev Tooling
### Frontend Dev / Build
| Tool | Version | Purpose |
|---|---|---|
| `vite` | `^5.2.0` | Frontend bundler and dev server — `frontend/vite.config.js` |
| `@vitejs/plugin-vue` | `^5.0.0` | Vue SFC support in Vite |
| `tailwindcss` | `^3.4.0` | Utility-first CSS — `frontend/tailwind.config.js` |
| `vite` | `^5.2.0` | Dev server and bundler — `frontend/vite.config.js` |
| `@vitejs/plugin-vue` | `^5.0.0` | Vue SFC compilation |
| `postcss` | `^8.4.0` | CSS processing — `frontend/postcss.config.js` |
| `autoprefixer` | `^10.4.0` | CSS vendor prefixing |
---
## Key Backend Dependencies
| Package | Version | Purpose |
|---|---|---|
| `anthropic` | `>=0.26` | Anthropic Claude API client — `backend/ai/anthropic_provider.py` |
| `openai` | `>=1.30` | OpenAI / OpenAI-compatible API client — `backend/ai/openai_provider.py`, also used for Ollama and LM Studio via `base_url` override |
| `PyMuPDF` (`fitz`) | `>=1.24` | PDF text extraction — `backend/services/extractor.py` |
| `python-docx` | `>=1.1` | DOCX text extraction — `backend/services/extractor.py` |
| `pytesseract` | `>=0.3` | OCR for image files — `backend/services/extractor.py` |
| `Pillow` | `>=10.3` | Image handling for OCR — `backend/services/extractor.py` |
| `filelock` | `>=3.14` | File-based concurrency locks — `backend/services/storage.py` |
| `aiofiles` | `>=23.2` | Async file I/O support |
| `httpx` | `>=0.27` | Async HTTP client (used internally by `anthropic` and `openai` SDKs) |
---
## Testing
### Testing
| Tool | Version | Purpose |
|---|---|---|
| `pytest` | `>=8.2` | Test runner — `backend/pytest.ini`, `backend/tests/` |
| `pytest-asyncio` | `>=0.23` | Async test support; `asyncio_mode = auto` set in `backend/pytest.ini` |
| `pytest` | `>=8.2` | Backend test runner — `backend/pytest.ini` |
| `pytest-asyncio` | `>=1.3.0` | Async test support (`asyncio_mode = auto`) |
| `vitest` | `^4.1.7` | Frontend test runner — `frontend/vitest.config.js` |
| `@vue/test-utils` | `^2.4.10` | Vue component test utilities |
| `happy-dom` | `^20.9.0` | DOM environment for Vitest |
No frontend test framework is present.
## Infrastructure
---
### Docker Compose Services (`docker-compose.yml`)
## Storage
| Service | Image | Port(s) | Notes |
|---|---|---|---|
| `postgres` | `postgres:17-alpine` | internal | Persistent `postgres_data` volume |
| `minio` | `minio/minio:latest` | `9000`, `9001` | S3-compatible object store; persistent `minio_data` volume |
| `redis` | `redis:7-alpine` | internal | Password-protected; Celery broker + JTI revocation store |
| `backend` | Built from `./backend` | `8000` | Hot-reload via volume mount; depends on postgres, minio, redis |
| `celery-worker` | Built from `./backend` | — | Processes `documents` queue |
| `celery-beat` | Built from `./backend` | — | Periodic task scheduler |
| `frontend` | Built from `./frontend` | `5173` | Vite dev server; proxies `/api` → `backend:8000` |
- **File system only** — no database engine.
- Upload files stored at `backend/data/uploads/` (UUID-named).
- Document metadata stored as per-document JSON files at `backend/data/metadata/`.
- Topics registry: `backend/data/topics.json`.
- App settings: `backend/data/settings.json`.
- File-level concurrency managed via `filelock` (`backend/services/storage.py`).
### Database Role Separation
---
- `docuvault_app` — DML only (SELECT/INSERT/UPDATE/DELETE); used by FastAPI app
- `docuvault_migrate` — DDL; used by Alembic migrations only
- Init script: `docker/postgres/initdb.d/01-init-users.sql`
## System Dependencies (backend Docker image)
### System Dependencies (backend Docker image)
Installed via `apt-get` in `backend/Dockerfile`:
- `tesseract-ocr` — OCR binary for `pytesseract`
- `libgl1`, `libglib2.0-0` — shared libraries required by PyMuPDF
---
## Configuration
- Environment variable `DATA_DIR` sets the root data path (default: `/app/data`).
- AI provider settings (models, API keys, base URLs) are stored in `backend/data/settings.json` and managed through the in-app Settings UI.
- Optional bootstrap via `.env` (see `.env.example`): only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are referenced.
- Default active provider is `lmstudio` (no API key required).
**Environment variables** are the single source of truth, read by `pydantic-settings` in `backend/config.py`.
Required for core operation:
- `DATABASE_URL` — psycopg v3 async DSN for app user
- `DATABASE_MIGRATE_URL` — psycopg v3 DSN for migrate user
- `MINIO_ENDPOINT`, `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY`, `MINIO_BUCKET`
- `REDIS_URL` — used by both FastAPI (JTI store) and Celery
- `SECRET_KEY` — JWT signing secret
- `CLOUD_CREDS_KEY` — 32-byte master key from which HKDF derives the per-user keys that encrypt cloud credentials
Optional:
- `SMTP_HOST/PORT/USER/PASSWORD/FROM` — transactional email
- `GOOGLE_CLIENT_ID/SECRET`, `ONEDRIVE_CLIENT_ID/SECRET` — OAuth cloud storage
- `ADMIN_EMAIL`, `ADMIN_PASSWORD` — bootstrap admin account
- `SYSTEM_PROMPT`, `DEFAULT_AI_PROVIDER`, `DEFAULT_AI_MODEL` — AI defaults
- `CORS_ORIGINS`, `FRONTEND_URL`, `BACKEND_URL`
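
The required and optional split above implies a fail-fast check at startup. The sketch below shows that idea with the standard library only; it is not the project's `pydantic-settings` class in `backend/config.py`, and the helper names are hypothetical. Only the variable names come from the list above.

```python
import os
from typing import Dict, List, Mapping, Optional

# Names taken from the "Required for core operation" list.
REQUIRED_VARS = (
    "DATABASE_URL",
    "DATABASE_MIGRATE_URL",
    "MINIO_ENDPOINT",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
    "MINIO_BUCKET",
    "REDIS_URL",
    "SECRET_KEY",
    "CLOUD_CREDS_KEY",
)


def missing_required(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty, in declaration order."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def load_required(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return all required values, or raise one error naming every missing variable."""
    env = os.environ if env is None else env
    missing = missing_required(env)
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing variable in a single error avoids the fix-one-restart-repeat cycle when a fresh `.env` file is incomplete.
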
## Platform Requirements
**Development:**
- Docker + Docker Compose (preferred), or
- Python 3.12 and Node.js 20, plus locally running PostgreSQL 17, MinIO, and Redis instances
**Production:**
- Containerised via Docker Compose; no cloud-native manifests or reverse-proxy config detected in repo
---
## Gaps / Unknowns
- No Python version pinning file (`.python-version`, `pyproject.toml`) outside the Dockerfile — local dev outside Docker may use a different Python version.
- No frontend lockfile committed; exact transitive dependency versions are non-deterministic until `npm install` is run.
- No linter or formatter config detected (no `.eslintrc`, `.prettierrc`, `biome.json`, `ruff.toml`, `mypy.ini`, etc.).
- No production deployment config beyond Docker Compose (no nginx config, no cloud provider manifests).
*Stack analysis: 2026-06-02*