STACK — document-scanner

Last updated: 2026-05-21

Summary

Document Scanner is a full-stack application with a Python/FastAPI backend and a Vue 3 frontend, containerised with Docker Compose. The backend handles document ingestion, text extraction, and AI-powered topic classification; the frontend is a single-page app served by Vite. No external database is used — all state is persisted to the local filesystem.


Languages

| Language | Version | Where used |
|---|---|---|
| Python | 3.12 (pinned in backend/Dockerfile) | Backend API, AI providers, services |
| JavaScript (ES modules) | ES2022+ ("type": "module" in frontend/package.json) | Frontend SPA |

Runtime

Backend:

  • CPython 3.12 (Docker image: python:3.12-slim)
  • ASGI server: Uvicorn >=0.29 with standard extras (websockets, httptools)
  • Entry point: backend/main.py → uvicorn main:app

Frontend:

  • Node.js 20 (Docker image: node:20-alpine)
  • Dev server: Vite 5 on port 5173
  • Entry point: frontend/index.html → frontend/src/main.js

Package Manager:

  • Backend: pip — lockfile: none (ranges only in backend/requirements.txt)
  • Frontend: npm — lockfile: frontend/package-lock.json (present but not committed, generated on npm install)

Frameworks

Backend

| Package | Version | Purpose |
|---|---|---|
| fastapi | >=0.111 | REST API framework (backend/main.py) |
| uvicorn[standard] | >=0.29 | ASGI server |
| pydantic-settings | >=2.2 | Settings/config validation |
| python-multipart | latest | Multipart file upload parsing |

Frontend

| Package | Version | Purpose |
|---|---|---|
| vue | ^3.4.0 | UI framework (frontend/src/App.vue and all components) |
| vue-router | ^4.3.0 | Client-side routing (frontend/src/router/index.js) |
| pinia | ^2.1.0 | State management (frontend/src/stores/) |

Build / Dev Tooling

| Tool | Version | Purpose |
|---|---|---|
| vite | ^5.2.0 | Frontend bundler and dev server (frontend/vite.config.js) |
| @vitejs/plugin-vue | ^5.0.0 | Vue SFC support in Vite |
| tailwindcss | ^3.4.0 | Utility-first CSS (frontend/tailwind.config.js) |
| postcss | ^8.4.0 | CSS processing (frontend/postcss.config.js) |
| autoprefixer | ^10.4.0 | CSS vendor prefixing |

Key Backend Dependencies

| Package | Version | Purpose |
|---|---|---|
| anthropic | >=0.26 | Anthropic Claude API client (backend/ai/anthropic_provider.py) |
| openai | >=1.30 | OpenAI / OpenAI-compatible API client (backend/ai/openai_provider.py); also used for Ollama and LM Studio via base_url override |
| PyMuPDF (fitz) | >=1.24 | PDF text extraction (backend/services/extractor.py) |
| python-docx | >=1.1 | DOCX text extraction (backend/services/extractor.py) |
| pytesseract | >=0.3 | OCR for image files (backend/services/extractor.py) |
| Pillow | >=10.3 | Image handling for OCR (backend/services/extractor.py) |
| filelock | >=3.14 | File-based concurrency locks (backend/services/storage.py) |
| aiofiles | >=23.2 | Async file I/O support |
| httpx | >=0.27 | Async HTTP client (used internally by anthropic and openai SDKs) |
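
To illustrate how the extraction libraries in the table above might divide the work, here is a minimal sketch of extension-based dispatch. The function names, the set of image extensions, and the overall shape are assumptions for illustration; the source only states that backend/services/extractor.py uses PyMuPDF, python-docx, pytesseract, and Pillow. Third-party imports are deferred to call time so the dispatch logic can be exercised without them installed.

```python
from pathlib import Path

# Hypothetical extension set; the real extractor may accept other formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def pick_extractor(path: str) -> str:
    """Return which extraction strategy applies to a file, by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".docx":
        return "docx"
    if suffix in IMAGE_SUFFIXES:
        return "ocr"
    raise ValueError(f"unsupported file type: {suffix or '(none)'}")


def extract_text(path: str) -> str:
    """Extract plain text using the library that matches the file type."""
    kind = pick_extractor(path)
    if kind == "pdf":
        import fitz  # PyMuPDF

        with fitz.open(path) as doc:
            return "\n".join(page.get_text() for page in doc)
    if kind == "docx":
        import docx  # python-docx

        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    # Remaining case is OCR: Pillow loads the image, Tesseract reads it.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)
```

Keeping the type decision in a separate function makes it easy to reject unsupported uploads before any heavy library is loaded.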

Testing

| Tool | Version | Purpose |
|---|---|---|
| pytest | >=8.2 | Test runner (backend/pytest.ini, backend/tests/) |
| pytest-asyncio | >=0.23 | Async test support; asyncio_mode = auto set in backend/pytest.ini |

No frontend test framework is present.
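
With asyncio_mode = auto, pytest-asyncio collects plain async def test functions without a per-test marker. The following is a minimal sketch of what such a backend test could look like; fake_classify is a hypothetical stand-in and not a function from this codebase.

```python
import asyncio


async def fake_classify(text: str) -> str:
    """Hypothetical stand-in for an async AI provider call."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return "invoice" if "invoice" in text.lower() else "other"


# Under asyncio_mode = auto, pytest-asyncio runs this coroutine as a test
# without an explicit @pytest.mark.asyncio decorator.
async def test_fake_classify_detects_invoice():
    assert await fake_classify("Invoice #123") == "invoice"
    assert await fake_classify("Meeting notes") == "other"
```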


Storage

  • File system only — no database engine.
  • Upload files stored at backend/data/uploads/ (UUID-named).
  • Document metadata stored as per-document JSON files at backend/data/metadata/.
  • Topics registry: backend/data/topics.json.
  • App settings: backend/data/settings.json.
  • File-level concurrency managed via filelock (backend/services/storage.py).
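
The layout above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in backend/services/storage.py: the function names and the metadata fields are hypothetical, keeping the original file extension on the UUID name is a guess, and an atomic rename stands in for the filelock-based locking the project actually uses.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def save_document(data_dir: Path, original_name: str, content: bytes) -> str:
    """Store an upload under a UUID name and write its metadata JSON."""
    uploads = data_dir / "uploads"
    metadata = data_dir / "metadata"
    uploads.mkdir(parents=True, exist_ok=True)
    metadata.mkdir(parents=True, exist_ok=True)

    doc_id = str(uuid.uuid4())
    suffix = Path(original_name).suffix.lower()  # assumption: extension kept
    (uploads / f"{doc_id}{suffix}").write_bytes(content)

    # Hypothetical metadata fields, for illustration only.
    record = {"id": doc_id, "original_name": original_name, "size": len(content)}

    # Write to a temp file in the same directory, then rename atomically so a
    # reader never sees a half-written JSON file. This is a stand-in for the
    # filelock-based approach named in the source.
    fd, tmp_path = tempfile.mkstemp(dir=str(metadata), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
    os.replace(tmp_path, metadata / f"{doc_id}.json")
    return doc_id


def load_metadata(data_dir: Path, doc_id: str) -> dict:
    """Read one document's metadata record."""
    path = data_dir / "metadata" / f"{doc_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

One JSON file per document keeps writes independent of each other; only shared files such as topics.json would need coordination between concurrent writers.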

System Dependencies (backend Docker image)

Installed via apt-get in backend/Dockerfile:

  • tesseract-ocr — OCR binary for pytesseract
  • libgl1, libglib2.0-0 — shared libraries required by PyMuPDF

Configuration

  • Environment variable DATA_DIR sets the root data path (default: /app/data).
  • AI provider settings (models, API keys, base URLs) are stored in backend/data/settings.json and managed through the in-app Settings UI.
  • Optional bootstrap via .env (see .env.example): only ANTHROPIC_API_KEY and OPENAI_API_KEY are referenced.
  • Default active provider is lmstudio (no API key required).
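
A minimal sketch of how these settings could be resolved at startup. Everything beyond DATA_DIR, its /app/data default, the settings.json filename, OPENAI_API_KEY, and the lmstudio default is an assumption: the function names and the active_provider key are hypothetical, and the base URLs are the usual local defaults for LM Studio and Ollama rather than values taken from this codebase.

```python
import json
import os
from pathlib import Path

# Assumed local endpoints for each tool's OpenAI-compatible server.
LOCAL_BASE_URLS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}


def resolve_data_dir(env=None) -> Path:
    """Return DATA_DIR from the environment, falling back to /app/data."""
    env = os.environ if env is None else env
    return Path(env.get("DATA_DIR", "/app/data"))


def load_settings(data_dir: Path) -> dict:
    """Load settings.json, defaulting to the lmstudio provider if absent."""
    path = data_dir / "settings.json"
    if not path.exists():
        return {"active_provider": "lmstudio"}
    return json.loads(path.read_text(encoding="utf-8"))


def openai_client_kwargs(settings: dict, env=None) -> dict:
    """Build keyword arguments for an OpenAI-compatible client."""
    env = os.environ if env is None else env
    provider = settings.get("active_provider", "lmstudio")
    if provider in LOCAL_BASE_URLS:
        # Local servers generally do not check the key; a placeholder is
        # passed because the OpenAI SDK expects an api_key to be set.
        return {"base_url": LOCAL_BASE_URLS[provider], "api_key": "not-needed"}
    return {"api_key": env.get("OPENAI_API_KEY", "")}
```

The resulting dictionary could be passed as openai.OpenAI(**kwargs), which is how a single client class can serve OpenAI, Ollama, and LM Studio through a base_url override.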

Gaps / Unknowns

  • No Python version pinning file (.python-version, pyproject.toml) outside the Dockerfile — local dev outside Docker may use a different Python version.
  • No frontend lockfile is committed, so exact transitive dependency versions are resolved when npm install runs and can differ between machines.
  • No linter or formatter config detected (no .eslintrc, .prettierrc, biome.json, ruff.toml, mypy.ini, etc.).
  • No production deployment config beyond Docker Compose (no nginx config, no cloud provider manifests).