STACK — document-scanner
Last updated: 2026-05-21
Summary
Document Scanner is a full-stack application with a Python/FastAPI backend and a Vue 3 frontend, containerised with Docker Compose. The backend handles document ingestion, text extraction, and AI-powered topic classification; the frontend is a single-page app served by Vite. No external database is used — all state is persisted to the local filesystem.
Languages
| Language | Version | Where used |
|---|---|---|
| Python | 3.12 (pinned in `backend/Dockerfile`) | Backend API, AI providers, services |
| JavaScript (ES modules) | ES2022+ (`"type": "module"` in `frontend/package.json`) | Frontend SPA |
Runtime
Backend:
- CPython 3.12 (Docker image: `python:3.12-slim`)
- ASGI server: Uvicorn `>=0.29` with standard extras (websockets, httptools)
- Entry point: `backend/main.py` (`uvicorn main:app`)
Frontend:
- Node.js 20 (Docker image: `node:20-alpine`)
- Dev server: Vite 5 on port 5173
- Entry point: `frontend/index.html` → `frontend/src/main.js`
Package Manager:
- Backend: `pip`; lockfile: none (version ranges only in `backend/requirements.txt`)
- Frontend: `npm`; lockfile: `frontend/package-lock.json` (generated on `npm install` but not committed)
Frameworks
Backend
| Package | Version | Purpose |
|---|---|---|
| `fastapi` | `>=0.111` | REST API framework (`backend/main.py`) |
| `uvicorn[standard]` | `>=0.29` | ASGI server |
| `pydantic-settings` | `>=2.2` | Settings/config validation |
| `python-multipart` | latest | Multipart file upload parsing |
Frontend
| Package | Version | Purpose |
|---|---|---|
| `vue` | `^3.4.0` | UI framework (`frontend/src/App.vue` and all components) |
| `vue-router` | `^4.3.0` | Client-side routing (`frontend/src/router/index.js`) |
| `pinia` | `^2.1.0` | State management (`frontend/src/stores/`) |
Build / Dev Tooling
| Tool | Version | Purpose |
|---|---|---|
| `vite` | `^5.2.0` | Frontend bundler and dev server (`frontend/vite.config.js`) |
| `@vitejs/plugin-vue` | `^5.0.0` | Vue SFC support in Vite |
| `tailwindcss` | `^3.4.0` | Utility-first CSS (`frontend/tailwind.config.js`) |
| `postcss` | `^8.4.0` | CSS processing (`frontend/postcss.config.js`) |
| `autoprefixer` | `^10.4.0` | CSS vendor prefixing |
Key Backend Dependencies
| Package | Version | Purpose |
|---|---|---|
| `anthropic` | `>=0.26` | Anthropic Claude API client (`backend/ai/anthropic_provider.py`) |
| `openai` | `>=1.30` | OpenAI / OpenAI-compatible API client (`backend/ai/openai_provider.py`); also used for Ollama and LM Studio via `base_url` override |
| `PyMuPDF` (`fitz`) | `>=1.24` | PDF text extraction (`backend/services/extractor.py`) |
| `python-docx` | `>=1.1` | DOCX text extraction (`backend/services/extractor.py`) |
| `pytesseract` | `>=0.3` | OCR for image files (`backend/services/extractor.py`) |
| `Pillow` | `>=10.3` | Image handling for OCR (`backend/services/extractor.py`) |
| `filelock` | `>=3.14` | File-based concurrency locks (`backend/services/storage.py`) |
| `aiofiles` | `>=23.2` | Async file I/O support |
| `httpx` | `>=0.27` | Async HTTP client (used internally by the `anthropic` and `openai` SDKs) |
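
The `base_url` override noted for the `openai` package can be illustrated with a small helper that builds the constructor arguments per provider. This is a minimal sketch, not the code in `backend/ai/openai_provider.py`: the function name, the provider table, and the localhost URLs (the usual defaults for LM Studio and Ollama) are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical provider table: names and URLs are illustrative assumptions,
# not values read from backend/ai/openai_provider.py.
LOCAL_PROVIDERS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}


def client_kwargs(provider: str, api_key: Optional[str] = None) -> dict:
    """Build keyword arguments for openai.OpenAI(...) for a given provider."""
    if provider == "openai":
        if not api_key:
            raise ValueError("the hosted OpenAI API needs an API key")
        return {"api_key": api_key}
    if provider in LOCAL_PROVIDERS:
        # Local servers typically ignore the key, but the SDK raises if no key
        # is supplied at all, so a placeholder is passed.
        return {
            "base_url": LOCAL_PROVIDERS[provider],
            "api_key": api_key or "not-needed",
        }
    raise ValueError(f"unknown provider: {provider}")


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("lmstudio"))
```

Keeping the provider-specific part down to a dictionary of keyword arguments means one client class can serve hosted and local endpoints alike.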
Testing
| Tool | Version | Purpose |
|---|---|---|
| `pytest` | `>=8.2` | Test runner (`backend/pytest.ini`, `backend/tests/`) |
| `pytest-asyncio` | `>=0.23` | Async test support; `asyncio_mode = auto` set in `backend/pytest.ini` |
No frontend test framework is present.
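
With `asyncio_mode = auto`, pytest-asyncio collects plain `async def` test functions without an explicit `@pytest.mark.asyncio` marker. The sketch below shows what such a backend test could look like. `classify_stub` is a hypothetical stand-in, not a function from the project.

```python
import asyncio


async def classify_stub(text: str) -> str:
    """Hypothetical stand-in for an async service call."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return "invoice" if "invoice" in text.lower() else "other"


# Under asyncio_mode = auto this coroutine is collected and awaited as a test;
# no @pytest.mark.asyncio decorator is needed.
async def test_classify_stub_detects_invoice():
    assert await classify_stub("Invoice #42") == "invoice"
    assert await classify_stub("Meeting notes") == "other"
```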
Storage
- File system only — no database engine.
- Uploaded files stored at `backend/data/uploads/` (UUID-named).
- Document metadata stored as per-document JSON files at `backend/data/metadata/`.
- Topics registry: `backend/data/topics.json`.
- App settings: `backend/data/settings.json`.
- File-level concurrency managed via `filelock` (`backend/services/storage.py`).
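
The layout above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the contents of `backend/services/storage.py`: the function names and metadata fields are hypothetical, keeping the original file extension on the UUID name is a guess, and an atomic rename stands in for the `filelock` locking the project actually uses.

```python
import json
import os
import tempfile
import uuid
from pathlib import Path


def save_document(data_dir: Path, original_name: str, content: bytes) -> str:
    """Store an upload under a UUID name and write its metadata JSON."""
    uploads = data_dir / "uploads"
    metadata = data_dir / "metadata"
    uploads.mkdir(parents=True, exist_ok=True)
    metadata.mkdir(parents=True, exist_ok=True)

    doc_id = str(uuid.uuid4())
    suffix = Path(original_name).suffix  # assumption: extension is preserved
    (uploads / f"{doc_id}{suffix}").write_bytes(content)

    # Hypothetical metadata fields; the real schema is not documented here.
    record = {"id": doc_id, "original_name": original_name, "size": len(content)}

    # Write to a temp file in the same directory, then rename it into place,
    # so a concurrent reader never sees a half-written JSON file.
    fd, tmp_path = tempfile.mkstemp(dir=metadata, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
    os.replace(tmp_path, metadata / f"{doc_id}.json")
    return doc_id


def load_metadata(data_dir: Path, doc_id: str) -> dict:
    """Read one document's metadata record back from disk."""
    path = data_dir / "metadata" / f"{doc_id}.json"
    return json.loads(path.read_text(encoding="utf-8"))
```

An atomic rename protects readers from partial writes but does not serialise read-modify-write cycles on shared files such as `topics.json`, which is where a lock like `filelock` is still needed.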
System Dependencies (backend Docker image)
Installed via apt-get in backend/Dockerfile:
- `tesseract-ocr`: OCR binary for `pytesseract`
- `libgl1`, `libglib2.0-0`: shared libraries required by PyMuPDF
Configuration
- Environment variable `DATA_DIR` sets the root data path (default: `/app/data`).
- AI provider settings (models, API keys, base URLs) are stored in `backend/data/settings.json` and managed through the in-app Settings UI.
- Optional bootstrap via `.env` (see `.env.example`): only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are referenced.
- Default active provider is `lmstudio` (no API key required).
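
One way the configuration rules above could fit together is sketched below. The function names and the JSON key names (`active_provider`, `anthropic_api_key`, `openai_api_key`) are assumptions for illustration, as is the precedence rule that environment keys only fill in values missing from `settings.json`. The project's real settings schema may differ.

```python
import json
import os
from pathlib import Path

# Hypothetical default; the key name is an assumption, the value "lmstudio"
# is the documented default provider.
DEFAULT_SETTINGS = {"active_provider": "lmstudio"}

# Maps the two documented environment variables to assumed settings keys.
ENV_BOOTSTRAP = (
    ("ANTHROPIC_API_KEY", "anthropic_api_key"),
    ("OPENAI_API_KEY", "openai_api_key"),
)


def data_dir() -> Path:
    """Resolve the data root from DATA_DIR, falling back to /app/data."""
    return Path(os.environ.get("DATA_DIR", "/app/data"))


def load_settings(root: Path) -> dict:
    """Merge settings.json over defaults, then bootstrap missing API keys."""
    settings = dict(DEFAULT_SETTINGS)
    path = root / "settings.json"
    if path.exists():
        settings.update(json.loads(path.read_text(encoding="utf-8")))
    for env_name, key in ENV_BOOTSTRAP:
        # Environment values never override a key already saved in the file.
        if os.environ.get(env_name) and not settings.get(key):
            settings[key] = os.environ[env_name]
    return settings
```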
Gaps / Unknowns
- No Python version pinning file (`.python-version`, `pyproject.toml`) outside the Dockerfile, so local dev outside Docker may use a different Python version.
- No frontend lockfile committed; exact transitive dependency versions are non-deterministic until `npm install` is run.
- No linter or formatter config detected (no `.eslintrc`, `.prettierrc`, `biome.json`, `ruff.toml`, `mypy.ini`, etc.).
- No production deployment config beyond Docker Compose (no nginx config, no cloud provider manifests).