# STACK — document-scanner

_Last updated: 2026-05-21_

## Summary

Document Scanner is a full-stack application with a Python/FastAPI backend and a Vue 3 frontend, containerised with Docker Compose. The backend handles document ingestion, text extraction, and AI-powered topic classification; the frontend is a single-page app served by Vite. No external database is used; all state is persisted to the local filesystem.

---

## Languages

| Language | Version | Where used |
|---|---|---|
| Python | 3.12 (pinned in `backend/Dockerfile`) | Backend API, AI providers, services |
| JavaScript (ES modules) | ES2022+ (`"type": "module"` in `frontend/package.json`) | Frontend SPA |

---

## Runtime

**Backend:**

- CPython 3.12 (Docker image: `python:3.12-slim`)
- ASGI server: Uvicorn `>=0.29` with the `standard` extras (which include `websockets` and `httptools`)
- Entry point: `backend/main.py`, started with `uvicorn main:app`

**Frontend:**

- Node.js 20 (Docker image: `node:20-alpine`)
- Dev server: Vite 5 on port 5173
- Entry point: `frontend/index.html` → `frontend/src/main.js`

**Package Manager:**

- Backend: `pip`; no lockfile (version ranges only in `backend/requirements.txt`)
- Frontend: `npm`; the lockfile `frontend/package-lock.json` is generated by `npm install` but not committed

---

## Frameworks

### Backend

| Package | Version | Purpose |
|---|---|---|
| `fastapi` | `>=0.111` | REST API framework (`backend/main.py`) |
| `uvicorn[standard]` | `>=0.29` | ASGI server |
| `pydantic-settings` | `>=2.2` | Settings/config validation |
| `python-multipart` | unpinned (latest at install time) | Multipart file upload parsing |

### Frontend

| Package | Version | Purpose |
|---|---|---|
| `vue` | `^3.4.0` | UI framework (`frontend/src/App.vue` and all components) |
| `vue-router` | `^4.3.0` | Client-side routing (`frontend/src/router/index.js`) |
| `pinia` | `^2.1.0` | State management (`frontend/src/stores/`) |

### Build / Dev Tooling

| Tool | Version | Purpose |
|---|---|---|
| `vite` | `^5.2.0` | Frontend bundler and dev server (`frontend/vite.config.js`) |
| `@vitejs/plugin-vue` | `^5.0.0` | Vue SFC support in Vite |
| `tailwindcss` | `^3.4.0` | Utility-first CSS (`frontend/tailwind.config.js`) |
| `postcss` | `^8.4.0` | CSS processing (`frontend/postcss.config.js`) |
| `autoprefixer` | `^10.4.0` | CSS vendor prefixing |

---

## Key Backend Dependencies

| Package | Version | Purpose |
|---|---|---|
| `anthropic` | `>=0.26` | Anthropic Claude API client (`backend/ai/anthropic_provider.py`) |
| `openai` | `>=1.30` | OpenAI / OpenAI-compatible API client (`backend/ai/openai_provider.py`); also used for Ollama and LM Studio via a `base_url` override |
| `PyMuPDF` (`fitz`) | `>=1.24` | PDF text extraction (`backend/services/extractor.py`) |
| `python-docx` | `>=1.1` | DOCX text extraction (`backend/services/extractor.py`) |
| `pytesseract` | `>=0.3` | OCR for image files (`backend/services/extractor.py`) |
| `Pillow` | `>=10.3` | Image handling for OCR (`backend/services/extractor.py`) |
| `filelock` | `>=3.14` | File-based concurrency locks (`backend/services/storage.py`) |
| `aiofiles` | `>=23.2` | Async file I/O support |
| `httpx` | `>=0.27` | Async HTTP client (used internally by the `anthropic` and `openai` SDKs) |
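
The `base_url` override noted for the `openai` package can be sketched roughly as below. This is an illustration under assumptions, not the project's code: the helper `client_kwargs`, the `LOCAL_BASE_URLS` table, and the placeholder key are hypothetical, and the URLs are the usual local defaults for LM Studio and Ollama, which a deployment may change. The sketch only builds the arguments, so it runs without the SDK installed.

```python
"""Sketch: constructor arguments for an OpenAI-compatible client."""
from __future__ import annotations

# Usual local defaults (assumed; hosts and ports may differ per deployment).
LOCAL_BASE_URLS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}


def client_kwargs(provider: str, api_key: str | None = None) -> dict[str, str]:
    """Return keyword arguments suitable for `openai.OpenAI(**kwargs)`.

    Hypothetical helper: the hosted API needs a real key and no base_url;
    local servers need a base_url and only a non-empty placeholder key.
    """
    if provider == "openai":
        if not api_key:
            raise ValueError("the hosted OpenAI API requires an API key")
        return {"api_key": api_key}
    if provider in LOCAL_BASE_URLS:
        return {
            "base_url": LOCAL_BASE_URLS[provider],
            "api_key": api_key or "not-needed",
        }
    raise ValueError(f"unknown provider: {provider}")


# With the SDK installed, the result would be used as:
#   from openai import OpenAI
#   client = OpenAI(**client_kwargs("lmstudio"))
```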

---

## Testing

| Tool | Version | Purpose |
|---|---|---|
| `pytest` | `>=8.2` | Test runner (`backend/pytest.ini`, `backend/tests/`) |
| `pytest-asyncio` | `>=0.23` | Async test support; `asyncio_mode = auto` set in `backend/pytest.ini` |

No frontend test framework is present.
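
With `asyncio_mode = auto`, `pytest-asyncio` runs plain `async def` tests without a `@pytest.mark.asyncio` marker. A minimal sketch of such a test is shown here; `classify_stub` and the test name are hypothetical stand-ins, not code from `backend/tests/`.

```python
import asyncio


async def classify_stub(text: str) -> str:
    """Hypothetical stand-in for an async service call."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return "invoice" if "invoice" in text.lower() else "other"


async def test_classify_stub_detects_invoice():
    # In auto mode pytest-asyncio awaits this coroutine; no marker is needed.
    assert await classify_stub("Invoice #42") == "invoice"
    assert await classify_stub("Meeting notes") == "other"
```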

---

## Storage

- **File system only**: no database engine.
- Uploaded files are stored in `backend/data/uploads/` (UUID-named).
- Document metadata is stored as per-document JSON files in `backend/data/metadata/`.
- Topics registry: `backend/data/topics.json`.
- App settings: `backend/data/settings.json`.
- File-level concurrency is managed via `filelock` (`backend/services/storage.py`).
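
The layout above can be sketched with the standard library alone. This is an illustration under assumptions: the function `save_document`, the metadata fields, and the `<id>.json` naming are hypothetical, and an atomic `os.replace` stands in here for the `filelock`-based locking that `backend/services/storage.py` uses.

```python
from __future__ import annotations

import json
import os
import tempfile
import uuid
from pathlib import Path


def save_document(data_dir: Path, original_name: str, content: bytes) -> str:
    """Store an upload under a UUID name and write its metadata JSON."""
    doc_id = uuid.uuid4().hex
    uploads = data_dir / "uploads"
    metadata = data_dir / "metadata"
    uploads.mkdir(parents=True, exist_ok=True)
    metadata.mkdir(parents=True, exist_ok=True)

    # Assumption: keep the original extension on the UUID-named file.
    stored = uploads / f"{doc_id}{Path(original_name).suffix}"
    stored.write_bytes(content)

    # Hypothetical record shape; the real metadata schema is not shown here.
    record = {"id": doc_id, "original_name": original_name, "stored_as": stored.name}

    # Write to a temp file in the same directory, then rename atomically so
    # readers never observe a half-written JSON file.
    fd, tmp_path = tempfile.mkstemp(dir=metadata, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(record, tmp)
    os.replace(tmp_path, metadata / f"{doc_id}.json")
    return doc_id
```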

---

## System Dependencies (backend Docker image)

Installed via `apt-get` in `backend/Dockerfile`:

- `tesseract-ocr`: OCR binary for `pytesseract`
- `libgl1`, `libglib2.0-0`: shared libraries required by PyMuPDF

---

## Configuration

- Environment variable `DATA_DIR` sets the root data path (default: `/app/data`).
- AI provider settings (models, API keys, base URLs) are stored in `backend/data/settings.json` and managed through the in-app Settings UI.
- Optional bootstrap via `.env` (see `.env.example`): only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are referenced.
- Default active provider is `lmstudio` (no API key required).
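
As a rough sketch of how the `DATA_DIR` default might be resolved, assuming the uploads, metadata, topics, and settings locations sit directly beneath that root. The function `resolve_paths` is hypothetical; the project may instead read this through `pydantic-settings`.

```python
from __future__ import annotations

import os
from pathlib import Path
from typing import Mapping


def resolve_paths(env: Mapping[str, str] = os.environ) -> dict[str, Path]:
    """Derive data locations from DATA_DIR, falling back to /app/data."""
    root = Path(env.get("DATA_DIR", "/app/data"))
    return {
        "root": root,
        "uploads": root / "uploads",
        "metadata": root / "metadata",
        "topics": root / "topics.json",
        "settings": root / "settings.json",
    }
```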

---

## Gaps / Unknowns

- No Python version pinning file (`.python-version`, `pyproject.toml`) outside the Dockerfile, so local development outside Docker may use a different Python version.
- No frontend lockfile is committed, so exact transitive dependency versions can differ between fresh `npm install` runs.
- No linter or formatter config detected (no `.eslintrc`, `.prettierrc`, `biome.json`, `ruff.toml`, `mypy.ini`, etc.).
- No production deployment config beyond Docker Compose (no nginx config, no cloud provider manifests).