Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6.9 KiB
INTEGRATIONS — document-scanner
Last updated: 2026-05-21
Summary
The backend integrates with four interchangeable AI providers for document classification: Anthropic Claude, OpenAI (and any OpenAI-compatible endpoint), Ollama, and LM Studio. There are no external databases, auth services, or cloud storage integrations — all persistence is local filesystem. The active provider is selected at runtime via settings persisted in backend/data/settings.json.
AI Providers
All providers implement the AIProvider abstract interface defined in backend/ai/base.py. The active provider is resolved at request time in backend/ai/__init__.py:get_provider().
Anthropic
- SDK:
anthropic>=0.26—backend/ai/anthropic_provider.py - Client:
anthropic.AsyncAnthropic - API: Messages API (
client.messages.create) - Default model:
claude-sonnet-4-6 - Auth:
api_keystored inbackend/data/settings.jsonunderproviders.anthropic.api_key; optionally seeded from env varANTHROPIC_API_KEY(.env.example) - Calls made:
classify(max_tokens=1024),suggest_topics(max_tokens=256),health_check(max_tokens=5) - Text limit: 8,000 characters per request (
MAX_AI_CHARS = 8_000)
OpenAI
- SDK:
openai>=1.30—backend/ai/openai_provider.py - Client:
openai.AsyncOpenAI - API: Chat Completions (
client.chat.completions.create) - Default model:
gpt-4o - Auth:
api_keystored inbackend/data/settings.jsonunderproviders.openai.api_key; optionally seeded from env varOPENAI_API_KEY(.env.example) - Custom base URL: Supported via
providers.openai.base_urlin settings (allows pointing at any OpenAI-compatible endpoint)
Ollama
- Provider file:
backend/ai/ollama_provider.py - Implementation: Subclass of
OpenAIProvider— uses the OpenAI SDK with a custombase_url - Default base URL:
http://host.docker.internal:11434/v1 - Default model:
llama3.2 - Auth: Stub key
"ollama"(no real auth required) - Network path: Reaches the host machine's Ollama daemon via Docker's
host.docker.internalDNS alias (configured indocker-compose.ymlviaextra_hosts)
LM Studio
- Provider file:
backend/ai/lmstudio_provider.py - Implementation: Subclass of
OpenAIProvider— uses the OpenAI SDK with a custombase_url - Default base URL:
http://host.docker.internal:1234/v1 - Default model:
gemma-4-e4b-it - Auth: Stub key
"lm-studio"(no real auth required) - Network path: Reaches the host machine's LM Studio server via
host.docker.internal(sameextra_hostssetting) - Default active provider — the app works out of the box with LM Studio and no API keys
Provider Selection & Settings Persistence
- Active provider and all per-provider config (model names, API keys, base URLs) are persisted in
backend/data/settings.json. - Settings are loaded fresh on each classification request in
backend/services/classifier.py:classify_document(). - API keys returned from the settings API are masked (last 4 chars shown) via
backend/services/storage.py:mask_api_key(). - The Settings UI allows switching providers without restart.
Frontend ↔ Backend Communication
- Protocol: HTTP REST over JSON (and multipart form for uploads)
- Client: Native browser
fetchAPI —frontend/src/api/client.js - Base path: All requests go to
/api/*— no hardcoded backend hostname in the frontend - Proxy (dev): Vite dev server proxies
/api→http://backend:8000—frontend/vite.config.js - Proxy (prod): Comment in
frontend/src/api/client.jsnotes nginx is expected; no nginx config is present in the repo
API Endpoints consumed by the frontend
| Method | Path | Purpose |
|---|---|---|
| POST | /api/documents/upload |
Upload file with optional auto-classify flag |
| GET | /api/documents |
List documents (paginated, optional topic filter) |
| GET | /api/documents/:id |
Get single document metadata |
| DELETE | /api/documents/:id |
Delete document |
| POST | /api/documents/:id/classify |
(Re)classify document, optional topic list |
| GET | /api/topics |
List all topics |
| POST | /api/topics |
Create topic |
| PATCH | /api/topics/:id |
Update topic |
| DELETE | /api/topics/:id |
Delete topic |
| POST | /api/topics/suggest |
AI topic suggestions for a document |
| GET | /api/settings |
Get settings (keys masked) |
| PATCH | /api/settings |
Update settings |
| POST | /api/settings/test-provider |
Health-check the active or named provider |
| GET | /api/settings/default-prompt |
Retrieve the default classification system prompt |
Docker Services
Defined in docker-compose.yml:
| Service | Image | Port | Notes |
|---|---|---|---|
backend |
Built from ./backend/Dockerfile |
8000:8000 |
Mounts ./backend/data:/app/data for persistence; ./backend:/app for hot-reload |
frontend |
Built from ./frontend/Dockerfile |
5173:5173 |
Mounts ./frontend/src and index.html for hot-reload; depends on backend |
Both services use extra_hosts: host.docker.internal:host-gateway on the backend to allow Ollama/LM Studio connections to the host machine.
Environment Variables
| Variable | Required | Where used | Notes |
|---|---|---|---|
DATA_DIR |
No | backend/config.py |
Root path for uploads/metadata/settings; defaults to /app/data |
ANTHROPIC_API_KEY |
No | .env.example |
Bootstrap only — app manages keys via settings UI |
OPENAI_API_KEY |
No | .env.example |
Bootstrap only — app manages keys via settings UI |
PYTHONDONTWRITEBYTECODE |
No | docker-compose.yml |
Set to 1 to suppress .pyc files in Docker |
Authentication & Identity
- No user authentication. The application has no login system, sessions, or identity provider.
- API keys for AI providers are stored in plain text in
backend/data/settings.json(masked only when returned via the settings API).
Monitoring & Observability
- No error tracking service (no Sentry, Datadog, etc.).
- No structured logging framework — FastAPI default stdout logging only.
- A
/healthendpoint exists atbackend/main.pyreturning{"status": "ok"}. - Provider connectivity tested on demand via
POST /api/settings/test-provider.
Webhooks & Callbacks
- None — the application makes no outbound webhook calls and exposes no webhook receiver endpoints.
Gaps / Unknowns
- No nginx or reverse-proxy config present for production deployments; the client-side comment references it but no config exists.
- No container registry or CI/CD pipeline configuration detected.
- API keys are stored in a plain JSON file on disk with no encryption at rest.
- The
ANTHROPIC_API_KEY/OPENAI_API_KEYenv vars from.env.exampleare noted as bootstrap helpers but no code in the repo reads them directly — they appear to be manual seeding hints only.