# INTEGRATIONS — document-scanner

_Last updated: 2026-05-21_

## Summary

The backend integrates with four interchangeable AI providers for document classification: Anthropic Claude, OpenAI (and any OpenAI-compatible endpoint), Ollama, and LM Studio. There are no external databases, auth services, or cloud storage integrations — all persistence is local filesystem. The active provider is selected at runtime via settings persisted in `backend/data/settings.json`.

---

## AI Providers

All providers implement the `AIProvider` abstract interface defined in `backend/ai/base.py`. The active provider is resolved at request time in `backend/ai/__init__.py:get_provider()`.

### Anthropic

- **SDK:** `anthropic>=0.26` — `backend/ai/anthropic_provider.py`
- **Client:** `anthropic.AsyncAnthropic`
- **API:** Messages API (`client.messages.create`)
- **Default model:** `claude-sonnet-4-6`
- **Auth:** `api_key` stored in `backend/data/settings.json` under `providers.anthropic.api_key`; optionally seeded from env var `ANTHROPIC_API_KEY` (`.env.example`)
- **Calls made:** `classify` (max_tokens=1024), `suggest_topics` (max_tokens=256), `health_check` (max_tokens=5)
- **Text limit:** 8,000 characters per request (`MAX_AI_CHARS = 8_000`)

### OpenAI

- **SDK:** `openai>=1.30` — `backend/ai/openai_provider.py`
- **Client:** `openai.AsyncOpenAI`
- **API:** Chat Completions (`client.chat.completions.create`)
- **Default model:** `gpt-4o`
- **Auth:** `api_key` stored in `backend/data/settings.json` under `providers.openai.api_key`; optionally seeded from env var `OPENAI_API_KEY` (`.env.example`)
- **Custom base URL:** Supported via `providers.openai.base_url` in settings (allows pointing at any OpenAI-compatible endpoint)

### Ollama

- **Provider file:** `backend/ai/ollama_provider.py`
- **Implementation:** Subclass of `OpenAIProvider` — uses the OpenAI SDK with a custom `base_url`
- **Default base URL:** `http://host.docker.internal:11434/v1`
- **Default model:** `llama3.2`
- **Auth:** Stub key
`"ollama"` (no real auth required)
- **Network path:** Reaches the host machine's Ollama daemon via Docker's `host.docker.internal` DNS alias (configured in `docker-compose.yml` via `extra_hosts`)

### LM Studio

- **Provider file:** `backend/ai/lmstudio_provider.py`
- **Implementation:** Subclass of `OpenAIProvider` — uses the OpenAI SDK with a custom `base_url`
- **Default base URL:** `http://host.docker.internal:1234/v1`
- **Default model:** `gemma-4-e4b-it`
- **Auth:** Stub key `"lm-studio"` (no real auth required)
- **Network path:** Reaches the host machine's LM Studio server via `host.docker.internal` (same `extra_hosts` setting)
- **Default active provider** — the app works out of the box with LM Studio and no API keys

---

## Provider Selection & Settings Persistence

- Active provider and all per-provider config (model names, API keys, base URLs) are persisted in `backend/data/settings.json`.
- Settings are loaded fresh on each classification request in `backend/services/classifier.py:classify_document()`.
- API keys returned from the settings API are masked (last 4 chars shown) via `backend/services/storage.py:mask_api_key()`.
- The Settings UI allows switching providers without restart.
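
The persistence and masking behaviour described in this section can be sketched roughly as follows. This is a minimal illustration, not the code in `backend/services/storage.py`: the body of `mask_api_key` is an assumption based only on the "last 4 chars shown" description, and `load_settings` and `masked_view` are hypothetical helper names. Only the `providers.<name>.api_key` layout is taken from the text above. Reading the file on every call is what would let a provider switch take effect without a restart.

```python
import json
from pathlib import Path


def mask_api_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters (assumed masking rule)."""
    if not key:
        return ""
    if len(key) <= visible:
        # Too short to reveal anything safely, so mask every character.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def load_settings(path: Path) -> dict:
    """Hypothetical loader: re-read the JSON file on every call, no caching."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def masked_view(settings: dict) -> dict:
    """Hypothetical helper: copy the settings with every api_key masked."""
    view = json.loads(json.dumps(settings))  # simple deep copy for JSON data
    for cfg in view.get("providers", {}).values():
        if "api_key" in cfg:
            cfg["api_key"] = mask_api_key(cfg["api_key"])
    return view
```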

---

## Frontend ↔ Backend Communication

- **Protocol:** REST over HTTP with JSON bodies (multipart form data for uploads)
- **Client:** Native browser `fetch` API — `frontend/src/api/client.js`
- **Base path:** All requests go to `/api/*` — no hardcoded backend hostname in the frontend
- **Proxy (dev):** Vite dev server proxies `/api` → `http://backend:8000` — `frontend/vite.config.js`
- **Proxy (prod):** Comment in `frontend/src/api/client.js` notes nginx is expected; no nginx config is present in the repo

### API Endpoints consumed by the frontend

| Method | Path | Purpose |
|---|---|---|
| POST | `/api/documents/upload` | Upload file with optional auto-classify flag |
| GET | `/api/documents` | List documents (paginated, optional topic filter) |
| GET | `/api/documents/:id` | Get single document metadata |
| DELETE | `/api/documents/:id` | Delete document |
| POST | `/api/documents/:id/classify` | (Re)classify document, optional topic list |
| GET | `/api/topics` | List all topics |
| POST | `/api/topics` | Create topic |
| PATCH | `/api/topics/:id` | Update topic |
| DELETE | `/api/topics/:id` | Delete topic |
| POST | `/api/topics/suggest` | AI topic suggestions for a document |
| GET | `/api/settings` | Get settings (keys masked) |
| PATCH | `/api/settings` | Update settings |
| POST | `/api/settings/test-provider` | Health-check the active or named provider |
| GET | `/api/settings/default-prompt` | Retrieve the default classification system prompt |

---

## Docker Services

Defined in `docker-compose.yml`:

| Service | Image | Port | Notes |
|---|---|---|---|
| `backend` | Built from `./backend/Dockerfile` | `8000:8000` | Mounts `./backend/data:/app/data` for persistence; `./backend:/app` for hot-reload |
| `frontend` | Built from `./frontend/Dockerfile` | `5173:5173` | Mounts `./frontend/src` and `index.html` for hot-reload; depends on `backend` |

`extra_hosts: host.docker.internal:host-gateway` is set on the backend service to allow Ollama/LM Studio connections to the
host machine.

---

## Environment Variables

| Variable | Required | Where used | Notes |
|---|---|---|---|
| `DATA_DIR` | No | `backend/config.py` | Root path for uploads/metadata/settings; defaults to `/app/data` |
| `ANTHROPIC_API_KEY` | No | `.env.example` | Bootstrap only — app manages keys via settings UI |
| `OPENAI_API_KEY` | No | `.env.example` | Bootstrap only — app manages keys via settings UI |
| `PYTHONDONTWRITEBYTECODE` | No | `docker-compose.yml` | Set to `1` to suppress `.pyc` files in Docker |

---

## Authentication & Identity

- No user authentication. The application has no login system, sessions, or identity provider.
- API keys for AI providers are stored in plain text in `backend/data/settings.json` (masked only when returned via the settings API).

---

## Monitoring & Observability

- No error tracking service (no Sentry, Datadog, etc.).
- No structured logging framework — FastAPI default stdout logging only.
- A `/health` endpoint exists at `backend/main.py` returning `{"status": "ok"}`.
- Provider connectivity tested on demand via `POST /api/settings/test-provider`.

---

## Webhooks & Callbacks

- None — the application makes no outbound webhook calls and exposes no webhook receiver endpoints.

---

## Gaps / Unknowns

- No nginx or reverse-proxy config present for production deployments; the client-side comment references it but no config exists.
- No container registry or CI/CD pipeline configuration detected.
- API keys are stored in a plain JSON file on disk with no encryption at rest.
- The `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` env vars from `.env.example` are noted as bootstrap helpers but no code in the repo reads them directly — they appear to be manual seeding hints only.
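
To make the last gap concrete, here is one way env-var seeding could look if it were implemented. Nothing like this exists in the repo as far as this document can tell: `seed_api_keys` is a hypothetical helper, and the only details taken from the text are the two variable names and the `providers.<name>.api_key` settings layout.

```python
import os


def seed_api_keys(settings: dict, env=None) -> dict:
    """Hypothetical bootstrap step: fill empty api_key slots from env vars."""
    env = os.environ if env is None else env
    # Variable names come from `.env.example`; the mapping itself is assumed.
    mapping = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}
    providers = settings.setdefault("providers", {})
    for name, var in mapping.items():
        cfg = providers.setdefault(name, {})
        value = env.get(var, "")
        # Never overwrite a key that was already saved through the settings UI.
        if value and not cfg.get("api_key"):
            cfg["api_key"] = value
    return settings
```

Passing `env` explicitly keeps the helper testable without touching the real process environment. A call such as `seed_api_keys(settings)` at startup would only populate slots that are still empty, so keys managed through the settings UI would keep precedence.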