chore: initial commit — existing single-user document scanner codebase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:53:28 +02:00
parent 6fed5ba531
commit 7a34807fa0
71 changed files with 16408 additions and 0 deletions
@@ -0,0 +1,6 @@
+# Copy to .env and fill in as needed.
+# Settings are primarily managed through the in-app Settings UI.
+# These are NOT required — the app defaults to LM Studio with no API keys.
+
+ANTHROPIC_API_KEY=
+OPENAI_API_KEY=
@@ -0,0 +1,114 @@
+# ARCHITECTURE — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+Document Scanner is a two-tier web application: a Vue 3 SPA communicates with a FastAPI backend via a Vite dev-proxy (or directly in production). The backend handles document ingestion, text extraction, AI-based classification, and flat-file persistence. AI provider selection is fully runtime-configurable via a provider pattern abstraction.
+
+---
+
+## System Overview
+
+```
+Browser (Vue 3 SPA)
+      │  HTTP/JSON + multipart
+      ▼
+FastAPI  (port 8000)
+  ├── api/documents.py   – upload, list, get, delete, reclassify
+  ├── api/topics.py      – CRUD for topic list
+  ├── api/settings.py    – AI provider config + system prompt
+  │
+  ├── services/
+  │   ├── extractor.py   – text extraction dispatch
+  │   ├── classifier.py  – orchestrates AI call + topic creation
+  │   └── storage.py     – flat-file JSON + filesystem persistence
+  │
+  └── ai/                – provider abstraction layer
+      ├── base.py        – AIProvider ABC + ClassificationResult
+      ├── __init__.py    – get_provider() factory
+      ├── anthropic_provider.py
+      ├── openai_provider.py
+      ├── ollama_provider.py   (subclasses OpenAIProvider)
+      └── lmstudio_provider.py (subclasses OpenAIProvider)
+                │
+                ▼
+     External AI service (Anthropic API / OpenAI API /
+                          Ollama / LM Studio — host.docker.internal)
+```
+
+---
+
+## Request Flow — Document Upload + Classification
+
+1. Frontend POSTs `multipart/form-data` to `POST /api/documents/upload`
+2. `documents.py` saves the file to `data/uploads/`, calls `extractor.extract_text()`
+3. Extracted text (truncated to 50,000 chars) is stored in `data/metadata/<id>.json`
+4. If `auto_classify=true`, `classifier.classify_document()` is called:
+   a. Loads current settings from `data/settings.json` → calls `get_provider(settings)`
+   b. Passes document text + existing topics to `provider.classify()`
+   c. Any suggested new topics are created via `storage.add_topic()`
+   d. Document metadata is updated with assigned topics
+5. Full document metadata JSON is returned to the frontend
+
+---
+
+## AI Provider Abstraction
+
+- `AIProvider` (ABC in `ai/base.py`) defines three async methods:
+  - `classify(document_text, existing_topics, system_prompt) → ClassificationResult`
+  - `suggest_topics(document_text, system_prompt) → list[str]`
+  - `health_check() → bool`
+- `get_provider(settings: dict)` factory in `ai/__init__.py` reads `settings["active_provider"]` and instantiates the correct class
+- `OllamaProvider` and `LMStudioProvider` extend `OpenAIProvider` (both expose OpenAI-compatible endpoints)
+- Provider is re-instantiated on every request (stateless; no connection pooling)
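Note: the interface and factory described in the bullets above can be sketched roughly as follows. This is a hedged illustration, not the contents of `ai/base.py` or `ai/__init__.py`: the `ClassificationResult` field names, the `EchoProvider` class, and the `_REGISTRY` dict are hypothetical stand-ins, and only the three method signatures and the `settings["active_provider"]` lookup come from the notes.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ClassificationResult:
    # Field names are assumptions; the notes only name the class.
    topics: list[str] = field(default_factory=list)
    new_topics: list[str] = field(default_factory=list)


class AIProvider(ABC):
    """The three async methods listed in the architecture notes."""

    @abstractmethod
    async def classify(self, document_text: str, existing_topics: list[str],
                       system_prompt: str) -> ClassificationResult: ...

    @abstractmethod
    async def suggest_topics(self, document_text: str, system_prompt: str) -> list[str]: ...

    @abstractmethod
    async def health_check(self) -> bool: ...


class EchoProvider(AIProvider):
    """Hypothetical offline provider, used only to exercise the interface."""

    def __init__(self, config: dict):
        self.config = config

    async def classify(self, document_text, existing_topics, system_prompt):
        # Assign every existing topic whose name appears in the text.
        lowered = document_text.lower()
        return ClassificationResult(
            topics=[t for t in existing_topics if t.lower() in lowered]
        )

    async def suggest_topics(self, document_text, system_prompt):
        return []

    async def health_check(self):
        return True


# Hypothetical registry; the real factory may branch with if/elif instead.
_REGISTRY = {"echo": EchoProvider}


def get_provider(settings: dict) -> AIProvider:
    """Instantiate the provider named by settings["active_provider"]."""
    name = settings["active_provider"]
    try:
        provider_cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown provider: {name!r}") from None
    # A fresh instance per call mirrors the stateless, per-request design.
    return provider_cls(settings.get("providers", {}).get(name, {}))
```

Under these assumptions, adding a backend means writing one subclass and one registry entry, and callers such as `classifier.py` only ever see the `AIProvider` interface.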
+
+---
+
+## Data Persistence
+
+All state is stored on the local filesystem — no database:
+
+| Store | Path | Format | Access |
+|---|---|---|---|
+| Uploaded files | `data/uploads/<id>.<ext>` | Original binary | Direct filesystem |
+| Document metadata | `data/metadata/<id>.json` | JSON per document | `filelock` protected |
+| Topic list | `data/topics.json` | `{"topics": [...]}` | `filelock` protected |
+| Settings | `data/settings.json` | JSON object | `filelock` protected |
+
+`filelock` is used to prevent concurrent write corruption on JSON files.
+
+---
+
+## Frontend Architecture
+
+- Vue 3 SPA (Options API), Pinia stores, Vue Router 4
+- Three Pinia stores (`documents`, `topics`, `settings`) act as the sole data access layer — components never call the API directly
+- `src/api/client.js` is the single HTTP adapter (wraps `fetch`)
+- Vite proxies `/api/*` to `http://localhost:8000` in dev mode
+
+---
+
+## Key Patterns
+
+- **Provider Pattern** — AI backends are interchangeable at runtime via settings
+- **Service Layer** — `extractor`, `classifier`, `storage` are pure Python modules; no FastAPI coupling
+- **Pinia-as-Facade** — stores encapsulate all async API calls; views stay declarative
+
+---
+
+## Constraints & Notable Decisions
+
+- All CORS origins allowed (`allow_origins=["*"]`) — suitable for local dev, not production
+- No authentication or user model
+- Single-worker assumption for file locking (does not scale to multiple uvicorn workers)
+- AI provider re-instantiated per request (no connection reuse)
+- Data directory is volume-mounted in Docker; no backup or migration strategy
+
+---
+
+## Gaps / Unknowns
+
+- No API versioning strategy visible
+- Frontend has no error boundary or global error handling component
+- The document list has no pagination at the storage layer: every list request reads all metadata files, which could become a scaling concern
@@ -0,0 +1,87 @@
+# CONCERNS — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+The codebase is a well-structured local-first prototype. The main concerns are security issues that matter if exposed beyond localhost (open CORS, no file validation, plain-text key storage), several blocking I/O calls in async handlers, and a handful of code duplication issues in the AI provider layer. Overall health is good for a local dev tool; requires hardening before any networked deployment.
+
+---
+
+## Concerns by Severity
+
+### HIGH
+
+**1. File type validation is defined but never enforced**
+`ALLOWED_MIME_TYPES` is defined in `backend/api/documents.py` but the upload handler never checks it — any file type is accepted. An attacker could upload executable files or crafted archives.
+
+**2. No file size limit on uploads**
+The entire uploaded file is read before any cap is applied. A large file could exhaust memory or disk. No `MAX_UPLOAD_SIZE` check exists at the HTTP boundary.
+
+**3. API keys stored in plain-text JSON**
+`backend/data/settings.json` stores API keys in plaintext. The volume mount in `docker-compose.yml` (`./backend/data:/app/data`) means any process with Docker access can read them. Masking only applies to API responses, not to disk.
+
+**4. CORS fully open**
+`allow_origins=["*"]` in `main.py` means any website can make cross-origin requests to the API, including with credentials if ever added.
+
+**5. Docker Compose mounts entire backend source as writable volume**
+`./backend:/app` gives the container write access to the host source tree. A path traversal or code execution bug in the app could overwrite source files.
+
+---
+
+### MEDIUM
+
+**6. Blocking I/O in async FastAPI handlers**
+`storage.py` uses synchronous file reads/writes and `filelock` blocking calls inside `async def` endpoints. This blocks the uvicorn event loop during every request. Should use `asyncio.to_thread()` or `aiofiles` (which is already in requirements but unused).
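Note: a minimal sketch of the `asyncio.to_thread()` route mentioned in item 6, using only the standard library. `read_json_blocking`, `read_json`, and `demo` are hypothetical names, not functions from `storage.py`. In the real module the `filelock` acquisition would presumably sit inside the blocking function so that it also runs off the event loop.

```python
import asyncio
import json
import tempfile
from pathlib import Path


def read_json_blocking(path: Path) -> dict:
    """Synchronous read, standing in for a storage helper."""
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


async def read_json(path: Path) -> dict:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(read_json_blocking, path)


async def demo() -> dict:
    # Write a throwaway settings file, then read it back without blocking.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "settings.json"
        path.write_text(json.dumps({"active_provider": "lmstudio"}), encoding="utf-8")
        return await read_json(path)
```

An `async def` endpoint would then `await read_json(...)` instead of calling the synchronous helper directly.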
+
+**7. Topic rename does not cascade to documents**
+Deleting a topic removes it from document metadata, but renaming is not implemented — there is no rename endpoint. Users have no way to rename a topic without losing document associations.
+
+**8. `list_metadata` loads all documents before filtering**
+`storage.list_metadata()` reads all metadata JSON files on every list request. No pagination at the storage layer — O(N) disk reads per page request as the document count grows.
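Note: one possible shape for pagination at the storage layer, sketched under assumptions. `list_metadata_page`, its `offset`/`limit` parameters, and ordering by file name are all hypothetical. A topic filter would still need either a full scan or a separate index, which this sketch does not cover.

```python
import json
from itertools import islice
from pathlib import Path


def list_metadata_page(metadata_dir: Path, offset: int = 0, limit: int = 20) -> list[dict]:
    """Read only the JSON files that fall inside the requested page."""
    # Sorting the names needs a directory scan but no file reads.
    paths = sorted(metadata_dir.glob("*.json"))
    page = islice(paths, offset, offset + limit)
    return [json.loads(p.read_text(encoding="utf-8")) for p in page]
```

With this layout the number of file reads per request is bounded by `limit` rather than by the total document count.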
+
+**9. `topic_doc_counts()` scans all metadata on every topic request**
+Every `GET /api/topics` call triggers a full scan of all metadata files to count documents per topic. Not cached; will degrade linearly.
+
+**10. `MAX_AI_CHARS` duplicated across 3 files**
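+The character truncation limit for AI input is duplicated as a magic constant in multiple provider files. It is separate from `MAX_STORED_CHARS` (50,000), the cap that `extractor.py` applies to stored text; with `MAX_AI_CHARS = 8_000`, as documented for the Anthropic provider, the provider-level truncation still takes effect for any stored text longer than 8,000 characters.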
+
+**11. `_parse_classification` / `_parse_suggestions` duplicated between providers**
+`anthropic_provider.py` and `openai_provider.py` each define their own JSON parsing helpers for AI responses. `test_classifier.py` only imports from `openai_provider`, meaning the Anthropic variants are untested.
+
+**12. `health_check()` makes real billed API calls**
+The "Test Connection" UI action calls `provider.health_check()`, which makes a real API call to Anthropic/OpenAI — incurring cost and latency every time the user tests connectivity. Should use a cheaper probe (e.g., list models endpoint or a cached status).
+
+---
+
+### LOW
+
+**13. `uvicorn --reload` hardcoded in docker-compose.yml**
+Hot-reload is hardcoded in the only compose file, which therefore doubles as the production configuration. There is no separate `docker-compose.prod.yml` or build arg to disable it.
+
+**14. Unused `shutil` import in `storage.py`**
+`import shutil` appears in `storage.py` but is never used.
+
+**15. Topic IDs are 8-character UUID prefixes**
+`str(uuid.uuid4())[:8]` generates IDs with ~4 billion combinations — low collision risk for personal use but not safe at scale or for security-sensitive identifiers.
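Note: the "~4 billion" figure in item 15 is 16^8 = 2^32 possible IDs. A quick way to gauge collision risk is the standard birthday-bound approximation, p ≈ 1 − exp(−n(n−1)/(2N)). The helper names below are illustrative only.

```python
import math

ID_SPACE = 16 ** 8  # 8 hex characters = 2**32 possible IDs


def collision_probability(n_ids: int, space: int = ID_SPACE) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2 * space))."""
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2 * space))


def ids_for_probability(p: float, space: int = ID_SPACE) -> int:
    """Approximate number of IDs at which the collision chance reaches p."""
    return math.ceil(math.sqrt(2 * space * math.log(1 / (1 - p))))
```

Under this approximation, a thousand topics give a collision chance of roughly 0.01%, while a 50% chance is reached at about 77,000 IDs, which is consistent with the "fine for personal use, not at scale" assessment.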
+
+**16. `classify_document` request body uses raw `dict`, not a Pydantic model**
+The reclassify endpoint accepts an unvalidated `dict` body. Invalid input causes an unformatted 500 rather than a clean 422 validation error.
+
+**17. No global frontend error handling**
+There is no Vue error boundary or global `window.onerror` / `app.config.errorHandler`. Failed API calls in stores may surface as silent failures or unhandled promise rejections.
+
+**18. No document download endpoint**
+Uploaded files are stored in `data/uploads/` but there is no `GET /api/documents/:id/file` endpoint to retrieve the original binary. Files are effectively write-only through the UI.
+
+**19. `aiofiles` in requirements but never used**
+`aiofiles>=23.2` is listed in `requirements.txt` but no code imports it. Fixing the blocking I/O concern (item 6) would be a natural use for it; otherwise the dependency can be dropped.
+
+---
+
+## Gaps / Unknowns
+
+- Production deployment path is undefined (no nginx, no TLS, no auth)
+- OCR language support for pytesseract is not configured (defaults to English only)
+- `suggest_topics` method on all providers is untested — unclear if it is used in the current UI flow
+- No backup or recovery strategy for `data/` volume
@@ -0,0 +1,94 @@
+# CONVENTIONS — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+The codebase follows standard Python and Vue 3 conventions without heavy tooling enforcement. Backend uses async/await throughout with type hints on public interfaces. Frontend uses Vue Options API with Pinia stores as the data layer. No linter or formatter configuration is committed.
+
+---
+
+## Python Conventions (Backend)
+
+### Naming
+- Files: `snake_case.py`
+- Classes: `PascalCase` (e.g., `AnthropicProvider`, `ClassificationResult`)
+- Functions/variables: `snake_case`
+- Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_STORED_CHARS`, `DATA_DIR`)
+- Private helpers: leading underscore (e.g., `_extract_pdf`, `_parse_classification`)
+
+### Async
+- All API endpoint functions are `async def`
+- All `AIProvider` methods are `async def`
+- `pytest-asyncio` with `asyncio_mode=auto` (set in `pytest.ini`)
+
+### Type Hints
+- Used on public function signatures in `ai/` layer and `services/`
+- Dataclass used for `ClassificationResult` (`@dataclass` with `field(default_factory=...)`)
+- Not used consistently in `api/` routers (rely on FastAPI/Pydantic implicit validation)
+
+### Error Handling
+- `extractor.py` wraps all extraction in `try/except Exception` and returns error strings (never raises)
+- AI providers raise on hard failures; caller (`classifier.py`) is responsible for propagating
+- No global exception handler registered in `main.py`
+
+### Imports
+- Standard library first, then third-party, then local — not enforced by isort
+- Heavy library imports (`fitz`, `pytesseract`, `docx`) are deferred inside functions to avoid import-time cost when unused
+
+### Module Docstrings
+- Present on `extractor.py` and `test_classifier.py`; absent elsewhere
+
+---
+
+## JavaScript / Vue Conventions (Frontend)
+
+### Naming
+- Vue files: `PascalCase.vue` (e.g., `DocumentCard.vue`, `AppSidebar.vue`)
+- Pinia stores: `camelCase` filename matching store ID (e.g., `documents.js` → `useDocumentsStore`)
+- Views: `<Name>View.vue` suffix
+- Components grouped by domain in subdirectories: `documents/`, `topics/`, `upload/`, `layout/`
+
+### Vue Style
+- Options API used throughout (not Composition API)
+- Props defined with type and default; no `defineProps` (Options API syntax)
+- `v-model`, `v-for`, `v-if` used directly in templates
+
+### Pinia Pattern
+- Each store encapsulates `state`, `getters`, and `actions`
+- Actions call `src/api/client.js` — components never import `client.js` directly
+- Stores are the single source of truth; views read from store state
+
+### API Client
+- `src/api/client.js` is the sole HTTP adapter
+- All paths are prefixed `/api/` (proxied to backend in dev via Vite config)
+
+### Styling
+- Tailwind CSS utility classes used directly in templates
+- No scoped `<style>` blocks observed in component list
+- Global styles in `src/style.css`
+
+---
+
+## API Design Conventions (Backend)
+
+- All endpoints prefixed `/api/` (set per router)
+- JSON responses; multipart for file upload
+- HTTP verbs follow REST: GET list, GET by ID, POST create, PUT/PATCH update, DELETE remove
+- No versioning (`/api/v1/`) — flat namespace
+
+---
+
+## Configuration
+
+- Runtime paths controlled entirely by `DATA_DIR` env var (defaults to `/app/data`)
+- AI settings persisted in `data/settings.json` — no env var overrides at runtime for provider config (except `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` noted in `.env.example`)
+- No `.env` loading in backend code — env vars passed via Docker Compose `environment:` block
+
+---
+
+## Gaps / Unknowns
+
+- No ESLint, Prettier, Black, or Ruff configuration committed
+- No pre-commit hooks
+- No consistent JSDoc or Python docstring coverage
@@ -0,0 +1,144 @@
+# INTEGRATIONS — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+The backend integrates with four interchangeable AI providers for document classification: Anthropic Claude, OpenAI (and any OpenAI-compatible endpoint), Ollama, and LM Studio. There are no external databases, auth services, or cloud storage integrations — all persistence is local filesystem. The active provider is selected at runtime via settings persisted in `backend/data/settings.json`.
+
+---
+
+## AI Providers
+
+All providers implement the `AIProvider` abstract interface defined in `backend/ai/base.py`. The active provider is resolved at request time in `backend/ai/__init__.py:get_provider()`.
+
+### Anthropic
+
+- **SDK:** `anthropic>=0.26` — `backend/ai/anthropic_provider.py`
+- **Client:** `anthropic.AsyncAnthropic`
+- **API:** Messages API (`client.messages.create`)
+- **Default model:** `claude-sonnet-4-6`
+- **Auth:** `api_key` stored in `backend/data/settings.json` under `providers.anthropic.api_key`; optionally seeded from env var `ANTHROPIC_API_KEY` (`.env.example`)
+- **Calls made:** `classify` (max_tokens=1024), `suggest_topics` (max_tokens=256), `health_check` (max_tokens=5)
+- **Text limit:** 8,000 characters per request (`MAX_AI_CHARS = 8_000`)
+
+### OpenAI
+
+- **SDK:** `openai>=1.30` — `backend/ai/openai_provider.py`
+- **Client:** `openai.AsyncOpenAI`
+- **API:** Chat Completions (`client.chat.completions.create`)
+- **Default model:** `gpt-4o`
+- **Auth:** `api_key` stored in `backend/data/settings.json` under `providers.openai.api_key`; optionally seeded from env var `OPENAI_API_KEY` (`.env.example`)
+- **Custom base URL:** Supported via `providers.openai.base_url` in settings (allows pointing at any OpenAI-compatible endpoint)
+
+### Ollama
+
+- **Provider file:** `backend/ai/ollama_provider.py`
+- **Implementation:** Subclass of `OpenAIProvider` — uses the OpenAI SDK with a custom `base_url`
+- **Default base URL:** `http://host.docker.internal:11434/v1`
+- **Default model:** `llama3.2`
+- **Auth:** Stub key `"ollama"` (no real auth required)
+- **Network path:** Reaches the host machine's Ollama daemon via Docker's `host.docker.internal` DNS alias (configured in `docker-compose.yml` via `extra_hosts`)
+
+### LM Studio
+
+- **Provider file:** `backend/ai/lmstudio_provider.py`
+- **Implementation:** Subclass of `OpenAIProvider` — uses the OpenAI SDK with a custom `base_url`
+- **Default base URL:** `http://host.docker.internal:1234/v1`
+- **Default model:** `gemma-4-e4b-it`
+- **Auth:** Stub key `"lm-studio"` (no real auth required)
+- **Network path:** Reaches the host machine's LM Studio server via `host.docker.internal` (same `extra_hosts` setting)
+- **Default active provider** — the app works out of the box with LM Studio and no API keys
+
+---
+
+## Provider Selection & Settings Persistence
+
+- Active provider and all per-provider config (model names, API keys, base URLs) are persisted in `backend/data/settings.json`.
+- Settings are loaded fresh on each classification request in `backend/services/classifier.py:classify_document()`.
+- API keys returned from the settings API are masked (last 4 chars shown) via `backend/services/storage.py:mask_api_key()`.
+- The Settings UI allows switching providers without restart.
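Note: the masking behaviour described above (only the last 4 characters shown) could look roughly like this. The signature, the `visible` parameter, and the handling of empty or very short keys are assumptions. The real `mask_api_key()` in `backend/services/storage.py` may differ.

```python
def mask_api_key(key: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a key."""
    if not key:
        return ""
    if len(key) <= visible:
        # Too short to reveal anything safely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Masking of this kind only affects what the settings API returns; the value written to `settings.json` is unchanged.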
+
+---
+
+## Frontend ↔ Backend Communication
+
+- **Protocol:** HTTP REST over JSON (and multipart form for uploads)
+- **Client:** Native browser `fetch` API — `frontend/src/api/client.js`
+- **Base path:** All requests go to `/api/*` — no hardcoded backend hostname in the frontend
+- **Proxy (dev):** Vite dev server proxies `/api` → `http://backend:8000` — `frontend/vite.config.js`
+- **Proxy (prod):** Comment in `frontend/src/api/client.js` notes nginx is expected; no nginx config is present in the repo
+
+### API Endpoints consumed by the frontend
+
+| Method | Path | Purpose |
+|---|---|---|
+| POST | `/api/documents/upload` | Upload file with optional auto-classify flag |
+| GET | `/api/documents` | List documents (paginated, optional topic filter) |
+| GET | `/api/documents/:id` | Get single document metadata |
+| DELETE | `/api/documents/:id` | Delete document |
+| POST | `/api/documents/:id/classify` | (Re)classify document, optional topic list |
+| GET | `/api/topics` | List all topics |
+| POST | `/api/topics` | Create topic |
+| PATCH | `/api/topics/:id` | Update topic |
+| DELETE | `/api/topics/:id` | Delete topic |
+| POST | `/api/topics/suggest` | AI topic suggestions for a document |
+| GET | `/api/settings` | Get settings (keys masked) |
+| PATCH | `/api/settings` | Update settings |
+| POST | `/api/settings/test-provider` | Health-check the active or named provider |
+| GET | `/api/settings/default-prompt` | Retrieve the default classification system prompt |
+
+---
+
+## Docker Services
+
+Defined in `docker-compose.yml`:
+
+| Service | Image | Port | Notes |
+|---|---|---|---|
+| `backend` | Built from `./backend/Dockerfile` | `8000:8000` | Mounts `./backend/data:/app/data` for persistence; `./backend:/app` for hot-reload |
+| `frontend` | Built from `./frontend/Dockerfile` | `5173:5173` | Mounts `./frontend/src` and `index.html` for hot-reload; depends on `backend` |
+
+The `backend` service sets `extra_hosts: host.docker.internal:host-gateway` so that it can reach Ollama or LM Studio running on the host machine.
+
+---
+
+## Environment Variables
+
+| Variable | Required | Where used | Notes |
+|---|---|---|---|
+| `DATA_DIR` | No | `backend/config.py` | Root path for uploads/metadata/settings; defaults to `/app/data` |
+| `ANTHROPIC_API_KEY` | No | `.env.example` | Bootstrap only — app manages keys via settings UI |
+| `OPENAI_API_KEY` | No | `.env.example` | Bootstrap only — app manages keys via settings UI |
+| `PYTHONDONTWRITEBYTECODE` | No | `docker-compose.yml` | Set to `1` to suppress `.pyc` files in Docker |
+
+---
+
+## Authentication & Identity
+
+- No user authentication. The application has no login system, sessions, or identity provider.
+- API keys for AI providers are stored in plain text in `backend/data/settings.json` (masked only when returned via the settings API).
+
+---
+
+## Monitoring & Observability
+
+- No error tracking service (no Sentry, Datadog, etc.).
+- No structured logging framework — FastAPI default stdout logging only.
+- A `/health` endpoint is defined in `backend/main.py` and returns `{"status": "ok"}`.
+- Provider connectivity tested on demand via `POST /api/settings/test-provider`.
+
+---
+
+## Webhooks & Callbacks
+
+- None — the application makes no outbound webhook calls and exposes no webhook receiver endpoints.
+
+---
+
+## Gaps / Unknowns
+
+- No nginx or reverse-proxy config present for production deployments; the client-side comment references it but no config exists.
+- No container registry or CI/CD pipeline configuration detected.
+- API keys are stored in a plain JSON file on disk with no encryption at rest.
+- The `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` env vars from `.env.example` are noted as bootstrap helpers but no code in the repo reads them directly — they appear to be manual seeding hints only.
@@ -0,0 +1,129 @@
+# STACK — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+Document Scanner is a full-stack application with a Python/FastAPI backend and a Vue 3 frontend, containerised with Docker Compose. The backend handles document ingestion, text extraction, and AI-powered topic classification; the frontend is a single-page app served by Vite. No external database is used — all state is persisted to the local filesystem.
+
+---
+
+## Languages
+
+| Language | Version | Where used |
+|---|---|---|
+| Python | 3.12 (pinned in `backend/Dockerfile`) | Backend API, AI providers, services |
+| JavaScript (ES modules) | ES2022+ (`"type": "module"` in `frontend/package.json`) | Frontend SPA |
+
+---
+
+## Runtime
+
+**Backend:**
+- CPython 3.12 (Docker image: `python:3.12-slim`)
+- ASGI server: Uvicorn `>=0.29` with standard extras (websockets, httptools)
+- Entry point: `backend/main.py` — `uvicorn main:app`
+
+**Frontend:**
+- Node.js 20 (Docker image: `node:20-alpine`)
+- Dev server: Vite 5 on port 5173
+- Entry point: `frontend/index.html` → `frontend/src/main.js`
+
+**Package Manager:**
+- Backend: `pip` — lockfile: none (ranges only in `backend/requirements.txt`)
+- Frontend: `npm` — lockfile: `frontend/package-lock.json` (present but not committed, generated on `npm install`)
+
+---
+
+## Frameworks
+
+### Backend
+
+| Package | Version | Purpose |
+|---|---|---|
+| `fastapi` | `>=0.111` | REST API framework — `backend/main.py` |
+| `uvicorn[standard]` | `>=0.29` | ASGI server |
+| `pydantic-settings` | `>=2.2` | Settings/config validation |
+| `python-multipart` | latest | Multipart file upload parsing |
+
+### Frontend
+
+| Package | Version | Purpose |
+|---|---|---|
+| `vue` | `^3.4.0` | UI framework — `frontend/src/App.vue` and all components |
+| `vue-router` | `^4.3.0` | Client-side routing — `frontend/src/router/index.js` |
+| `pinia` | `^2.1.0` | State management — `frontend/src/stores/` |
+
+### Build / Dev Tooling
+
+| Tool | Version | Purpose |
+|---|---|---|
+| `vite` | `^5.2.0` | Frontend bundler and dev server — `frontend/vite.config.js` |
+| `@vitejs/plugin-vue` | `^5.0.0` | Vue SFC support in Vite |
+| `tailwindcss` | `^3.4.0` | Utility-first CSS — `frontend/tailwind.config.js` |
+| `postcss` | `^8.4.0` | CSS processing — `frontend/postcss.config.js` |
+| `autoprefixer` | `^10.4.0` | CSS vendor prefixing |
+
+---
+
+## Key Backend Dependencies
+
+| Package | Version | Purpose |
+|---|---|---|
+| `anthropic` | `>=0.26` | Anthropic Claude API client — `backend/ai/anthropic_provider.py` |
+| `openai` | `>=1.30` | OpenAI / OpenAI-compatible API client — `backend/ai/openai_provider.py`, also used for Ollama and LM Studio via `base_url` override |
+| `PyMuPDF` (`fitz`) | `>=1.24` | PDF text extraction — `backend/services/extractor.py` |
+| `python-docx` | `>=1.1` | DOCX text extraction — `backend/services/extractor.py` |
+| `pytesseract` | `>=0.3` | OCR for image files — `backend/services/extractor.py` |
+| `Pillow` | `>=10.3` | Image handling for OCR — `backend/services/extractor.py` |
+| `filelock` | `>=3.14` | File-based concurrency locks — `backend/services/storage.py` |
+| `aiofiles` | `>=23.2` | Async file I/O support |
+| `httpx` | `>=0.27` | Async HTTP client (used internally by `anthropic` and `openai` SDKs) |
+
+---
+
+## Testing
+
+| Tool | Version | Purpose |
+|---|---|---|
+| `pytest` | `>=8.2` | Test runner — `backend/pytest.ini`, `backend/tests/` |
+| `pytest-asyncio` | `>=0.23` | Async test support; `asyncio_mode = auto` set in `backend/pytest.ini` |
+
+No frontend test framework is present.
+
+---
+
+## Storage
+
+- **File system only** — no database engine.
+- Upload files stored at `backend/data/uploads/` (UUID-named).
+- Document metadata stored as per-document JSON files at `backend/data/metadata/`.
+- Topics registry: `backend/data/topics.json`.
+- App settings: `backend/data/settings.json`.
+- File-level concurrency managed via `filelock` (`backend/services/storage.py`).
+
+---
+
+## System Dependencies (backend Docker image)
+
+Installed via `apt-get` in `backend/Dockerfile`:
+- `tesseract-ocr` — OCR binary for `pytesseract`
+- `libgl1`, `libglib2.0-0` — shared libraries required by PyMuPDF
+
+---
+
+## Configuration
+
+- Environment variable `DATA_DIR` sets the root data path (default: `/app/data`).
+- AI provider settings (models, API keys, base URLs) are stored in `backend/data/settings.json` and managed through the in-app Settings UI.
+- Optional bootstrap via `.env` (see `.env.example`): only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are referenced.
+- Default active provider is `lmstudio` (no API key required).
+
+---
+
+## Gaps / Unknowns
+
+- No Python version pinning file (`.python-version`, `pyproject.toml`) outside the Dockerfile — local dev outside Docker may use a different Python version.
+- No frontend lockfile committed; exact transitive dependency versions are non-deterministic until `npm install` is run.
+- No linter or formatter config detected (no `.eslintrc`, `.prettierrc`, `biome.json`, `ruff.toml`, `mypy.ini`, etc.).
+- No production deployment config beyond Docker Compose (no nginx config, no cloud provider manifests).
@@ -0,0 +1,144 @@
+# STRUCTURE — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+The project is a monorepo with two top-level service directories (`backend/`, `frontend/`) and Docker Compose at the root. Backend is a Python/FastAPI app; frontend is a Vue 3 SPA built with Vite. All persistent data lives under `backend/data/`.
+
+---
+
+## Top-Level Layout
+
+```
+document_scanner/
+├── backend/              Python FastAPI service
+├── frontend/             Vue 3 SPA
+├── docker-compose.yml    Two-service compose (backend + frontend)
+├── .env.example          Optional env vars (API keys)
+└── .claude/              Claude Code settings
+```
+
+---
+
+## Backend
+
+```
+backend/
+├── main.py               FastAPI app: CORS, lifespan, router registration
+├── config.py             Path constants, DEFAULT_SETTINGS, ensure_data_dirs()
+├── requirements.txt      Python dependencies
+├── pytest.ini            pytest config (asyncio_mode=auto)
+├── Dockerfile
+│
+├── api/                  FastAPI routers (thin HTTP layer)
+│   ├── documents.py      Upload, list, get, delete, reclassify endpoints
+│   ├── topics.py         Topic CRUD endpoints
+│   └── settings.py       AI provider settings endpoints
+│
+├── ai/                   AI provider abstraction
+│   ├── base.py           AIProvider ABC + ClassificationResult dataclass
+│   ├── __init__.py       get_provider() factory
+│   ├── anthropic_provider.py
+│   ├── openai_provider.py
+│   ├── ollama_provider.py      extends OpenAIProvider
+│   └── lmstudio_provider.py    extends OpenAIProvider
+│
+├── services/             Business logic (no FastAPI dependency)
+│   ├── extractor.py      Text extraction: PDF/DOCX/image/text dispatch
+│   ├── classifier.py     Orchestrates AI call + topic auto-creation
+│   └── storage.py        Flat-file JSON CRUD + filelock
+│
+├── data/                 Runtime data (volume-mounted in Docker)
+│   ├── uploads/          Uploaded document files
+│   ├── metadata/         Per-document JSON metadata files
+│   ├── topics.json       Global topic list
+│   └── settings.json     Active AI provider + system prompt config
+│
+└── tests/
+    ├── conftest.py       Fixtures: isolated tmp data dir, TestClient, sample files
+    ├── test_health.py
+    ├── test_documents.py
+    ├── test_topics.py
+    ├── test_settings.py
+    ├── test_extractor.py
+    ├── test_classifier.py
+    └── test_lmstudio.py
+```
+
+---
+
+## Frontend
+
+```
+frontend/
+├── index.html            Vite entry HTML
+├── vite.config.js        Vite config (Vue plugin, /api proxy)
+├── tailwind.config.js
+├── postcss.config.js
+├── package.json          Vue 3, Vue Router 4, Pinia; no test framework
+├── Dockerfile
+│
+└── src/
+    ├── main.js           App bootstrap: Vue + Pinia + Router
+    ├── App.vue           Root component (sidebar layout wrapper)
+    ├── style.css         Global Tailwind imports
+    │
+    ├── api/
+    │   └── client.js     fetch wrapper; all API calls go through here
+    │
+    ├── stores/           Pinia stores (data + actions layer)
+    │   ├── documents.js  Document list, upload, classify state
+    │   ├── topics.js     Topic list CRUD state
+    │   └── settings.js   AI provider settings state
+    │
+    ├── router/
+    │   └── index.js      Routes: /, /topics, /topics/:name, /document/:id, /settings
+    │
+    ├── views/            Page-level components (one per route)
+    │   ├── HomeView.vue
+    │   ├── TopicsView.vue
+    │   ├── DocumentView.vue
+    │   └── SettingsView.vue
+    │
+    └── components/       Reusable UI components
+        ├── layout/
+        │   └── AppSidebar.vue
+        ├── documents/
+        │   └── DocumentCard.vue
+        ├── topics/
+        │   ├── TopicBadge.vue
+        │   └── TopicManager.vue
+        └── upload/
+            ├── DropZone.vue
+            └── UploadProgress.vue
+```
+
+---
+
+## Key Entry Points
+
+| File | Purpose |
+|---|---|
+| `backend/main.py` | FastAPI app instantiation, middleware, router registration |
+| `backend/config.py` | All path constants and default settings — change storage paths here |
+| `backend/ai/__init__.py` | Add a new AI provider here |
+| `frontend/src/main.js` | Vue app bootstrap |
+| `frontend/src/api/client.js` | All HTTP calls originate here |
+
+---
+
+## Where to Add New Code
+
+- **New API endpoint**: add router in `backend/api/`, register in `backend/main.py`
+- **New AI provider**: implement `AIProvider` ABC in `backend/ai/`, add case in `get_provider()`
+- **New document type**: add extraction branch in `backend/services/extractor.py`
+- **New frontend page**: add view in `src/views/`, add route in `src/router/index.js`
+- **New shared UI component**: add to relevant `src/components/<category>/` subdirectory
+
+---
+
+## Gaps / Unknowns
+
+- No `src/components/settings/` subdirectory — settings UI is entirely in `SettingsView.vue`
+- No migration or schema versioning for `topics.json` / `settings.json` flat files
@@ -0,0 +1,87 @@
+# TESTING — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
+The backend has solid integration test coverage across most API surfaces and services using pytest + FastAPI TestClient; the reclassify endpoint and the provider implementations are the main exceptions. Each test runs in a fully isolated temporary data directory, so there is no shared state between tests. The frontend has no test framework configured at all.
+
+---
+
+## Backend Testing
+
+### Framework
+- **pytest** + **pytest-asyncio** (`asyncio_mode = auto` in `pytest.ini`)
+- **FastAPI TestClient** (synchronous test client provided by Starlette, built on `httpx`)
+- No dedicated mocking library: tests either exercise the real parsing logic directly or replace the AI layer with a stand-in provider
+
+### Test Isolation Strategy (conftest.py)
+- `isolated_data_dir` fixture is `autouse=True` — every test automatically gets:
+  - A fresh `tmp_path/data/` directory with `uploads/` and `metadata/` subdirectories
+  - Clean `topics.json` and `settings.json` initialized from `DEFAULT_SETTINGS`
+  - Monkeypatched `DATA_DIR` env var and all module-level path constants in `config` and `services.storage`
+  - New `FileLock` instances pointing to the tmp dir
+- `client` fixture wraps FastAPI `TestClient` with the isolated data dir active
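The isolation idea described above can be sketched with the standard library alone. This is an assumption-laden illustration rather than the project's `conftest.py`: it uses `tempfile` and `unittest.mock` in place of pytest's `tmp_path` and `monkeypatch`, and the `config` object is a stand-in namespace with hypothetical values, not the real module.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path
from types import SimpleNamespace
from unittest import mock

# Stand-in for the real `config` module (values are illustrative only).
config = SimpleNamespace(
    DATA_DIR=Path("/app/data"),
    TOPICS_FILE=Path("/app/data/topics.json"),
)


@contextmanager
def isolated_data_dir():
    """Point the path constants at a throwaway directory for the duration of one test."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp) / "data"
        (data_dir / "uploads").mkdir(parents=True)
        (data_dir / "metadata").mkdir()
        topics_file = data_dir / "topics.json"
        topics_file.write_text(json.dumps({"topics": []}))
        # Patch the module-level constants; both are restored on exit.
        with mock.patch.object(config, "DATA_DIR", data_dir):
            with mock.patch.object(config, "TOPICS_FILE", topics_file):
                yield data_dir


with isolated_data_dir() as data_dir:
    inside = config.TOPICS_FILE
    topics = json.loads(config.TOPICS_FILE.read_text())

restored = config.TOPICS_FILE
```

Patching the constants where they are looked up matters: a module that copied a path at import time keeps the old value unless its own attribute is patched too, which is presumably why the fixture patches both `config` and `services.storage`.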
+
+### Test Files
+
+| File | What it covers |
+|---|---|
+| `test_health.py` | `GET /health` returns `{"status": "ok"}` |
+| `test_documents.py` | Upload TXT/PDF (no-classify), list, get, delete; extracts text correctly |
+| `test_topics.py` | Create, list, delete topics via API |
+| `test_settings.py` | Read default settings, update provider config |
+| `test_extractor.py` | Unit tests for `extract_text()` on TXT, PDF, DOCX, image paths |
+| `test_classifier.py` | Unit tests for JSON parsing helpers (`_parse_classification`, `_parse_suggestions`, `_strip_code_fences`) — no real AI calls |
+| `test_lmstudio.py` | LMStudio provider-specific behaviour (likely mocked or uses a local endpoint) |
+
+### Fixtures Available
+
+| Fixture | Provides |
+|---|---|
+| `isolated_data_dir` | Autouse — clean tmp data dir |
+| `client` | FastAPI TestClient with isolated data |
+| `sample_txt` | A `.txt` file with test content |
+| `sample_pdf` | A minimal valid PDF created with PyMuPDF |
+
+### What Is NOT Tested
+
+- Auto-classification flow end-to-end (requires a live AI provider)
+- Document reclassify endpoint
+- Anthropic, OpenAI, Ollama provider implementations directly
+- Any concurrent write / filelock contention scenarios
+- File size / type validation edge cases
+- Frontend — no tests exist
+
+---
+
+## Frontend Testing
+
+- **No test framework installed** — `package.json` has no `vitest`, `jest`, or `@testing-library/vue`
+- No test files found under `frontend/src/`
+- No Cypress or Playwright configuration
+
+---
+
+## Running Tests
+
+```bash
+# From backend/
+pytest
+
+# With verbose output
+pytest -v
+
+# Single file
+pytest tests/test_documents.py
+```
+
+---
+
+## Gaps / Unknowns
+
+- No test coverage measurement (no `pytest-cov` in `requirements.txt`)
+- `test_lmstudio.py` content not inspected — unclear if it hits a real local endpoint
+- No CI configuration (no GitHub Actions, no Dockerfile for test runner)
+- No snapshot or contract tests for API response shapes
+- Frontend is completely untested
@@ -0,0 +1,17 @@
+FROM python:3.12-slim
+
+WORKDIR /app
+
+# System deps for PyMuPDF + OCR
+RUN apt-get update && apt-get install -y \
+    tesseract-ocr \
+    libgl1 \
+    libglib2.0-0 \
+    && rm -rf /var/lib/apt/lists/*
+
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY . .
+
+EXPOSE 8000
@@ -0,0 +1,36 @@
+from ai.base import AIProvider, ClassificationResult
+from ai.anthropic_provider import AnthropicProvider
+from ai.openai_provider import OpenAIProvider
+from ai.ollama_provider import OllamaProvider
+from ai.lmstudio_provider import LMStudioProvider
+
+
+def get_provider(settings: dict) -> AIProvider:
+    active = settings.get("active_provider", "lmstudio")
+    providers = settings.get("providers", {})
+    cfg = providers.get(active, {})
+
+    match active:
+        case "anthropic":
+            return AnthropicProvider(
+                api_key=cfg.get("api_key", ""),
+                model=cfg.get("model", "claude-sonnet-4-6"),
+            )
+        case "openai":
+            return OpenAIProvider(
+                api_key=cfg.get("api_key", ""),
+                model=cfg.get("model", "gpt-4o"),
+                base_url=cfg.get("base_url") or None,
+            )
+        case "ollama":
+            return OllamaProvider(
+                base_url=cfg.get("base_url", "http://host.docker.internal:11434"),
+                model=cfg.get("model", "llama3.2"),
+            )
+        case "lmstudio":
+            return LMStudioProvider(
+                base_url=cfg.get("base_url", "http://host.docker.internal:1234"),
+                model=cfg.get("model", "gemma-4-e4b-it"),
+            )
+        case _:
+            raise ValueError(f"Unknown AI provider: {active}")
@@ -0,0 +1,103 @@
+import json
+import re
+import anthropic
+from ai.base import AIProvider, ClassificationResult
+
+MAX_AI_CHARS = 8_000
+
+
+class AnthropicProvider(AIProvider):
+    def __init__(self, api_key: str, model: str = "claude-sonnet-4-6"):
+        self._api_key = api_key
+        self._model = model
+
+    def _client(self):
+        return anthropic.AsyncAnthropic(api_key=self._api_key)
+
+    async def classify(
+        self,
+        document_text: str,
+        existing_topics: list[str],
+        system_prompt: str,
+    ) -> ClassificationResult:
+        topics_str = ", ".join(existing_topics) if existing_topics else "(none yet)"
+        user_msg = (
+            f"Existing topics: [{topics_str}]\n\n"
+            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
+        )
+        client = self._client()
+        response = await client.messages.create(
+            model=self._model,
+            max_tokens=1024,
+            system=system_prompt,
+            messages=[{"role": "user", "content": user_msg}],
+        )
+        raw = response.content[0].text
+        return _parse_classification(raw)
+
+    async def suggest_topics(
+        self,
+        document_text: str,
+        system_prompt: str,
+    ) -> list[str]:
+        user_msg = (
+            "Suggest 3-5 topic names for this document. "
+            "Return ONLY valid JSON: {\"suggested_topics\": [\"topic1\", \"topic2\"]}\n\n"
+            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
+        )
+        client = self._client()
+        response = await client.messages.create(
+            model=self._model,
+            max_tokens=256,
+            system=system_prompt,
+            messages=[{"role": "user", "content": user_msg}],
+        )
+        raw = response.content[0].text
+        return _parse_suggestions(raw)
+
+    async def health_check(self) -> bool:
+        try:
+            client = self._client()
+            await client.messages.create(
+                model=self._model,
+                max_tokens=5,
+                messages=[{"role": "user", "content": "ping"}],
+            )
+            return True
+        except Exception:
+            return False
+
+
+def _strip_code_fences(text: str) -> str:
+    text = re.sub(r"```(?:json)?\s*", "", text)
+    text = re.sub(r"```", "", text)
+    return text.strip()
+
+
+def _parse_classification(raw: str) -> ClassificationResult:
+    raw = _strip_code_fences(raw)
+    # Try to find JSON object
+    match = re.search(r"\{.*\}", raw, re.DOTALL)
+    if match:
+        try:
+            data = json.loads(match.group())
+            return ClassificationResult(
+                topics=data.get("assigned_topics", []),
+                suggested_new_topics=data.get("new_topic_suggestions", []),
+                reasoning=data.get("reasoning", ""),
+            )
+        except json.JSONDecodeError:
+            pass
+    return ClassificationResult()
+
+
+def _parse_suggestions(raw: str) -> list[str]:
+    raw = _strip_code_fences(raw)
+    match = re.search(r"\{.*\}", raw, re.DOTALL)
+    if match:
+        try:
+            data = json.loads(match.group())
+            return data.get("suggested_topics", [])
+        except json.JSONDecodeError:
+            pass
+    return []
@@ -0,0 +1,32 @@
+from abc import ABC, abstractmethod
+from dataclasses import dataclass, field
+
+
+@dataclass
+class ClassificationResult:
+    topics: list[str] = field(default_factory=list)
+    suggested_new_topics: list[str] = field(default_factory=list)
+    reasoning: str = ""
+
+
+class AIProvider(ABC):
+    @abstractmethod
+    async def classify(
+        self,
+        document_text: str,
+        existing_topics: list[str],
+        system_prompt: str,
+    ) -> ClassificationResult:
+        ...
+
+    @abstractmethod
+    async def suggest_topics(
+        self,
+        document_text: str,
+        system_prompt: str,
+    ) -> list[str]:
+        ...
+
+    @abstractmethod
+    async def health_check(self) -> bool:
+        ...
@@ -0,0 +1,10 @@
+from ai.openai_provider import OpenAIProvider
+
+
+class LMStudioProvider(OpenAIProvider):
+    def __init__(self, base_url: str = "http://host.docker.internal:1234", model: str = "gemma-4-e4b-it"):
+        super().__init__(
+            api_key="lm-studio",
+            model=model,
+            base_url=base_url.rstrip("/") + "/v1",
+        )
@@ -0,0 +1,10 @@
+from ai.openai_provider import OpenAIProvider
+
+
+class OllamaProvider(OpenAIProvider):
+    def __init__(self, base_url: str = "http://host.docker.internal:11434", model: str = "llama3.2"):
+        super().__init__(
+            api_key="ollama",
+            model=model,
+            base_url=base_url.rstrip("/") + "/v1",
+        )
@@ -0,0 +1,104 @@
+import json
+import re
+from openai import AsyncOpenAI
+from ai.base import AIProvider, ClassificationResult
+
+MAX_AI_CHARS = 8_000
+
+
+class OpenAIProvider(AIProvider):
+    def __init__(self, api_key: str, model: str = "gpt-4o", base_url: str | None = None):
+        self._api_key = api_key
+        self._model = model
+        self._base_url = base_url
+
+    def _client(self) -> AsyncOpenAI:
+        return AsyncOpenAI(api_key=self._api_key or "placeholder", base_url=self._base_url)
+
+    async def classify(
+        self,
+        document_text: str,
+        existing_topics: list[str],
+        system_prompt: str,
+    ) -> ClassificationResult:
+        topics_str = ", ".join(existing_topics) if existing_topics else "(none yet)"
+        user_msg = (
+            f"Existing topics: [{topics_str}]\n\n"
+            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
+        )
+        response = await self._client().chat.completions.create(
+            model=self._model,
+            max_tokens=1024,
+            messages=[
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": user_msg},
+            ],
+        )
+        raw = response.choices[0].message.content or ""
+        return _parse_classification(raw)
+
+    async def suggest_topics(
+        self,
+        document_text: str,
+        system_prompt: str,
+    ) -> list[str]:
+        user_msg = (
+            "Suggest 3-5 topic names for this document. "
+            "Return ONLY valid JSON: {\"suggested_topics\": [\"topic1\", \"topic2\"]}\n\n"
+            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
+        )
+        response = await self._client().chat.completions.create(
+            model=self._model,
+            max_tokens=256,
+            messages=[
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": user_msg},
+            ],
+        )
+        raw = response.choices[0].message.content or ""
+        return _parse_suggestions(raw)
+
+    async def health_check(self) -> bool:
+        try:
+            await self._client().chat.completions.create(
+                model=self._model,
+                max_tokens=5,
+                messages=[{"role": "user", "content": "ping"}],
+            )
+            return True
+        except Exception:
+            return False
+
+
+def _strip_code_fences(text: str) -> str:
+    text = re.sub(r"```(?:json)?\s*", "", text)
+    text = re.sub(r"```", "", text)
+    return text.strip()
+
+
+def _parse_classification(raw: str) -> ClassificationResult:
+    raw = _strip_code_fences(raw)
+    match = re.search(r"\{.*\}", raw, re.DOTALL)
+    if match:
+        try:
+            data = json.loads(match.group())
+            return ClassificationResult(
+                topics=data.get("assigned_topics", []),
+                suggested_new_topics=data.get("new_topic_suggestions", []),
+                reasoning=data.get("reasoning", ""),
+            )
+        except json.JSONDecodeError:
+            pass
+    return ClassificationResult()
+
+
+def _parse_suggestions(raw: str) -> list[str]:
+    raw = _strip_code_fences(raw)
+    match = re.search(r"\{.*\}", raw, re.DOTALL)
+    if match:
+        try:
+            data = json.loads(match.group())
+            return data.get("suggested_topics", [])
+        except json.JSONDecodeError:
+            pass
+    return []
@@ -0,0 +1,101 @@
+from datetime import datetime, timezone
+from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Query
+from services import storage, extractor, classifier
+
+router = APIRouter(prefix="/api/documents", tags=["documents"])
+
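+# MIME types the extractor is expected to handle.
+# Note: upload_document() below does not check uploads against this set yet.
+ALLOWED_MIME_TYPES = {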
+    "application/pdf",
+    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+    "application/msword",
+    "text/plain",
+    "text/markdown",
+    "image/png",
+    "image/jpeg",
+    "image/jpg",
+    "image/tiff",
+    "image/webp",
+}
+
+
+@router.post("/upload")
+async def upload_document(
+    file: UploadFile = File(...),
+    auto_classify: bool = Form(True),
+):
+    content = await file.read()
+    if len(content) == 0:
+        raise HTTPException(400, "Empty file")
+
+    mime = file.content_type or "application/octet-stream"
+
+    saved = storage.save_upload(content, file.filename or "upload", mime)
+    text = extractor.extract_text(saved["path"], mime)
+
+    now = datetime.now(timezone.utc).isoformat()
+    meta = {
+        "id": saved["id"],
+        "original_name": file.filename or "upload",
+        "filename": saved["filename"],
+        "mime_type": mime,
+        "size_bytes": len(content),
+        "extracted_text": text,
+        "topics": [],
+        "created_at": now,
+        "classified_at": None,
+    }
+    storage.save_metadata(meta)
+
+    if auto_classify:
+        try:
+            topics = await classifier.classify_document(saved["id"])
+            meta["topics"] = topics
+            meta["classified_at"] = datetime.now(timezone.utc).isoformat()
+        except Exception as e:
+            # Classification failure is non-fatal; document is still saved
+            meta["classification_error"] = str(e)
+
+    return meta
+
+
+@router.get("")
+async def list_documents(
+    topic: str | None = Query(None),
+    page: int = Query(1, ge=1),
+    per_page: int = Query(20, ge=1, le=100),
+):
+    docs = storage.list_metadata(topic=topic)
+    total = len(docs)
+    start = (page - 1) * per_page
+    return {"items": docs[start : start + per_page], "total": total, "page": page, "per_page": per_page}
+
+
+@router.get("/{doc_id}")
+async def get_document(doc_id: str):
+    meta = storage.get_metadata(doc_id)
+    if meta is None:
+        raise HTTPException(404, "Document not found")
+    return meta
+
+
+@router.delete("/{doc_id}")
+async def delete_document(doc_id: str):
+    ok = storage.delete_document(doc_id)
+    if not ok:
+        raise HTTPException(404, "Document not found")
+    return {"success": True}
+
+
+@router.post("/{doc_id}/classify")
+async def classify_document(doc_id: str, body: dict | None = None):
+    meta = storage.get_metadata(doc_id)
+    if meta is None:
+        raise HTTPException(404, "Document not found")
+
+    topic_names = body.get("topics") if body else None
+    try:
+        topics = await classifier.classify_document(doc_id, topic_names)
+    except Exception as e:
+        raise HTTPException(500, f"Classification failed: {e}")
+
+    return {"topics": topics}
@@ -0,0 +1,84 @@
+import time
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel
+from services import storage
+from config import DEFAULT_SYSTEM_PROMPT
+from ai import get_provider
+
+router = APIRouter(prefix="/api/settings", tags=["settings"])
+
+
+class SettingsPatch(BaseModel):
+    system_prompt: str | None = None
+    active_provider: str | None = None
+    providers: dict | None = None
+
+
+class TestProviderRequest(BaseModel):
+    provider: str
+
+
+@router.get("")
+async def get_settings():
+    settings = storage.load_settings()
+    return storage.settings_masked(settings)
+
+
+@router.patch("")
+async def patch_settings(body: SettingsPatch):
+    settings = storage.load_settings()
+
+    if body.system_prompt is not None:
+        settings["system_prompt"] = body.system_prompt
+
+    if body.active_provider is not None:
+        valid = {"anthropic", "openai", "ollama", "lmstudio"}
+        if body.active_provider not in valid:
+            raise HTTPException(400, f"Invalid provider. Must be one of: {sorted(valid)}")
+        settings["active_provider"] = body.active_provider
+
+    if body.providers is not None:
+        # Deep merge per-provider config
+        for prov_name, prov_cfg in body.providers.items():
+            if prov_name not in settings.get("providers", {}):
+                settings.setdefault("providers", {})[prov_name] = {}
+            existing = settings["providers"][prov_name]
+            for key, val in prov_cfg.items():
+                # Don't overwrite api_key if it comes in masked (contains ****)
+                if key == "api_key" and val and "****" in str(val):
+                    continue
+                existing[key] = val
+
+    storage.save_settings(settings)
+    return storage.settings_masked(settings)
+
+
+@router.post("/test-provider")
+async def test_provider(body: TestProviderRequest):
+    settings = storage.load_settings()
+    # Temporarily switch active provider for the test
+    test_settings = dict(settings)
+    test_settings["active_provider"] = body.provider
+
+    try:
+        provider = get_provider(test_settings)
+    except ValueError as e:
+        raise HTTPException(400, str(e))
+
+    start = time.monotonic()
+    try:
+        ok = await provider.health_check()
+    except Exception as e:
+        return {"ok": False, "message": str(e), "latency_ms": 0}
+
+    latency_ms = int((time.monotonic() - start) * 1000)
+    return {
+        "ok": ok,
+        "message": "Connection successful" if ok else "Health check failed",
+        "latency_ms": latency_ms,
+    }
+
+
+@router.get("/default-prompt")
+async def get_default_prompt():
+    return {"system_prompt": DEFAULT_SYSTEM_PROMPT}
@@ -0,0 +1,72 @@
+from fastapi import APIRouter, HTTPException
+from pydantic import BaseModel
+from services import storage, classifier
+
+router = APIRouter(prefix="/api/topics", tags=["topics"])
+
+
+class TopicCreate(BaseModel):
+    name: str
+    description: str = ""
+    color: str = "#6366f1"
+
+
+class TopicUpdate(BaseModel):
+    name: str | None = None
+    description: str | None = None
+    color: str | None = None
+
+
+class SuggestRequest(BaseModel):
+    document_id: str
+
+
+@router.get("")
+async def list_topics():
+    topics = storage.load_topics()
+    counts = storage.topic_doc_counts()
+    for t in topics:
+        t["doc_count"] = counts.get(t["name"], 0)
+    return {"topics": topics}
+
+
+@router.post("")
+async def create_topic(body: TopicCreate):
+    topic = storage.create_topic(body.name, body.description, body.color)
+    topic["doc_count"] = 0
+    return topic
+
+
+@router.patch("/{topic_id}")
+async def update_topic(topic_id: str, body: TopicUpdate):
+    topic = storage.update_topic(
+        topic_id,
+        name=body.name,
+        description=body.description,
+        color=body.color,
+    )
+    if topic is None:
+        raise HTTPException(404, "Topic not found")
+    counts = storage.topic_doc_counts()
+    topic["doc_count"] = counts.get(topic["name"], 0)
+    return topic
+
+
+@router.delete("/{topic_id}")
+async def delete_topic(topic_id: str):
+    name = storage.delete_topic(topic_id)
+    if name is None:
+        raise HTTPException(404, "Topic not found")
+    return {"success": True, "removed_from_documents": True}
+
+
+@router.post("/suggest")
+async def suggest_topics(body: SuggestRequest):
+    meta = storage.get_metadata(body.document_id)
+    if meta is None:
+        raise HTTPException(404, "Document not found")
+    try:
+        suggestions = await classifier.suggest_topics_for_document(body.document_id)
+    except Exception as e:
+        raise HTTPException(500, f"Suggestion failed: {e}")
+    return {"suggested": suggestions}
@@ -0,0 +1,51 @@
+import json
+import os
+from pathlib import Path
+
+DATA_DIR = Path(os.environ.get("DATA_DIR", "/app/data"))
+UPLOADS_DIR = DATA_DIR / "uploads"
+METADATA_DIR = DATA_DIR / "metadata"
+TOPICS_FILE = DATA_DIR / "topics.json"
+SETTINGS_FILE = DATA_DIR / "settings.json"
+
+DEFAULT_SYSTEM_PROMPT = """You are a document classification assistant. When given a document's text content and a list of existing topics, you must:
+1. Assign the document to one or more relevant topics from the list.
+2. If no existing topics fit well, suggest new topic names.
+Return ONLY valid JSON in this exact format, with no additional text or explanation:
+{"assigned_topics": ["topic1"], "new_topic_suggestions": ["new topic name"]}
+If the document fits no topics and you have no suggestions, return: {"assigned_topics": [], "new_topic_suggestions": []}"""
+
+DEFAULT_SETTINGS = {
+    "system_prompt": DEFAULT_SYSTEM_PROMPT,
+    "active_provider": "lmstudio",
+    "providers": {
+        "anthropic": {
+            "api_key": "",
+            "model": "claude-sonnet-4-6"
+        },
+        "openai": {
+            "api_key": "",
+            "model": "gpt-4o",
+            "base_url": None
+        },
+        "ollama": {
+            "base_url": "http://host.docker.internal:11434",
+            "model": "llama3.2"
+        },
+        "lmstudio": {
+            "base_url": "http://host.docker.internal:1234",
+            "model": "gemma-4-e4b-it"
+        }
+    }
+}
+
+
+def ensure_data_dirs():
+    UPLOADS_DIR.mkdir(parents=True, exist_ok=True)
+    METADATA_DIR.mkdir(parents=True, exist_ok=True)
+
+    if not TOPICS_FILE.exists():
+        TOPICS_FILE.write_text(json.dumps({"topics": []}, indent=2))
+
+    if not SETTINGS_FILE.exists():
+        SETTINGS_FILE.write_text(json.dumps(DEFAULT_SETTINGS, indent=2))
@@ -0,0 +1,14 @@
+{
+  "id": "69eb8545-2e19-4651-903e-6489dbd9f687",
+  "original_name": "1907-Rechnung.pdf",
+  "filename": "69eb8545-2e19-4651-903e-6489dbd9f687.pdf",
+  "mime_type": "application/pdf",
+  "size_bytes": 38090,
+  "extracted_text": "mobilcom-debitel GmbH · Geschäftsführung: Ingo Arnold, Antonius Fromme, Rickmann von Platen \nHRB 14826 KI, Amtsgericht Kiel · Vorsitzender des Aufsichtsrats: Stephan Esch · Sitz der Gesellschaft: Büdelsdorf\nBankverbindung: Commerzbank Rendsburg · IBAN DE08214400450844443200 · BIC COBADEFFXXX\nUSt-ID: DE 194 910 634 · Gläubiger-ID: DE43ZZZ00000074855\nHaben Sie Fragen zur Rechnung?\nwww.md.de/faq\nmobilcom-debitel Kundenservice\nHandykurzwahl: 22240\nDer Anruf erfolgt zu einer ortsgebundenen Rufnummer\nTelefon: 040/55 55 41 00 0\nmobilcom-debitel Kundenservice Technik\nTelefon: 0900/10 22 24 0\n€ 2,49/Anruf, nur aus dem dt. Festnetz erreichbar\nwww.md.de\nHerrn\nDominik Ritter\nLeibnizstr. 41\n10629 Berlin\nRechnungsdatum:\nRechnungsnr.:\nKundennummer:\n31.07.2019\nM19046649250\n33040574\nPost: mobilcom-debitel GmbH · 99076 Erfurt\nIhre mobilcom-debitel Rechnung\nRechnungsbetrag netto\n55,4645 €\nUSt.-Betrag (19%)\n10,54 €\nRechnungsbetrag gesamt\n66,00 €\nDie Begleichung der Rechnung erfolgt am 07.08.2019 im Lastschriftverfahren mit der Mandatsreferenz-Nummer\nMC-33040574-00000001 von dem Konto: IBAN DE38100208900615356026.\nKennen Sie schon waipu.tv? 
Das ist Fernsehen wie noch nie: auf Smartphone, Tablet oder Ihrem TV.\nJetzt kostenlos testen: md.de/tv/waipu-tv.\nMobilfunk-Vertragsabrechnungen\nMobilfunk-Rufnummer: 0170 / 4322717\nVertragsnummer:\n217582256\nTeilnehmer: Dominik Ritter\nTarif:\nreal Allnet mit Smartphone 10\nMobilfunknetz: Telekom Mobilfunk\nDie Leistungen im Überblick\nMenge Details\nZeitraum/Datum\nSumme\nBasisleistungen\n1 Grundgebühr\n01.08.2019 - 31.08.2019\n31,0840 €\n1 freenet Hotspot Flat (DLS24M0TB0G0000):\nUnbegrenztes Datenvolumen im größten WLAN-Netzwerk\n01.08.2019 - 31.08.2019\n0,0000 €\n1 T@ke-away Flat Upgrade (+2 GB) - 6M (anteilig)\n03.07.2019 - 31.07.2019\n11,7839 €\n1 T@ke-away Flat Upgrade (+2 GB) - 6M\n01.08.2019 - 31.08.2019\n12,5966 €\n1 Kaspersky Passwort Manager 1 Monat (DLS1M1TB1G0299)\n(anteilig):\nEin Passwort für mehrere Konten!\n03.07.2019 - 31.07.2019\n2,3505 €\n1 Kaspersky Passwort Manager 1 Monat (DLS1M1TB1G0299)\n(anteilig)\n01.08.2019 - 02.08.2019\n0,1621 €\n1 Gutschrift Kaspersky Passwort Manager\n(DLS1M1TB1G0299) (anteilig)\n03.07.2019 - 31.07.2019\n-2,3505 €\n1 Gutschrift Kaspersky Passwort Manager\n(DLS1M1TB1G0299) (anteilig)\n01.08.2019 - 02.08.2019\n-0,1621 €\n1 Smartphone-Option\n01.08.2019 - 31.08.2019\n8,4034 €\nVerbindungen\n3 Verbindungen ins dt. Festnetz (FN)\n01.07.2019 - 03.07.2019\n0,0000 €\n39 Netzexterne Verbindungen (NX)\n28.06.2019 - 30.07.2019\n0,0000 €\n1 Abgehende Roaming Verbindungen (RA)\n17.07.2019 - 17.07.2019\n0,0000 €\n202 Datenverbindungen (DATA)\n27.06.2019 - 30.07.2019\n0,0000 €\n120 Roaming Datenverbindungen (RD)\n14.07.2019 - 20.07.2019\n0,0000 €\nZwischensumme netto\n63,8679 €\nIhre mobilcom-debitel Vorteile\n1 24 x 10 Euro Grundgebührrabatt\n01.08.2019 - 31.08.2019\n-8,4034 €\nNettobetrag für Rufnummer 0170 / 4322717\n55,4645 €\nSofern Sie die Löschung Ihrer Verbindungsdaten sofort, 90 oder 180 Tage  nach Rechnungsstellung gewünscht haben, entfällt\nmit der Löschung unsere Nachweispflicht für diese Daten.  
Erfolgt innerhalb von 8 Wochen nach Erhalt der Rechnung kein\nschriftlicher Widerspruch, gilt die Rechnung  als genehmigt. Begründete Einwendungen können auch gegen einzelne in der\nRechnung dargestellte Forderungen erhoben werden. Verzug tritt spätestens 30 Tage nach Zugang der Rechnung ein. Dies\nschließt einen frühzeitigeren Verzug nicht aus. Hinweise zum Ablauf eines Anbieterwechsels finden Sie auf der Internetseite\nder Bundesnetzagentur.\nRechnungserklärung\nSeite 1 von 2\n\nmobilcom-debitel GmbH · Geschäftsführung: Ingo Arnold, Antonius Fromme, Rickmann von Platen \nHRB 14826 KI, Amtsgericht Kiel · Vorsitzender des Aufsichtsrats: Stephan Esch · Sitz der Gesellschaft: Büdelsdorf\nBankverbindung: Commerzbank Rendsburg · IBAN DE08214400450844443200 · BIC COBADEFFXXX\nUSt-ID: DE 194 910 634 · Gläubiger-ID: DE43ZZZ00000074855\nRechnungsdatum:\nRechnungsnr.:\nKundennummer:\n31.07.2019\nM19046649250\n33040574\nIhre mobilcom-debitel Rechnung\nInformationen gemäß Telekommunikations-Transparenzverordnung\nMobilfunk-Rufnummer: 0170 / 4322717\nZeitraum Datenverbrauch:\n01.06.2019 - 30.06.2019\nVertragsbeginn:\n20.12.2016 Kündigungsfrist:\n3 Monat(e) Summe vereinbartes Datenvolumen:\n8000 MB\nMindestlaufzeit bis:\n19.12.2020 Kündigungseingang bis:\n19.09.2020 Verbrauchtes Datenvolumen:\n8080 MB\nSeite 2 von 2",
+  "topics": [
+    "Telecommunications",
+    "Billing and Invoicing"
+  ],
+  "created_at": "2026-04-16T11:08:33.558670+00:00",
+  "classified_at": "2026-04-16T11:08:40.831347+00:00"
+}
@@ -0,0 +1,13 @@
+{
+  "id": "cf4dd4cf-dcfb-42f1-957d-bcdba640163b",
+  "original_name": "invoice.txt",
+  "filename": "cf4dd4cf-dcfb-42f1-957d-bcdba640163b.txt",
+  "mime_type": "text/plain",
+  "size_bytes": 108,
+  "extracted_text": "This is an invoice for professional consulting services rendered in April 2026. Total amount due: 5000 EUR.",
+  "topics": [
+    "Invoice"
+  ],
+  "created_at": "2026-04-16T11:06:08.026326+00:00",
+  "classified_at": "2026-04-16T11:06:09.636422+00:00"
+}
@@ -0,0 +1,11 @@
+{
+  "id": "e71d8a85-09a1-4cd8-b602-65aa9216a724",
+  "original_name": "test_doc.txt",
+  "filename": "e71d8a85-09a1-4cd8-b602-65aa9216a724.txt",
+  "mime_type": "text/plain",
+  "size_bytes": 57,
+  "extracted_text": "This document is about accounting and financial reports.",
+  "topics": [],
+  "created_at": "2026-04-16T11:05:24.317425+00:00",
+  "classified_at": null
+}
@@ -0,0 +1,23 @@
+{
+  "system_prompt": "You are a document classification assistant. When given a document's text content and a list of existing topics, you must:\n1. Assign the document to one or more relevant topics from the list.\n2. If no existing topics fit well, suggest new topic names.\nReturn ONLY valid JSON in this exact format, with no additional text or explanation:\n{\"assigned_topics\": [\"topic1\"], \"new_topic_suggestions\": [\"new topic name\"]}\nIf the document fits no topics and you have no suggestions, return: {\"assigned_topics\": [], \"new_topic_suggestions\": []}",
+  "active_provider": "lmstudio",
+  "providers": {
+    "anthropic": {
+      "api_key": "",
+      "model": "claude-sonnet-4-6"
+    },
+    "openai": {
+      "api_key": "",
+      "model": "gpt-4o",
+      "base_url": null
+    },
+    "ollama": {
+      "base_url": "http://host.docker.internal:11434",
+      "model": "llama3.2"
+    },
+    "lmstudio": {
+      "base_url": "http://host.docker.internal:1234",
+      "model": "gemma-4-e4b-it"
+    }
+  }
+}
@@ -0,0 +1,22 @@
+{
+  "topics": [
+    {
+      "id": "39ffdadb",
+      "name": "Test Topic",
+      "description": "",
+      "color": "#6366f1"
+    },
+    {
+      "id": "d2e0fbd8",
+      "name": "Telecommunications",
+      "description": "",
+      "color": "#6366f1"
+    },
+    {
+      "id": "d3823fd0",
+      "name": "Billing and Invoicing",
+      "description": "",
+      "color": "#6366f1"
+    }
+  ]
+}
@@ -0,0 +1 @@
+This is an invoice for professional consulting services rendered in April 2026. Total amount due: 5000 EUR.
@@ -0,0 +1 @@
+This document is about accounting and financial reports.
@@ -0,0 +1,33 @@
+from contextlib import asynccontextmanager
+from fastapi import FastAPI
+from fastapi.middleware.cors import CORSMiddleware
+from config import ensure_data_dirs
+from api.documents import router as documents_router
+from api.topics import router as topics_router
+from api.settings import router as settings_router
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    ensure_data_dirs()
+    yield
+
+
+app = FastAPI(title="Document Scanner API", version="1.0.0", lifespan=lifespan)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+
+@app.get("/health")
+async def health():
+    return {"status": "ok"}
+
+
+app.include_router(documents_router)
+app.include_router(topics_router)
+app.include_router(settings_router)
@@ -0,0 +1,3 @@
+[pytest]
+asyncio_mode = auto
+testpaths = tests
@@ -0,0 +1,15 @@
+fastapi>=0.111
+uvicorn[standard]>=0.29
+python-multipart
+pydantic-settings>=2.2
+anthropic>=0.26
+openai>=1.30
+PyMuPDF>=1.24
+python-docx>=1.1
+pytesseract>=0.3
+Pillow>=10.3
+filelock>=3.14
+aiofiles>=23.2
+httpx>=0.27
+pytest>=8.2
+pytest-asyncio>=0.23
@@ -0,0 +1,59 @@
+"""
+Classification orchestrator.
+Loads settings, selects AI provider, classifies document, auto-creates suggested topics.
+"""
+from services import storage
+from ai import get_provider
+
+MAX_AI_CHARS = 8_000
+
+
+async def classify_document(doc_id: str, topic_names: list[str] | None = None) -> list[str]:
+    """
+    Classify a document by its ID. Returns the list of assigned topic names.
+    If topic_names is provided, restrict classification to those topics.
+    Auto-creates any newly suggested topics.
+    """
+    meta = storage.get_metadata(doc_id)
+    if meta is None:
+        raise ValueError(f"Document {doc_id} not found")
+
+    settings = storage.load_settings()
+    system_prompt = settings.get("system_prompt", "")
+    provider = get_provider(settings)
+
+    # Use all known topics if not specified
+    if topic_names is None:
+        all_topics = storage.load_topics()
+        topic_names = [t["name"] for t in all_topics]
+
+    text = meta.get("extracted_text", "")
+    result = await provider.classify(text[:MAX_AI_CHARS], topic_names, system_prompt)
+
+    # Collect all topic names to persist (assigned + suggested)
+    all_new_names = set(result.suggested_new_topics) | set(result.topics)
+
+    # Auto-create any topic not already in the registry
+    existing_names = {t.lower() for t in topic_names}
+    for name in all_new_names:
+        if name.strip() and name.lower() not in existing_names:
+            storage.create_topic(name.strip())
+
+    # Final list: everything the AI assigned or suggested, stripped and de-duplicated in a stable order
+    final_topics = [t for t in dict.fromkeys(n.strip() for n in result.topics + result.suggested_new_topics) if t]
+
+    storage.update_document_topics(doc_id, final_topics)
+    return final_topics
+
+
+async def suggest_topics_for_document(doc_id: str) -> list[str]:
+    """Return AI-suggested topic names without modifying the document."""
+    meta = storage.get_metadata(doc_id)
+    if meta is None:
+        raise ValueError(f"Document {doc_id} not found")
+
+    settings = storage.load_settings()
+    system_prompt = settings.get("system_prompt", "")
+    provider = get_provider(settings)
+    text = meta.get("extracted_text", "")
+    return await provider.suggest_topics(text[:MAX_AI_CHARS], system_prompt)
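The classifier merges the assigned and the suggested topic names before persisting them on the document. A minimal sketch of an order-preserving, case-insensitive merge follows. The helper name `merge_topic_names` is hypothetical and not part of this codebase; it only illustrates the technique.

```python
def merge_topic_names(assigned: list[str], suggested: list[str]) -> list[str]:
    """Merge two topic lists, keeping first-seen order and spelling.

    Hypothetical helper: names are stripped, blanks are dropped, and
    duplicates are detected case-insensitively.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for name in [*assigned, *suggested]:
        cleaned = name.strip()
        key = cleaned.lower()
        if cleaned and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged


if __name__ == "__main__":
    print(merge_topic_names(["Finance", " finance "], ["Invoices", ""]))
```

Keeping the first-seen spelling means a topic the model returns as " finance " maps onto the already assigned "Finance" instead of producing a second entry.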
@@ -0,0 +1,71 @@
+"""
+Text extraction dispatcher.
+Supports: PDF (PyMuPDF), DOCX (python-docx), plain text, images (pytesseract).
+"""
+from pathlib import Path
+
+MAX_STORED_CHARS = 50_000
+
+
+def extract_text(file_path: str, mime_type: str) -> str:
+    path = Path(file_path)
+    try:
+        if mime_type == "application/pdf" or path.suffix.lower() == ".pdf":
+            return _extract_pdf(path)
+        elif mime_type in (
+            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+            "application/msword",
+        ) or path.suffix.lower() in (".docx", ".doc"):
+            return _extract_docx(path)
+        elif mime_type and mime_type.startswith("image/"):
+            return _extract_image(path)
+        else:
+            return _extract_text_file(path)
+    except Exception as e:
+        return f"[Extraction error: {e}]"
+
+
+def _extract_pdf(path: Path) -> str:
+    import fitz  # PyMuPDF
+    pages = []
+    # The context manager closes the document even if a page fails to parse
+    with fitz.open(str(path)) as doc:
+        for page in doc:
+            pages.append(page.get_text())
+    return _truncate("\n".join(pages))
+
+
+def _extract_docx(path: Path) -> str:
+    from docx import Document
+    doc = Document(str(path))
+    paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
+    return _truncate("\n".join(paragraphs))
+
+
+def _extract_image(path: Path) -> str:
+    try:
+        from PIL import Image
+        import pytesseract
+        img = Image.open(str(path))
+        text = pytesseract.image_to_string(img)
+        return _truncate(text)
+    except ImportError:
+        return "[OCR unavailable: pytesseract or Pillow not installed]"
+    except Exception as e:
+        return f"[OCR error: {e}]"
+
+
+def _extract_text_file(path: Path) -> str:
+    for enc in ("utf-8", "cp1252", "latin-1"):  # latin-1 last: it accepts any byte sequence
+        try:
+            return _truncate(path.read_text(encoding=enc))
+        except UnicodeDecodeError:
+            continue
+    return "[Could not decode text file]"
+
+
+def _truncate(text: str) -> str:
+    text = text.strip()
+    if len(text) > MAX_STORED_CHARS:
+        text = text[:MAX_STORED_CHARS] + "\n[... truncated ...]"
+    return text
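The text-file extractor walks a list of encodings until one decodes cleanly. In such a chain, latin-1 never raises because it maps every byte to a character, so it is only useful as the last entry. The sketch below shows the behaviour on raw bytes. `decode_with_fallback` is a hypothetical name used for illustration, not a function from this repository.

```python
def decode_with_fallback(
    data: bytes,
    encodings: tuple[str, ...] = ("utf-8", "cp1252", "latin-1"),
) -> tuple[str, str]:
    """Return (text, encoding_used) for the first encoding that decodes data.

    Hypothetical helper: stricter encodings are tried first, and latin-1
    acts as the catch-all because it cannot fail.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the given encodings could decode the data")


if __name__ == "__main__":
    # 0xE9 alone is not valid UTF-8, but it is "é" in cp1252.
    print(decode_with_fallback(b"caf\xe9"))
    # 0x81 is undefined in Python's cp1252 codec, so latin-1 picks it up.
    print(decode_with_fallback(b"\x81"))
```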
@@ -0,0 +1,187 @@
+import json
+import uuid
+import shutil
+from datetime import datetime, timezone
+from pathlib import Path
+from filelock import FileLock
+from config import UPLOADS_DIR, METADATA_DIR, TOPICS_FILE, SETTINGS_FILE, DEFAULT_SETTINGS
+
+
+# ── File locks ────────────────────────────────────────────────────────────────
+
+_topics_lock = FileLock(str(TOPICS_FILE) + ".lock")
+_settings_lock = FileLock(str(SETTINGS_FILE) + ".lock")
+
+
+# ── Documents ─────────────────────────────────────────────────────────────────
+
+def save_upload(file_bytes: bytes, original_name: str, mime_type: str) -> dict:
+    doc_id = str(uuid.uuid4())
+    suffix = Path(original_name).suffix.lower()
+    filename = f"{doc_id}{suffix}"
+    dest = UPLOADS_DIR / filename
+    dest.write_bytes(file_bytes)
+    return {"id": doc_id, "filename": filename, "path": str(dest)}
+
+
+def save_metadata(meta: dict) -> None:
+    path = METADATA_DIR / f"{meta['id']}.json"
+    lock = FileLock(str(path) + ".lock")
+    with lock:
+        path.write_text(json.dumps(meta, indent=2, ensure_ascii=False))
+
+
+def get_metadata(doc_id: str) -> dict | None:
+    path = METADATA_DIR / f"{doc_id}.json"
+    if not path.exists():
+        return None
+    return json.loads(path.read_text())
+
+
+def list_metadata(topic: str | None = None) -> list[dict]:
+    docs = []
+    for p in sorted(METADATA_DIR.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True):
+        try:
+            meta = json.loads(p.read_text())
+        except Exception:
+            continue
+        if topic and topic not in meta.get("topics", []):
+            continue
+        docs.append(meta)
+    return docs
+
+
+def delete_document(doc_id: str) -> bool:
+    meta_path = METADATA_DIR / f"{doc_id}.json"
+    if not meta_path.exists():
+        return False
+    meta = json.loads(meta_path.read_text())
+    upload_path = UPLOADS_DIR / meta.get("filename", "")
+    if upload_path.exists():
+        upload_path.unlink()
+    meta_path.unlink()
+    lock_path = Path(str(meta_path) + ".lock")
+    if lock_path.exists():
+        lock_path.unlink()
+    return True
+
+
+def update_document_topics(doc_id: str, topics: list[str]) -> dict | None:
+    meta = get_metadata(doc_id)
+    if meta is None:
+        return None
+    meta["topics"] = topics
+    meta["classified_at"] = datetime.now(timezone.utc).isoformat()
+    save_metadata(meta)
+    return meta
+
+
+def remove_topic_from_all_documents(topic_name: str) -> int:
+    """Remove a topic name from all documents. Returns number of docs updated."""
+    count = 0
+    for p in METADATA_DIR.glob("*.json"):
+        try:
+            meta = json.loads(p.read_text())
+        except Exception:
+            continue
+        if topic_name in meta.get("topics", []):
+            meta["topics"] = [t for t in meta["topics"] if t != topic_name]
+            lock = FileLock(str(p) + ".lock")
+            with lock:
+                p.write_text(json.dumps(meta, indent=2, ensure_ascii=False))
+            count += 1
+    return count
+
+
+# ── Topics ────────────────────────────────────────────────────────────────────
+
+def load_topics() -> list[dict]:
+    with _topics_lock:
+        data = json.loads(TOPICS_FILE.read_text())
+    return data.get("topics", [])
+
+
+def save_topics(topics: list[dict]) -> None:
+    with _topics_lock:
+        TOPICS_FILE.write_text(json.dumps({"topics": topics}, indent=2))
+
+
+def get_topic(topic_id: str) -> dict | None:
+    return next((t for t in load_topics() if t["id"] == topic_id), None)
+
+
+def create_topic(name: str, description: str = "", color: str = "#6366f1") -> dict:
+    topics = load_topics()
+    # Deduplicate by name (case-insensitive)
+    if any(t["name"].lower() == name.lower() for t in topics):
+        return next(t for t in topics if t["name"].lower() == name.lower())
+    topic = {
+        "id": str(uuid.uuid4())[:8],
+        "name": name,
+        "description": description,
+        "color": color,
+    }
+    topics.append(topic)
+    save_topics(topics)
+    return topic
+
+
+def update_topic(topic_id: str, **kwargs) -> dict | None:
+    topics = load_topics()
+    for t in topics:
+        if t["id"] == topic_id:
+            t.update({k: v for k, v in kwargs.items() if v is not None})
+            save_topics(topics)
+            return t
+    return None
+
+
+def delete_topic(topic_id: str) -> str | None:
+    topics = load_topics()
+    topic = next((t for t in topics if t["id"] == topic_id), None)
+    if not topic:
+        return None
+    name = topic["name"]
+    save_topics([t for t in topics if t["id"] != topic_id])
+    remove_topic_from_all_documents(name)
+    return name
+
+
+def topic_doc_counts() -> dict[str, int]:
+    counts: dict[str, int] = {}
+    for p in METADATA_DIR.glob("*.json"):
+        try:
+            meta = json.loads(p.read_text())
+        except Exception:
+            continue
+        for t in meta.get("topics", []):
+            counts[t] = counts.get(t, 0) + 1
+    return counts
+
+
+# ── Settings ──────────────────────────────────────────────────────────────────
+
+def load_settings() -> dict:
+    with _settings_lock:
+        return json.loads(SETTINGS_FILE.read_text())
+
+
+def save_settings(settings: dict) -> None:
+    with _settings_lock:
+        SETTINGS_FILE.write_text(json.dumps(settings, indent=2))
+
+
+def mask_api_key(key: str) -> str:
+    if not key or len(key) <= 4:
+        return "****"
+    return "****" + key[-4:]
+
+
+def settings_masked(settings: dict) -> dict:
+    import copy
+    s = copy.deepcopy(settings)
+    for prov in ("anthropic", "openai"):
+        key = s.get("providers", {}).get(prov, {}).get("api_key", "")
+        if key:
+            s["providers"][prov]["api_key"] = mask_api_key(key)
+    return s
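The storage layer rewrites JSON files in place while holding a lock, and several readers load the same files without taking that lock. One common way to keep an unlocked reader from seeing a half-written file is to write to a temporary file in the same directory and then swap it in with `os.replace`. The sketch below assumes a hypothetical helper `write_json_atomic`; it is not how the code above works today, and it uses only the standard library.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload as UTF-8 JSON so readers see the old or the new file, never a partial one.

    Hypothetical helper: the temp file ends in ".tmp", so a "*.json" glob
    over the same directory does not pick it up.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, ensure_ascii=False)
        # os.replace swaps the file in a single step on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "example.json"
        write_json_atomic(target, {"id": "example", "topics": ["Finance"]})
        print(target.read_text(encoding="utf-8"))
```

A lock is still needed for read-modify-write sequences such as adding a topic; the atomic swap only protects readers from partial writes.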
@@ -0,0 +1,70 @@
+"""
+pytest configuration: isolate each test with a temporary data directory.
+"""
+import os
+import json
+import pytest
+import tempfile
+import shutil
+from pathlib import Path
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture(autouse=True)
+def isolated_data_dir(monkeypatch, tmp_path):
+    """Each test gets its own clean data directory."""
+    data_dir = tmp_path / "data"
+    (data_dir / "uploads").mkdir(parents=True)
+    (data_dir / "metadata").mkdir(parents=True)
+    (data_dir / "topics.json").write_text(json.dumps({"topics": []}))
+
+    from config import DEFAULT_SETTINGS
+    (data_dir / "settings.json").write_text(json.dumps(DEFAULT_SETTINGS))
+
+    monkeypatch.setenv("DATA_DIR", str(data_dir))
+
+    # Patch the module-level path constants so the running app sees the temp dir
+    import config
+    monkeypatch.setattr(config, "DATA_DIR", data_dir)
+    monkeypatch.setattr(config, "UPLOADS_DIR", data_dir / "uploads")
+    monkeypatch.setattr(config, "METADATA_DIR", data_dir / "metadata")
+    monkeypatch.setattr(config, "TOPICS_FILE", data_dir / "topics.json")
+    monkeypatch.setattr(config, "SETTINGS_FILE", data_dir / "settings.json")
+
+    import services.storage as st
+    from filelock import FileLock
+    monkeypatch.setattr(st, "UPLOADS_DIR", data_dir / "uploads")
+    monkeypatch.setattr(st, "METADATA_DIR", data_dir / "metadata")
+    monkeypatch.setattr(st, "TOPICS_FILE", data_dir / "topics.json")
+    monkeypatch.setattr(st, "SETTINGS_FILE", data_dir / "settings.json")
+    monkeypatch.setattr(st, "_topics_lock", FileLock(str(data_dir / "topics.json") + ".lock"))
+    monkeypatch.setattr(st, "_settings_lock", FileLock(str(data_dir / "settings.json") + ".lock"))
+
+    yield data_dir
+
+
+@pytest.fixture
+def client(isolated_data_dir):
+    from main import app
+    with TestClient(app) as c:
+        yield c
+
+
+@pytest.fixture
+def sample_txt(tmp_path):
+    p = tmp_path / "sample.txt"
+    p.write_text("This is a test document about invoices and finance.")
+    return p
+
+
+@pytest.fixture
+def sample_pdf(tmp_path):
+    """Create a minimal valid PDF for testing."""
+    import fitz
+    doc = fitz.open()
+    page = doc.new_page()
+    page.insert_text((50, 50), "Test PDF document about contracts and legal matters.")
+    pdf_path = tmp_path / "sample.pdf"
+    doc.save(str(pdf_path))
+    doc.close()
+    return pdf_path
@@ -0,0 +1,110 @@
+"""
+Unit tests for AI provider JSON parsing robustness and classifier orchestration.
+Uses a mock provider — no real AI calls made.
+"""
+import json
+import pytest
+from ai.openai_provider import _parse_classification, _parse_suggestions, _strip_code_fences
+from ai.base import ClassificationResult
+
+
+def test_parse_clean_json():
+    raw = '{"assigned_topics": ["finance", "invoices"], "new_topic_suggestions": []}'
+    result = _parse_classification(raw)
+    assert result.topics == ["finance", "invoices"]
+    assert result.suggested_new_topics == []
+
+
+def test_parse_with_code_fence():
+    raw = '```json\n{"assigned_topics": ["legal"], "new_topic_suggestions": ["contracts"]}\n```'
+    result = _parse_classification(raw)
+    assert result.topics == ["legal"]
+    assert result.suggested_new_topics == ["contracts"]
+
+
+def test_parse_with_preamble():
+    raw = 'Here is the classification:\n{"assigned_topics": ["hr"], "new_topic_suggestions": []}\nDone.'
+    result = _parse_classification(raw)
+    assert result.topics == ["hr"]
+
+
+def test_parse_malformed_returns_empty():
+    raw = "I cannot classify this document."
+    result = _parse_classification(raw)
+    assert result.topics == []
+    assert result.suggested_new_topics == []
+
+
+def test_strip_code_fences():
+    raw = "```json\n{}\n```"
+    assert _strip_code_fences(raw) == "{}"
+
+
+def test_parse_suggestions_clean():
+    raw = '{"suggested_topics": ["Human Resources", "Onboarding"]}'
+    result = _parse_suggestions(raw)
+    assert "Human Resources" in result
+    assert "Onboarding" in result
+
+
+def test_parse_suggestions_with_fence():
+    raw = "```\n{\"suggested_topics\": [\"Finance\"]}\n```"
+    result = _parse_suggestions(raw)
+    assert result == ["Finance"]
+
+
+def test_parse_suggestions_malformed():
+    raw = "No suggestions available."
+    result = _parse_suggestions(raw)
+    assert result == []
+
+
+@pytest.mark.asyncio
+async def test_classifier_with_mock_provider(isolated_data_dir):
+    """Test classifier orchestration with a mock provider."""
+    from unittest.mock import AsyncMock, patch
+    from ai.base import ClassificationResult
+    import services.storage as st
+
+    # Create a document
+    doc_id = "test-doc-1"
+    st.save_metadata({
+        "id": doc_id,
+        "original_name": "test.txt",
+        "filename": "test-doc-1.txt",
+        "mime_type": "text/plain",
+        "size_bytes": 50,
+        "extracted_text": "Invoice for services rendered in March 2026.",
+        "topics": [],
+        "created_at": "2026-01-01T00:00:00Z",
+        "classified_at": None,
+    })
+
+    # Create some topics
+    st.create_topic("Finance")
+    st.create_topic("Legal")
+
+    mock_result = ClassificationResult(
+        topics=["Finance"],
+        suggested_new_topics=["Invoices"],
+        reasoning="Document is about financial invoicing.",
+    )
+
+    with patch("services.classifier.get_provider") as mock_get_provider:
+        mock_provider = AsyncMock()
+        mock_provider.classify = AsyncMock(return_value=mock_result)
+        mock_get_provider.return_value = mock_provider
+
+        from services.classifier import classify_document
+        topics = await classify_document(doc_id)
+
+    assert "Finance" in topics
+    assert "Invoices" in topics
+
+    # Verify new topic was auto-created
+    all_topics = st.load_topics()
+    assert any(t["name"] == "Invoices" for t in all_topics)
+
+    # Verify document was updated
+    meta = st.get_metadata(doc_id)
+    assert "Finance" in meta["topics"]
@@ -0,0 +1,107 @@
+def test_upload_txt_no_classify(client, sample_txt):
+    with open(sample_txt, "rb") as f:
+        resp = client.post(
+            "/api/documents/upload",
+            files={"file": ("sample.txt", f, "text/plain")},
+            data={"auto_classify": "false"},
+        )
+    assert resp.status_code == 200
+    data = resp.json()
+    assert data["original_name"] == "sample.txt"
+    assert "extracted_text" in data
+    assert "invoices" in data["extracted_text"].lower()
+    assert data["topics"] == []
+    assert "id" in data
+
+
+def test_upload_pdf_no_classify(client, sample_pdf):
+    with open(sample_pdf, "rb") as f:
+        resp = client.post(
+            "/api/documents/upload",
+            files={"file": ("sample.pdf", f, "application/pdf")},
+            data={"auto_classify": "false"},
+        )
+    assert resp.status_code == 200
+    data = resp.json()
+    assert data["mime_type"] == "application/pdf"
+    assert len(data["extracted_text"]) > 0
+
+
+def test_list_documents(client, sample_txt):
+    with open(sample_txt, "rb") as f:
+        client.post(
+            "/api/documents/upload",
+            files={"file": ("a.txt", f, "text/plain")},
+            data={"auto_classify": "false"},
+        )
+    resp = client.get("/api/documents")
+    assert resp.status_code == 200
+    data = resp.json()
+    assert data["total"] == 1
+    assert len(data["items"]) == 1
+
+
+def test_list_documents_filter_by_topic(client, sample_txt):
+    with open(sample_txt, "rb") as f:
+        upload = client.post(
+            "/api/documents/upload",
+            files={"file": ("a.txt", f, "text/plain")},
+            data={"auto_classify": "false"},
+        ).json()
+
+    import services.storage as st
+    st.update_document_topics(upload["id"], ["finance"])
+
+    resp = client.get("/api/documents?topic=finance")
+    assert resp.json()["total"] == 1
+
+    resp2 = client.get("/api/documents?topic=legal")
+    assert resp2.json()["total"] == 0
+
+
+def test_get_document(client, sample_txt):
+    with open(sample_txt, "rb") as f:
+        upload = client.post(
+            "/api/documents/upload",
+            files={"file": ("a.txt", f, "text/plain")},
+            data={"auto_classify": "false"},
+        ).json()
+
+    resp = client.get(f"/api/documents/{upload['id']}")
+    assert resp.status_code == 200
+    assert resp.json()["id"] == upload["id"]
+
+
+def test_get_document_not_found(client):
+    resp = client.get("/api/documents/nonexistent")
+    assert resp.status_code == 404
+
+
+def test_delete_document(client, sample_txt):
+    with open(sample_txt, "rb") as f:
+        upload = client.post(
+            "/api/documents/upload",
+            files={"file": ("a.txt", f, "text/plain")},
+            data={"auto_classify": "false"},
+        ).json()
+
+    resp = client.delete(f"/api/documents/{upload['id']}")
+    assert resp.status_code == 200
+    assert resp.json()["success"] is True
+
+    resp2 = client.get(f"/api/documents/{upload['id']}")
+    assert resp2.status_code == 404
+
+
+def test_delete_document_not_found(client):
+    resp = client.delete("/api/documents/nonexistent")
+    assert resp.status_code == 404
+
+
+def test_upload_empty_file(client):
+    resp = client.post(
+        "/api/documents/upload",
+        files={"file": ("empty.txt", b"", "text/plain")},
+        data={"auto_classify": "false"},
+    )
+    assert resp.status_code == 400
@@ -0,0 +1,52 @@
+import pytest
+from pathlib import Path
+from services.extractor import extract_text
+
+
+def test_extract_txt(tmp_path):
+    p = tmp_path / "test.txt"
+    p.write_text("Hello world this is a test document.", encoding="utf-8")
+    text = extract_text(str(p), "text/plain")
+    assert "Hello world" in text
+
+
+def test_extract_pdf(tmp_path):
+    import fitz
+    doc = fitz.open()
+    page = doc.new_page()
+    page.insert_text((50, 50), "PDF content about legal contracts.")
+    pdf_path = tmp_path / "test.pdf"
+    doc.save(str(pdf_path))
+    doc.close()
+
+    text = extract_text(str(pdf_path), "application/pdf")
+    assert "PDF content" in text
+
+
+def test_extract_docx(tmp_path):
+    from docx import Document
+    doc = Document()
+    doc.add_paragraph("DOCX paragraph about financial reports.")
+    docx_path = tmp_path / "test.docx"
+    doc.save(str(docx_path))
+
+    text = extract_text(
+        str(docx_path),
+        "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
+    )
+    assert "DOCX paragraph" in text
+
+
+def test_extract_unknown_falls_back_to_text(tmp_path):
+    p = tmp_path / "test.csv"
+    p.write_text("col1,col2\nval1,val2", encoding="utf-8")
+    text = extract_text(str(p), "text/csv")
+    assert "col1" in text
+
+
+def test_extract_truncation(tmp_path):
+    p = tmp_path / "big.txt"
+    p.write_text("A" * 60_000, encoding="utf-8")
+    text = extract_text(str(p), "text/plain")
+    assert len(text) <= 50_100  # 50k + truncation marker
+    assert "truncated" in text
@@ -0,0 +1,4 @@
+def test_health(client):
+    resp = client.get("/health")
+    assert resp.status_code == 200
+    assert resp.json() == {"status": "ok"}
@@ -0,0 +1,46 @@
+"""
+Integration test against a live LM Studio instance.
+Skipped automatically if LM Studio is not reachable.
+"""
+import pytest
+import httpx
+
+
+def lmstudio_available() -> bool:
+    try:
+        r = httpx.get("http://host.docker.internal:1234/v1/models", timeout=3)
+        return r.status_code == 200
+    except Exception:
+        return False
+
+
+@pytest.mark.skipif(not lmstudio_available(), reason="LM Studio not reachable at host.docker.internal:1234")
+@pytest.mark.asyncio
+async def test_lmstudio_health_check():
+    from ai.lmstudio_provider import LMStudioProvider
+    provider = LMStudioProvider(
+        base_url="http://host.docker.internal:1234",
+        model="gemma-4-e4b-it",
+    )
+    ok = await provider.health_check()
+    assert ok, "LM Studio health check failed"
+
+
+@pytest.mark.skipif(not lmstudio_available(), reason="LM Studio not reachable at host.docker.internal:1234")
+@pytest.mark.asyncio
+async def test_lmstudio_classify():
+    from ai.lmstudio_provider import LMStudioProvider
+    from config import DEFAULT_SYSTEM_PROMPT
+
+    provider = LMStudioProvider(
+        base_url="http://host.docker.internal:1234",
+        model="gemma-4-e4b-it",
+    )
+    result = await provider.classify(
+        document_text="This document is an invoice for software development services.",
+        existing_topics=["Finance", "Legal", "HR"],
+        system_prompt=DEFAULT_SYSTEM_PROMPT,
+    )
+    # Model output is not deterministic, so only check the shape of the result
+    assert isinstance(result.topics, list)
+    assert isinstance(result.suggested_new_topics, list)
@@ -0,0 +1,60 @@
+def test_get_settings_defaults(client):
+    resp = client.get("/api/settings")
+    assert resp.status_code == 200
+    data = resp.json()
+    assert data["active_provider"] == "lmstudio"
+    assert "system_prompt" in data
+    assert "providers" in data
+    # API keys should be masked or empty
+    for prov in ("anthropic", "openai"):
+        key = data["providers"][prov].get("api_key", "")
+        assert "****" not in key or len(key) <= 8  # masked or empty
+
+
+def test_patch_system_prompt(client):
+    new_prompt = "Custom system prompt for testing."
+    resp = client.patch("/api/settings", json={"system_prompt": new_prompt})
+    assert resp.status_code == 200
+
+    resp2 = client.get("/api/settings")
+    assert resp2.json()["system_prompt"] == new_prompt
+
+
+def test_patch_active_provider(client):
+    resp = client.patch("/api/settings", json={"active_provider": "ollama"})
+    assert resp.status_code == 200
+    assert resp.json()["active_provider"] == "ollama"
+
+
+def test_patch_invalid_provider(client):
+    resp = client.patch("/api/settings", json={"active_provider": "unknown"})
+    assert resp.status_code == 400
+
+
+def test_patch_provider_config(client):
+    resp = client.patch("/api/settings", json={
+        "providers": {
+            "ollama": {"model": "mistral", "base_url": "http://host.docker.internal:11434"}
+        }
+    })
+    assert resp.status_code == 200
+    assert resp.json()["providers"]["ollama"]["model"] == "mistral"
+
+
+def test_masked_api_key_not_overwritten(client):
+    """Patching with a masked key should not overwrite the real stored key."""
+    # First set a real key
+    client.patch("/api/settings", json={"providers": {"anthropic": {"api_key": "sk-ant-realkey"}}})
+    # Then patch with masked key (simulating frontend re-submitting)
+    client.patch("/api/settings", json={"providers": {"anthropic": {"api_key": "****key"}}})
+    # The stored key should still be the real one
+    import services.storage as st
+    settings = st.load_settings()
+    assert settings["providers"]["anthropic"]["api_key"] == "sk-ant-realkey"
+
+
+def test_get_default_prompt(client):
+    resp = client.get("/api/settings/default-prompt")
+    assert resp.status_code == 200
+    assert "system_prompt" in resp.json()
+    assert len(resp.json()["system_prompt"]) > 0
@@ -0,0 +1,72 @@
+def test_list_topics_empty(client):
+    resp = client.get("/api/topics")
+    assert resp.status_code == 200
+    assert resp.json()["topics"] == []
+
+
+def test_create_topic(client):
+    resp = client.post("/api/topics", json={"name": "Finance", "description": "Financial docs", "color": "#ff0000"})
+    assert resp.status_code == 200
+    data = resp.json()
+    assert data["name"] == "Finance"
+    assert data["color"] == "#ff0000"
+    assert "id" in data
+
+
+def test_create_topic_deduplication(client):
+    client.post("/api/topics", json={"name": "Finance"})
+    resp = client.post("/api/topics", json={"name": "finance"})  # case-insensitive
+    assert resp.status_code == 200
+    topics = client.get("/api/topics").json()["topics"]
+    assert len(topics) == 1
+
+
+def test_update_topic(client):
+    create = client.post("/api/topics", json={"name": "Old Name"}).json()
+    resp = client.patch(f"/api/topics/{create['id']}", json={"name": "New Name"})
+    assert resp.status_code == 200
+    assert resp.json()["name"] == "New Name"
+
+
+def test_update_topic_not_found(client):
+    resp = client.patch("/api/topics/nonexistent", json={"name": "X"})
+    assert resp.status_code == 404
+
+
+def test_delete_topic(client):
+    create = client.post("/api/topics", json={"name": "ToDelete"}).json()
+    resp = client.delete(f"/api/topics/{create['id']}")
+    assert resp.status_code == 200
+    assert resp.json()["success"] is True
+
+    topics = client.get("/api/topics").json()["topics"]
+    assert not any(t["name"] == "ToDelete" for t in topics)
+
+
+def test_delete_topic_cascades_to_documents(client, sample_txt):
+    # Create a topic
+    topic = client.post("/api/topics", json={"name": "Legal"}).json()
+
+    # Upload doc (no auto classify to control topics manually)
+    with open(sample_txt, "rb") as f:
+        upload = client.post(
+            "/api/documents/upload",
+            files={"file": ("sample.txt", f, "text/plain")},
+            data={"auto_classify": "false"},
+        ).json()
+
+    # Manually set the topic on the document through the storage layer
+    import services.storage as st
+    st.update_document_topics(upload["id"], ["Legal"])
+
+    # Delete topic
+    client.delete(f"/api/topics/{topic['id']}")
+
+    # Verify document no longer has the topic
+    doc = client.get(f"/api/documents/{upload['id']}").json()
+    assert "Legal" not in doc["topics"]
+
+
+def test_delete_topic_not_found(client):
+    resp = client.delete("/api/topics/nonexistent")
+    assert resp.status_code == 404
@@ -0,0 +1,25 @@
+services:
+  backend:
+    build: ./backend
+    ports:
+      - "8000:8000"
+    volumes:
+      - ./backend/data:/app/data
+      - ./backend:/app
+    environment:
+      - DATA_DIR=/app/data
+      - PYTHONDONTWRITEBYTECODE=1
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
+    command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+
+  frontend:
+    build: ./frontend
+    ports:
+      - "5173:5173"
+    volumes:
+      - ./frontend/src:/app/src
+      - ./frontend/index.html:/app/index.html
+    depends_on:
+      - backend
+    command: npm run dev -- --host 0.0.0.0
@@ -0,0 +1,10 @@
+FROM node:20-alpine
+
+WORKDIR /app
+
+COPY package*.json ./
+RUN npm install
+
+COPY . .
+
+EXPOSE 5173
@@ -0,0 +1,12 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
+    <title>Document Scanner</title>
+  </head>
+  <body>
+    <div id="app"></div>
+    <script type="module" src="/src/main.js"></script>
+  </body>
+</html>
@@ -0,0 +1,22 @@
+{
+  "name": "document-scanner-frontend",
+  "version": "1.0.0",
+  "type": "module",
+  "scripts": {
+    "dev": "vite",
+    "build": "vite build",
+    "preview": "vite preview"
+  },
+  "dependencies": {
+    "vue": "^3.4.0",
+    "vue-router": "^4.3.0",
+    "pinia": "^2.1.0"
+  },
+  "devDependencies": {
+    "@vitejs/plugin-vue": "^5.0.0",
+    "vite": "^5.2.0",
+    "tailwindcss": "^3.4.0",
+    "postcss": "^8.4.0",
+    "autoprefixer": "^10.4.0"
+  }
+}
@@ -0,0 +1,6 @@
+export default {
+  plugins: {
+    tailwindcss: {},
+    autoprefixer: {},
+  },
+}
@@ -0,0 +1,17 @@
+<template>
+  <div class="flex h-screen overflow-hidden">
+    <AppSidebar />
+    <main class="flex-1 overflow-y-auto">
+      <router-view />
+    </main>
+  </div>
+</template>
+
+<script setup>
+import AppSidebar from './components/layout/AppSidebar.vue'
+import { useTopicsStore } from './stores/topics.js'
+import { onMounted } from 'vue'
+
+const topicsStore = useTopicsStore()
+onMounted(() => topicsStore.fetchTopics())
+</script>
@@ -0,0 +1,105 @@
+/**
+ * API client using native Fetch API.
+ * All requests go to /api (proxied to backend by Vite in dev, or nginx in prod).
+ */
+
+async function request(path, options = {}) {
+  const res = await fetch(path, options)
+  if (!res.ok) {
+    let msg = `HTTP ${res.status}`
+    try { msg = (await res.json()).detail || msg } catch {}
+    throw new Error(msg)
+  }
+  return res.json()
+}
+
+// ── Documents ────────────────────────────────────────────────────────────────
+
+export function uploadDocument(file, autoClassify = true) {
+  const form = new FormData()
+  form.append('file', file)
+  form.append('auto_classify', autoClassify ? 'true' : 'false')
+  return request('/api/documents/upload', { method: 'POST', body: form })
+}
+
+export function listDocuments({ topic, page = 1, perPage = 20 } = {}) {
+  const params = new URLSearchParams({ page, per_page: perPage })
+  if (topic) params.set('topic', topic)
+  return request(`/api/documents?${params}`)
+}
+
+export function getDocument(id) {
+  return request(`/api/documents/${id}`)
+}
+
+export function deleteDocument(id) {
+  return request(`/api/documents/${id}`, { method: 'DELETE' })
+}
+
+export function classifyDocument(id, topics = null) {
+  return request(`/api/documents/${id}/classify`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify(topics ? { topics } : {}),
+  })
+}
+
+// ── Topics ───────────────────────────────────────────────────────────────────
+
+export function listTopics() {
+  return request('/api/topics')
+}
+
+export function createTopic({ name, description = '', color = '#6366f1' }) {
+  return request('/api/topics', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ name, description, color }),
+  })
+}
+
+export function updateTopic(id, patch) {
+  return request(`/api/topics/${id}`, {
+    method: 'PATCH',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify(patch),
+  })
+}
+
+export function deleteTopic(id) {
+  return request(`/api/topics/${id}`, { method: 'DELETE' })
+}
+
+export function suggestTopics(documentId) {
+  return request('/api/topics/suggest', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ document_id: documentId }),
+  })
+}
+
+// ── Settings ─────────────────────────────────────────────────────────────────
+
+export function getSettings() {
+  return request('/api/settings')
+}
+
+export function patchSettings(patch) {
+  return request('/api/settings', {
+    method: 'PATCH',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify(patch),
+  })
+}
+
+export function testProvider(provider) {
+  return request('/api/settings/test-provider', {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ provider }),
+  })
+}
+
+export function getDefaultPrompt() {
+  return request('/api/settings/default-prompt')
+}
@@ -0,0 +1,59 @@
+<template>
+  <div
+    class="bg-white border border-gray-200 rounded-xl p-4 hover:border-indigo-300 hover:shadow-sm transition-all cursor-pointer"
+    @click="$router.push(`/document/${doc.id}`)"
+  >
+    <div class="flex items-start gap-3">
+      <!-- Icon -->
+      <div class="w-9 h-9 rounded-lg bg-indigo-50 flex items-center justify-center shrink-0 mt-0.5">
+        <svg class="w-5 h-5 text-indigo-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+          <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
+            d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
+        </svg>
+      </div>
+
+      <div class="flex-1 min-w-0">
+        <p class="font-medium text-gray-900 text-sm truncate">{{ doc.original_name }}</p>
+        <p class="text-xs text-gray-400 mt-0.5">{{ formatDate(doc.created_at) }} · {{ formatSize(doc.size_bytes) }}</p>
+
+        <!-- Topics -->
+        <div class="flex flex-wrap gap-1 mt-2">
+          <TopicBadge
+            v-for="topicName in doc.topics"
+            :key="topicName"
+            :name="topicName"
+            :color="topicColor(topicName)"
+          />
+          <span v-if="!doc.topics?.length" class="text-xs text-gray-300 italic">unclassified</span>
+        </div>
+      </div>
+    </div>
+  </div>
+</template>
+
+<script setup>
+import { useTopicsStore } from '../../stores/topics.js'
+import TopicBadge from '../topics/TopicBadge.vue'
+
+const props = defineProps({
+  doc: Object,
+})
+
+const topicsStore = useTopicsStore()
+
+function topicColor(name) {
+  return topicsStore.topics.find(t => t.name === name)?.color ?? '#6366f1'
+}
+
+function formatDate(iso) {
+  if (!iso) return ''
+  return new Date(iso).toLocaleDateString(undefined, { month: 'short', day: 'numeric', year: 'numeric' })
+}
+
+function formatSize(bytes) {
+  if (bytes == null) return ''
+  if (bytes < 1024) return bytes + ' B'
+  if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + ' KB'
+  return (bytes / (1024 * 1024)).toFixed(1) + ' MB'
+}
+</script>
@@ -0,0 +1,87 @@
+<template>
+  <aside class="w-64 bg-white border-r border-gray-200 flex flex-col h-full shrink-0">
+    <!-- Logo -->
+    <div class="px-6 py-5 border-b border-gray-100">
+      <h1 class="text-lg font-bold text-indigo-600 tracking-tight">DocScanner</h1>
+      <p class="text-xs text-gray-400 mt-0.5">AI Document Classifier</p>
+    </div>
+
+    <!-- Nav -->
+    <nav class="flex-1 px-3 py-4 overflow-y-auto">
+      <router-link
+        to="/"
+        class="nav-link"
+        :class="{ 'nav-link-active': $route.path === '/' }"
+      >
+        <svg class="w-4 h-4 mr-2 shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+          <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
+            d="M3 12l2-2m0 0l7-7 7 7M5 10v10a1 1 0 001 1h3m10-11l2 2m-2-2v10a1 1 0 01-1 1h-3m-6 0a1 1 0 001-1v-4a1 1 0 011-1h2a1 1 0 011 1v4a1 1 0 001 1m-6 0h6" />
+        </svg>
+        Home
+      </router-link>
+
+      <router-link
+        to="/topics"
+        class="nav-link"
+        :class="{ 'nav-link-active': $route.path.startsWith('/topics') }"
+      >
+        <svg class="w-4 h-4 mr-2 shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+          <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
+            d="M7 7h.01M7 3h5c.512 0 1.024.195 1.414.586l7 7a2 2 0 010 2.828l-7 7a2 2 0 01-2.828 0l-7-7A1.994 1.994 0 013 12V7a4 4 0 014-4z" />
+        </svg>
+        All Topics
+      </router-link>
+
+      <!-- Topics list -->
+      <div class="mt-3">
+        <p class="px-3 text-xs font-semibold text-gray-400 uppercase tracking-wider mb-1">Topics</p>
+        <div v-if="topicsStore.loading" class="px-3 py-1 text-xs text-gray-400">Loading…</div>
+        <div v-else-if="topicsStore.topics.length === 0" class="px-3 py-1 text-xs text-gray-400">No topics yet</div>
+        <router-link
+          v-for="topic in topicsStore.topics"
+          :key="topic.id"
+          :to="`/topics/${encodeURIComponent(topic.name)}`"
+          class="nav-link text-sm"
+          :class="{ 'nav-link-active': $route.params.name === topic.name }"
+        >
+          <span
+            class="w-2.5 h-2.5 rounded-full mr-2 shrink-0"
+            :style="{ backgroundColor: topic.color }"
+          ></span>
+          <span class="truncate">{{ topic.name }}</span>
+          <span class="ml-auto text-xs text-gray-400">{{ topic.doc_count }}</span>
+        </router-link>
+      </div>
+    </nav>
+
+    <!-- Settings link -->
+    <div class="px-3 py-4 border-t border-gray-100">
+      <router-link
+        to="/settings"
+        class="nav-link"
+        :class="{ 'nav-link-active': $route.path === '/settings' }"
+      >
+        <svg class="w-4 h-4 mr-2 shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+          <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
+            d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z" />
+          <path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M15 12a3 3 0 11-6 0 3 3 0 016 0z" />
+        </svg>
+        Settings
+      </router-link>
+    </div>
+  </aside>
+</template>
+
+<script setup>
+import { useTopicsStore } from '../../stores/topics.js'
+const topicsStore = useTopicsStore()
+</script>
+
+<style scoped>
+.nav-link {
+  @apply flex items-center px-3 py-2 rounded-lg text-gray-600 hover:bg-gray-100 hover:text-gray-900 transition-colors text-sm font-medium;
+}
+.nav-link-active {
+  @apply bg-indigo-50 text-indigo-700;
+}
+</style>
@@ -0,0 +1,15 @@
+<template>
+  <span
+    class="inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium"
+    :style="{ backgroundColor: color + '22', color }"
+  >
+    {{ name }}
+  </span>
+</template>
+
+<script setup>
+defineProps({
+  name: String,
+  color: { type: String, default: '#6366f1' },
+})
+</script>
@@ -0,0 +1,124 @@
+<template>
+  <div>
+    <!-- Add form -->
+    <form @submit.prevent="submit" class="flex gap-2 mb-6">
+      <input
+        v-model="form.name"
+        type="text"
+        placeholder="New topic name…"
+        class="flex-1 border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+        required
+      />
+      <input
+        v-model="form.color"
+        type="color"
+        class="w-10 h-10 rounded-lg border border-gray-300 cursor-pointer p-0.5"
+        title="Pick color"
+      />
+      <button
+        type="submit"
+        class="px-4 py-2 bg-indigo-600 text-white rounded-lg text-sm font-medium hover:bg-indigo-700 transition-colors"
+        :disabled="saving"
+      >
+        {{ saving ? 'Adding…' : 'Add' }}
+      </button>
+    </form>
+
+    <!-- Error -->
+    <p v-if="error" class="text-red-500 text-sm mb-4">{{ error }}</p>
+
+    <!-- Topic list -->
+    <div class="space-y-2">
+      <div
+        v-for="topic in topicsStore.topics"
+        :key="topic.id"
+        class="flex items-center gap-3 bg-white border border-gray-200 rounded-lg px-4 py-3"
+      >
+        <span
+          class="w-3 h-3 rounded-full shrink-0"
+          :style="{ backgroundColor: topic.color }"
+        ></span>
+
+        <div v-if="editing === topic.id" class="flex-1 flex gap-2">
+          <input
+            v-model="editForm.name"
+            class="flex-1 border border-gray-300 rounded px-2 py-1 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <input v-model="editForm.color" type="color" class="w-8 h-8 rounded border border-gray-300 p-0.5" />
+          <input
+            v-model="editForm.description"
+            placeholder="Description"
+            class="flex-1 border border-gray-300 rounded px-2 py-1 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <button @click="saveEdit(topic.id)" class="text-xs text-indigo-600 font-medium">Save</button>
+          <button @click="editing = null" class="text-xs text-gray-400">Cancel</button>
+        </div>
+
+        <div v-else class="flex-1 min-w-0">
+          <div class="flex items-center gap-2">
+            <span class="font-medium text-gray-800 text-sm">{{ topic.name }}</span>
+            <span class="text-xs text-gray-400">({{ topic.doc_count }} docs)</span>
+          </div>
+          <p v-if="topic.description" class="text-xs text-gray-500 mt-0.5">{{ topic.description }}</p>
+        </div>
+
+        <div class="flex gap-2 shrink-0">
+          <button @click="startEdit(topic)" class="text-xs text-gray-500 hover:text-indigo-600">Edit</button>
+          <button @click="remove(topic)" class="text-xs text-gray-500 hover:text-red-500">Delete</button>
+        </div>
+      </div>
+
+      <div v-if="!topicsStore.topics.length" class="text-center py-8 text-gray-400 text-sm">
+        No topics yet. Add one above.
+      </div>
+    </div>
+  </div>
+</template>
+
+<script setup>
+import { ref, reactive } from 'vue'
+import { useTopicsStore } from '../../stores/topics.js'
+
+const topicsStore = useTopicsStore()
+const saving = ref(false)
+const error = ref(null)
+const editing = ref(null)
+
+const form = reactive({ name: '', color: '#6366f1' })
+const editForm = reactive({ name: '', description: '', color: '' })
+
+async function submit() {
+  saving.value = true
+  error.value = null
+  try {
+    await topicsStore.addTopic({ name: form.name, color: form.color })
+    form.name = ''
+    form.color = '#6366f1'
+  } catch (e) {
+    error.value = e.message
+  } finally {
+    saving.value = false
+  }
+}
+
+function startEdit(topic) {
+  editing.value = topic.id
+  editForm.name = topic.name
+  editForm.description = topic.description || ''
+  editForm.color = topic.color
+}
+
+async function saveEdit(id) {
+  const { name, description, color } = editForm
+  error.value = null
+  try {
+    await topicsStore.editTopic(id, { name, description, color })
+    editing.value = null
+  } catch (e) { error.value = e.message }
+}
+
+async function remove(topic) {
+  if (!confirm(`Delete topic "${topic.name}"? It will be removed from all documents.`)) return
+  await topicsStore.removeTopic(topic.id).catch(e => { error.value = e.message })
+}
+</script>
@@ -0,0 +1,62 @@
+<template>
+  <div
+    class="relative border-2 border-dashed rounded-xl p-10 text-center transition-colors"
+    :class="dragging
+      ? 'border-indigo-400 bg-indigo-50'
+      : 'border-gray-300 bg-white hover:border-indigo-300 hover:bg-gray-50'"
+    @dragover.prevent="dragging = true"
+    @dragleave.prevent="dragging = false"
+    @drop.prevent="onDrop"
+    @click="triggerInput"
+  >
+    <input
+      ref="inputRef"
+      type="file"
+      class="hidden"
+      multiple
+      accept=".pdf,.docx,.doc,.txt,.md,.png,.jpg,.jpeg,.tiff,.webp"
+      @change="onFileChange"
+    />
+
+    <div class="flex flex-col items-center gap-3">
+      <svg class="w-12 h-12 text-gray-300" fill="none" stroke="currentColor" viewBox="0 0 24 24">
+        <path stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5"
+          d="M7 16a4 4 0 01-.88-7.903A5 5 0 1115.9 6L16 6a5 5 0 011 9.9M15 13l-3-3m0 0l-3 3m3-3v12" />
+      </svg>
+      <div>
+        <p class="text-sm font-medium text-gray-700">Drop files here or <span class="text-indigo-600 underline cursor-pointer">browse</span></p>
+        <p class="text-xs text-gray-400 mt-1">PDF, DOC, DOCX, TXT, MD, PNG, JPG, TIFF, WEBP supported</p>
+      </div>
+
+      <label class="flex items-center gap-2 mt-2 cursor-pointer" @click.stop>
+        <input type="checkbox" v-model="autoClassify" class="rounded border-gray-300 text-indigo-600" />
+        <span class="text-sm text-gray-600">Auto-classify with AI after upload</span>
+      </label>
+    </div>
+  </div>
+</template>
+
+<script setup>
+import { ref } from 'vue'
+
+const emit = defineEmits(['files-selected'])
+const dragging = ref(false)
+const inputRef = ref(null)
+const autoClassify = ref(true)
+
+function triggerInput() {
+  inputRef.value?.click()
+}
+
+function onDrop(e) {
+  dragging.value = false
+  const files = Array.from(e.dataTransfer?.files || [])
+  if (files.length) emit('files-selected', { files, autoClassify: autoClassify.value })
+}
+
+function onFileChange(e) {
+  const files = Array.from(e.target.files || [])
+  if (files.length) emit('files-selected', { files, autoClassify: autoClassify.value })
+  e.target.value = ''
+}
+</script>
@@ -0,0 +1,36 @@
+<template>
+  <div v-if="items.length" class="space-y-2 mt-4">
+    <div
+      v-for="(item, index) in items"
+      :key="`${index}-${item.name}`"
+      class="flex items-center gap-3 bg-white border border-gray-200 rounded-lg px-4 py-2.5"
+    >
+      <div class="flex-1 min-w-0">
+        <p class="text-sm font-medium text-gray-800 truncate">{{ item.name }}</p>
+        <p v-if="item.error" class="text-xs text-red-500 mt-0.5">{{ item.error }}</p>
+        <p v-else-if="item.done" class="text-xs text-green-600 mt-0.5">
+          Done{{ item.topics?.length ? ` — classified as: ${item.topics.join(', ')}` : ' — no topics assigned' }}
+        </p>
+        <p v-else class="text-xs text-gray-400 mt-0.5">Uploading…</p>
+      </div>
+      <div class="shrink-0">
+        <svg v-if="item.error" class="w-5 h-5 text-red-400" fill="currentColor" viewBox="0 0 20 20">
+          <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.707 7.293a1 1 0 00-1.414 1.414L8.586 10l-1.293 1.293a1 1 0 101.414 1.414L10 11.414l1.293 1.293a1 1 0 001.414-1.414L11.414 10l1.293-1.293a1 1 0 00-1.414-1.414L10 8.586 8.707 7.293z" clip-rule="evenodd"/>
+        </svg>
+        <svg v-else-if="item.done" class="w-5 h-5 text-green-500" fill="currentColor" viewBox="0 0 20 20">
+          <path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.707-9.293a1 1 0 00-1.414-1.414L9 10.586 7.707 9.293a1 1 0 00-1.414 1.414l2 2a1 1 0 001.414 0l4-4z" clip-rule="evenodd"/>
+        </svg>
+        <svg v-else class="w-5 h-5 text-indigo-400 animate-spin" fill="none" viewBox="0 0 24 24">
+          <circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"/>
+          <path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"/>
+        </svg>
+      </div>
+    </div>
+  </div>
+</template>
+
+<script setup>
+defineProps({
+  items: { type: Array, default: () => [] },
+})
+</script>
@@ -0,0 +1,10 @@
+import { createApp } from 'vue'
+import { createPinia } from 'pinia'
+import App from './App.vue'
+import router from './router/index.js'
+import './style.css'
+
+const app = createApp(App)
+app.use(createPinia())
+app.use(router)
+app.mount('#app')
@@ -0,0 +1,18 @@
+import { createRouter, createWebHistory } from 'vue-router'
+import HomeView from '../views/HomeView.vue'
+import TopicsView from '../views/TopicsView.vue'
+import DocumentView from '../views/DocumentView.vue'
+import SettingsView from '../views/SettingsView.vue'
+
+const routes = [
+  { path: '/', component: HomeView },
+  { path: '/topics', component: TopicsView },
+  { path: '/topics/:name', component: TopicsView },
+  { path: '/document/:id', component: DocumentView },
+  { path: '/settings', component: SettingsView },
+]
+
+export default createRouter({
+  history: createWebHistory(),
+  routes,
+})
@@ -0,0 +1,46 @@
+import { defineStore } from 'pinia'
+import { ref } from 'vue'
+import * as api from '../api/client.js'
+
+export const useDocumentsStore = defineStore('documents', () => {
+  const documents = ref([])
+  const total = ref(0)
+  const loading = ref(false)
+  const error = ref(null)
+
+  async function fetchDocuments({ topic, page = 1, perPage = 20 } = {}) {
+    loading.value = true
+    error.value = null
+    try {
+      const data = await api.listDocuments({ topic, page, perPage })
+      documents.value = data.items
+      total.value = data.total
+    } catch (e) {
+      error.value = e.message
+    } finally {
+      loading.value = false
+    }
+  }
+
+  async function upload(file, autoClassify = true) {
+    const doc = await api.uploadDocument(file, autoClassify)
+    documents.value.unshift(doc)
+    total.value++
+    return doc
+  }
+
+  async function remove(id) {
+    await api.deleteDocument(id)
+    documents.value = documents.value.filter(d => d.id !== id)
+    total.value--
+  }
+
+  async function reclassify(id, topics = null) {
+    const result = await api.classifyDocument(id, topics)
+    const idx = documents.value.findIndex(d => d.id === id)
+    if (idx !== -1) documents.value[idx].topics = result.topics
+    return result.topics
+  }
+
+  return { documents, total, loading, error, fetchDocuments, upload, remove, reclassify }
+})
@@ -0,0 +1,38 @@
+import { defineStore } from 'pinia'
+import { ref } from 'vue'
+import * as api from '../api/client.js'
+
+export const useSettingsStore = defineStore('settings', () => {
+  const settings = ref(null)
+  const loading = ref(false)
+  const error = ref(null)
+
+  async function fetchSettings() {
+    loading.value = true
+    error.value = null
+    try {
+      settings.value = await api.getSettings()
+    } catch (e) {
+      error.value = e.message
+    } finally {
+      loading.value = false
+    }
+  }
+
+  async function save(patch) {
+    const updated = await api.patchSettings(patch)
+    settings.value = updated
+    return updated
+  }
+
+  async function testConnection(provider) {
+    return api.testProvider(provider)
+  }
+
+  async function resetPrompt() {
+    const data = await api.getDefaultPrompt()
+    return data.system_prompt
+  }
+
+  return { settings, loading, error, fetchSettings, save, testConnection, resetPrompt }
+})
@@ -0,0 +1,42 @@
+import { defineStore } from 'pinia'
+import { ref } from 'vue'
+import * as api from '../api/client.js'
+
+export const useTopicsStore = defineStore('topics', () => {
+  const topics = ref([])
+  const loading = ref(false)
+  const error = ref(null)
+
+  async function fetchTopics() {
+    loading.value = true
+    error.value = null
+    try {
+      const data = await api.listTopics()
+      topics.value = data.topics
+    } catch (e) {
+      error.value = e.message
+    } finally {
+      loading.value = false
+    }
+  }
+
+  async function addTopic(payload) {
+    const topic = await api.createTopic(payload)
+    topics.value.push(topic)
+    return topic
+  }
+
+  async function editTopic(id, patch) {
+    const updated = await api.updateTopic(id, patch)
+    const idx = topics.value.findIndex(t => t.id === id)
+    if (idx !== -1) topics.value[idx] = updated
+    return updated
+  }
+
+  async function removeTopic(id) {
+    await api.deleteTopic(id)
+    topics.value = topics.value.filter(t => t.id !== id)
+  }
+
+  return { topics, loading, error, fetchTopics, addTopic, editTopic, removeTopic }
+})
@@ -0,0 +1,9 @@
+@tailwind base;
+@tailwind components;
+@tailwind utilities;
+
+@layer base {
+  body {
+    @apply bg-gray-50 text-gray-900;
+  }
+}
@@ -0,0 +1,184 @@
+<template>
+  <div class="p-8 max-w-4xl mx-auto">
+    <!-- Back -->
+    <button @click="$router.back()" class="text-sm text-indigo-600 hover:underline mb-6 flex items-center gap-1">
+      ← Back
+    </button>
+
+    <div v-if="loading" class="text-gray-400 text-sm">Loading…</div>
+    <div v-else-if="!doc" class="text-gray-400 text-sm">Document not found.</div>
+
+    <template v-else>
+      <!-- Header -->
+      <div class="flex items-start justify-between gap-4 mb-6">
+        <div>
+          <h2 class="text-2xl font-bold text-gray-900 break-all">{{ doc.original_name }}</h2>
+          <p class="text-sm text-gray-400 mt-1">
+            Uploaded {{ formatDate(doc.created_at) }} · {{ formatSize(doc.size_bytes) }} · {{ doc.mime_type }}
+          </p>
+        </div>
+        <button
+          @click="confirmDelete"
+          class="text-sm text-red-500 hover:text-red-700 shrink-0"
+        >Delete</button>
+      </div>
+
+      <!-- Topics -->
+      <div class="bg-white border border-gray-200 rounded-xl p-5 mb-5">
+        <div class="flex items-center justify-between mb-3">
+          <h3 class="font-semibold text-gray-800">Topics</h3>
+          <div class="flex gap-2">
+            <button
+              @click="reclassify"
+              :disabled="classifying"
+              class="text-xs px-3 py-1.5 bg-indigo-600 text-white rounded-lg hover:bg-indigo-700 transition-colors disabled:opacity-50"
+            >
+              {{ classifying ? 'Classifying…' : 'Re-classify' }}
+            </button>
+            <button
+              @click="suggestTopics"
+              :disabled="suggesting"
+              class="text-xs px-3 py-1.5 border border-gray-300 text-gray-700 rounded-lg hover:bg-gray-50 transition-colors disabled:opacity-50"
+            >
+              {{ suggesting ? 'Suggesting…' : 'Suggest Topics' }}
+            </button>
+          </div>
+        </div>
+
+        <div class="flex flex-wrap gap-2">
+          <TopicBadge
+            v-for="name in doc.topics"
+            :key="name"
+            :name="name"
+            :color="topicColor(name)"
+          />
+          <span v-if="!doc.topics?.length" class="text-sm text-gray-400 italic">No topics assigned yet.</span>
+        </div>
+
+        <p v-if="classifyError" class="text-red-500 text-xs mt-2">{{ classifyError }}</p>
+
+        <!-- Suggestions modal inline -->
+        <div v-if="suggestions.length" class="mt-4 border-t border-gray-100 pt-4">
+          <p class="text-sm font-medium text-gray-700 mb-2">AI Suggestions — select to create:</p>
+          <div class="flex flex-wrap gap-2 mb-3">
+            <label
+              v-for="s in suggestions"
+              :key="s"
+              class="flex items-center gap-1.5 cursor-pointer text-sm"
+            >
+              <input type="checkbox" v-model="selectedSuggestions" :value="s" class="rounded border-gray-300 text-indigo-600" />
+              {{ s }}
+            </label>
+          </div>
+          <div class="flex gap-2">
+            <button
+              @click="createSelectedTopics"
+              :disabled="!selectedSuggestions.length"
+              class="text-xs px-3 py-1.5 bg-indigo-600 text-white rounded-lg hover:bg-indigo-700 disabled:opacity-50"
+            >
+              Create Selected
+            </button>
+            <button @click="suggestions = []; selectedSuggestions = []" class="text-xs text-gray-500 hover:text-gray-700">
+              Dismiss
+            </button>
+          </div>
+        </div>
+      </div>
+
+      <!-- Extracted text -->
+      <div class="bg-white border border-gray-200 rounded-xl p-5">
+        <h3 class="font-semibold text-gray-800 mb-3">Extracted Text</h3>
+        <pre class="text-xs text-gray-600 whitespace-pre-wrap font-mono bg-gray-50 rounded-lg p-4 max-h-96 overflow-y-auto">{{ doc.extracted_text || '(no text extracted)' }}</pre>
+      </div>
+    </template>
+  </div>
+</template>
+
+<script setup>
+import { ref, onMounted } from 'vue'
+import { useRoute, useRouter } from 'vue-router'
+import TopicBadge from '../components/topics/TopicBadge.vue'
+import { useDocumentsStore } from '../stores/documents.js'
+import { useTopicsStore } from '../stores/topics.js'
+import * as api from '../api/client.js'
+
+const route = useRoute()
+const router = useRouter()
+const docsStore = useDocumentsStore()
+const topicsStore = useTopicsStore()
+
+const doc = ref(null)
+const loading = ref(true)
+const classifying = ref(false)
+const suggesting = ref(false)
+const classifyError = ref(null)
+const suggestions = ref([])
+const selectedSuggestions = ref([])
+
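+// Load the document named by the route. A failed request resolves to null
+// instead of rejecting, so the template falls back to "Document not found."
+// rather than leaving an unhandled promise rejection.
+onMounted(async () => {
+  doc.value = await api.getDocument(route.params.id).catch(() => null)
+  loading.value = false
+})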
+
+function topicColor(name) {
+  return topicsStore.topics.find(t => t.name === name)?.color ?? '#6366f1'
+}
+
+async function reclassify() {
+  classifying.value = true
+  classifyError.value = null
+  try {
+    const result = await api.classifyDocument(doc.value.id)
+    doc.value.topics = result.topics
+    await topicsStore.fetchTopics()
+  } catch (e) {
+    classifyError.value = e.message
+  } finally {
+    classifying.value = false
+  }
+}
+
+async function suggestTopics() {
+  suggesting.value = true
+  try {
+    const result = await api.suggestTopics(doc.value.id)
+    suggestions.value = result.suggested
+    selectedSuggestions.value = []
+  } catch (e) {
+    classifyError.value = e.message
+  } finally {
+    suggesting.value = false
+  }
+}
+
+async function createSelectedTopics() {
+  for (const name of selectedSuggestions.value) {
+    await topicsStore.addTopic({ name })
+  }
+  suggestions.value = []
+  selectedSuggestions.value = []
+  // Re-classify now that topics exist
+  await reclassify()
+}
+
+async function confirmDelete() {
+  if (!confirm(`Delete "${doc.value.original_name}"?`)) return
+  await docsStore.remove(doc.value.id)
+  router.push('/')
+}
+
+function formatDate(iso) {
+  if (!iso) return ''
+  return new Date(iso).toLocaleDateString(undefined, { month: 'short', day: 'numeric', year: 'numeric' })
+}
+
+function formatSize(bytes) {
+  if (bytes == null) return ''
+  if (bytes < 1024) return bytes + ' B'
+  if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + ' KB'
+  return (bytes / (1024 * 1024)).toFixed(1) + ' MB'
+}
+</script>
@@ -0,0 +1,63 @@
+<template>
+  <div class="p-8 max-w-4xl mx-auto">
+    <h2 class="text-2xl font-bold text-gray-900 mb-1">Upload Documents</h2>
+    <p class="text-gray-500 text-sm mb-6">Drop files to extract text and classify them with AI.</p>
+
+    <DropZone @files-selected="onFilesSelected" />
+    <UploadProgress :items="uploadQueue" />
+
+    <!-- Recent documents -->
+    <div class="mt-10">
+      <div class="flex items-center justify-between mb-4">
+        <h3 class="text-lg font-semibold text-gray-800">Recent Documents</h3>
+        <span class="text-sm text-gray-400">{{ docsStore.total }} total</span>
+      </div>
+
+      <div v-if="docsStore.loading" class="text-sm text-gray-400">Loading…</div>
+      <div v-else-if="docsStore.documents.length === 0" class="text-center py-12 text-gray-400">
+        <p class="text-sm">No documents yet. Upload one above.</p>
+      </div>
+      <div v-else class="grid gap-3">
+        <DocumentCard v-for="doc in docsStore.documents" :key="doc.id" :doc="doc" />
+      </div>
+    </div>
+  </div>
+</template>
+
+<script setup>
+import { ref, onMounted } from 'vue'
+import DropZone from '../components/upload/DropZone.vue'
+import UploadProgress from '../components/upload/UploadProgress.vue'
+import DocumentCard from '../components/documents/DocumentCard.vue'
+import { useDocumentsStore } from '../stores/documents.js'
+import { useTopicsStore } from '../stores/topics.js'
+
+const docsStore = useDocumentsStore()
+const topicsStore = useTopicsStore()
+const uploadQueue = ref([])
+
+onMounted(() => docsStore.fetchDocuments())
+
+async function onFilesSelected({ files, autoClassify }) {
+  // Build queue items
+  const items = files.map(f => ({ name: f.name, done: false, error: null, topics: null }))
+  uploadQueue.value = [...items, ...uploadQueue.value]
+
+  for (const file of files) {
+    try {
+      const doc = await docsStore.upload(file, autoClassify)
+      const item = uploadQueue.value.find(q => q.name === file.name && !q.done && !q.error)
+      if (item) {
+        item.done = true
+        item.topics = doc.topics
+      }
+    } catch (e) {
+      const item = uploadQueue.value.find(q => q.name === file.name && !q.done && !q.error)
+      if (item) item.error = e.message
+    }
+  }
+
+  // Refresh topics (new ones may have been created)
+  await topicsStore.fetchTopics()
+}
+</script>
@@ -0,0 +1,223 @@
+<template>
+  <div class="p-8 max-w-3xl mx-auto">
+    <h2 class="text-2xl font-bold text-gray-900 mb-1">Settings</h2>
+    <p class="text-gray-500 text-sm mb-8">Configure AI provider and the system prompt.</p>
+
+    <div v-if="settingsStore.loading" class="text-gray-400 text-sm">Loading…</div>
+    <div v-else-if="!settingsStore.settings" class="text-red-500 text-sm">Failed to load settings.</div>
+
+    <template v-else>
+      <!-- AI Provider -->
+      <section class="bg-white border border-gray-200 rounded-xl p-6 mb-5">
+        <h3 class="font-semibold text-gray-800 mb-4">AI Provider</h3>
+
+        <div class="flex flex-wrap gap-2 mb-6">
+          <button
+            v-for="prov in providers"
+            :key="prov.id"
+            @click="activeProvider = prov.id"
+            class="px-4 py-2 rounded-lg text-sm font-medium border transition-colors"
+            :class="activeProvider === prov.id
+              ? 'bg-indigo-600 text-white border-indigo-600'
+              : 'border-gray-300 text-gray-600 hover:bg-gray-50'"
+          >
+            {{ prov.label }}
+          </button>
+        </div>
+
+        <!-- Anthropic config -->
+        <div v-if="activeProvider === 'anthropic'" class="space-y-3">
+          <label class="block text-sm font-medium text-gray-700">API Key</label>
+          <input
+            v-model="providerCfg.anthropic.api_key"
+            type="password"
+            placeholder="sk-ant-…"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
+          <input
+            v-model="providerCfg.anthropic.model"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+        </div>
+
+        <!-- OpenAI config -->
+        <div v-else-if="activeProvider === 'openai'" class="space-y-3">
+          <label class="block text-sm font-medium text-gray-700">API Key</label>
+          <input
+            v-model="providerCfg.openai.api_key"
+            type="password"
+            placeholder="sk-…"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
+          <input
+            v-model="providerCfg.openai.model"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <label class="block text-sm font-medium text-gray-700 mt-3">Base URL (optional)</label>
+          <input
+            v-model="providerCfg.openai.base_url"
+            placeholder="https://api.openai.com/v1"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+        </div>
+
+        <!-- Ollama config -->
+        <div v-else-if="activeProvider === 'ollama'" class="space-y-3">
+          <label class="block text-sm font-medium text-gray-700">Base URL</label>
+          <input
+            v-model="providerCfg.ollama.base_url"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
+          <input
+            v-model="providerCfg.ollama.model"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <p class="text-xs text-gray-400 mt-1">
+            Ollama must be started with <code class="bg-gray-100 px-1 rounded">OLLAMA_HOST=0.0.0.0 ollama serve</code>
+          </p>
+        </div>
+
+        <!-- LM Studio config -->
+        <div v-else-if="activeProvider === 'lmstudio'" class="space-y-3">
+          <label class="block text-sm font-medium text-gray-700">Base URL</label>
+          <input
+            v-model="providerCfg.lmstudio.base_url"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
+          <input
+            v-model="providerCfg.lmstudio.model"
+            class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
+          />
+          <p class="text-xs text-gray-400 mt-1">
+            LM Studio server must be bound to <code class="bg-gray-100 px-1 rounded">0.0.0.0</code> in LM Studio settings.
+          </p>
+        </div>
+
+        <!-- Test connection -->
+        <div class="flex items-center gap-3 mt-5">
+          <button
+            @click="testConn"
+            :disabled="testing"
+            class="text-sm px-4 py-2 border border-gray-300 rounded-lg hover:bg-gray-50 transition-colors disabled:opacity-50"
+          >
+            {{ testing ? 'Testing…' : 'Test Connection' }}
+          </button>
+          <span v-if="testResult" :class="testResult.ok ? 'text-green-600' : 'text-red-500'" class="text-sm">
+            {{ testResult.ok ? '✓' : '✗' }} {{ testResult.message }}
+            <span v-if="testResult.ok && testResult.latency_ms" class="text-gray-400">({{ testResult.latency_ms }}ms)</span>
+          </span>
+        </div>
+      </section>
+
+      <!-- System Prompt -->
+      <section class="bg-white border border-gray-200 rounded-xl p-6 mb-5">
+        <div class="flex items-center justify-between mb-3">
+          <h3 class="font-semibold text-gray-800">System Prompt</h3>
+          <button @click="resetPrompt" class="text-xs text-indigo-600 hover:underline">Reset to default</button>
+        </div>
+        <textarea
+          v-model="systemPrompt"
+          rows="8"
+          class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm font-mono focus:outline-none focus:ring-2 focus:ring-indigo-400 resize-y"
+        ></textarea>
+      </section>
+
+      <!-- Save -->
+      <div class="flex items-center gap-3">
+        <button
+          @click="save"
+          :disabled="saving"
+          class="px-6 py-2.5 bg-indigo-600 text-white rounded-lg text-sm font-medium hover:bg-indigo-700 transition-colors disabled:opacity-50"
+        >
+          {{ saving ? 'Saving…' : 'Save Settings' }}
+        </button>
+        <span v-if="saveMsg" :class="saveError ? 'text-red-500' : 'text-green-600'" class="text-sm">
+          {{ saveMsg }}
+        </span>
+      </div>
+    </template>
+  </div>
+</template>
+
+<script setup>
+import { ref, reactive, onMounted } from 'vue'
+import { useSettingsStore } from '../stores/settings.js'
+
+const settingsStore = useSettingsStore()
+const saving = ref(false)
+const testing = ref(false)
+const testResult = ref(null)
+const saveMsg = ref('')
+const saveError = ref(false)
+
+const providers = [
+  { id: 'lmstudio', label: 'LM Studio' },
+  { id: 'ollama', label: 'Ollama' },
+  { id: 'openai', label: 'OpenAI' },
+  { id: 'anthropic', label: 'Anthropic' },
+]
+
+const activeProvider = ref('lmstudio')
+const systemPrompt = ref('')
+const providerCfg = reactive({
+  anthropic: { api_key: '', model: 'claude-sonnet-4-6' },
+  openai: { api_key: '', model: 'gpt-4o', base_url: '' },
+  ollama: { base_url: 'http://host.docker.internal:11434', model: 'llama3.2' },
+  lmstudio: { base_url: 'http://host.docker.internal:1234', model: 'gemma-4-e4b-it' },
+})
+
+onMounted(async () => {
+  await settingsStore.fetchSettings()
+  populateForm()
+})
+
+function populateForm() {
+  const s = settingsStore.settings
+  if (!s) return
+  activeProvider.value = s.active_provider
+  systemPrompt.value = s.system_prompt
+  for (const [k, v] of Object.entries(s.providers || {})) {
+    if (providerCfg[k]) Object.assign(providerCfg[k], v)
+  }
+}
+
+async function testConn() {
+  testing.value = true
+  testResult.value = null
+  try {
+    testResult.value = await settingsStore.testConnection(activeProvider.value)
+  } catch (e) {
+    testResult.value = { ok: false, message: e.message, latency_ms: 0 }
+  } finally {
+    testing.value = false
+  }
+}
+
+async function resetPrompt() {
+  systemPrompt.value = await settingsStore.resetPrompt()
+}
+
+async function save() {
+  saving.value = true
+  saveMsg.value = ''
+  saveError.value = false
+  try {
+    await settingsStore.save({
+      system_prompt: systemPrompt.value,
+      active_provider: activeProvider.value,
+      providers: providerCfg,
+    })
+    saveMsg.value = 'Settings saved.'
+  } catch (e) {
+    saveMsg.value = e.message
+    saveError.value = true
+  } finally {
+    saving.value = false
+    setTimeout(() => saveMsg.value = '', 3000)
+  }
+}
+</script>
@@ -0,0 +1,82 @@
+<template>
+  <div class="p-8 max-w-4xl mx-auto">
+    <!-- Header -->
+    <div class="flex items-center justify-between mb-6">
+      <div>
+        <h2 class="text-2xl font-bold text-gray-900">
+          {{ activeTopic ? activeTopic : 'All Topics' }}
+        </h2>
+        <p class="text-gray-500 text-sm mt-0.5">
+          {{ activeTopic ? `Documents classified under "${activeTopic}"` : 'Manage topics and browse documents by topic' }}
+        </p>
+      </div>
+      <button
+        v-if="activeTopic"
+        @click="$router.push('/topics')"
+        class="text-sm text-indigo-600 hover:underline"
+      >
+        ← All Topics
+      </button>
+    </div>
+
+    <!-- No filter: show topic manager + topic grid -->
+    <template v-if="!activeTopic">
+      <TopicManager />
+
+      <div class="mt-8">
+        <h3 class="text-lg font-semibold text-gray-800 mb-4">Browse by Topic</h3>
+        <div v-if="topicsStore.topics.length === 0" class="text-sm text-gray-400">No topics yet.</div>
+        <div v-else class="grid grid-cols-2 sm:grid-cols-3 gap-3">
+          <router-link
+            v-for="topic in topicsStore.topics"
+            :key="topic.id"
+            :to="`/topics/${encodeURIComponent(topic.name)}`"
+            class="bg-white border border-gray-200 rounded-xl p-4 hover:border-indigo-300 hover:shadow-sm transition-all"
+          >
+            <div class="flex items-center gap-2 mb-2">
+              <span class="w-3 h-3 rounded-full" :style="{ backgroundColor: topic.color }"></span>
+              <span class="font-medium text-gray-800 text-sm">{{ topic.name }}</span>
+            </div>
+            <p class="text-2xl font-bold text-gray-900">{{ topic.doc_count }}</p>
+            <p class="text-xs text-gray-400">document{{ topic.doc_count !== 1 ? 's' : '' }}</p>
+          </router-link>
+        </div>
+      </div>
+    </template>
+
+    <!-- Filtered by topic: document list -->
+    <template v-else>
+      <div v-if="docsStore.loading" class="text-sm text-gray-400">Loading…</div>
+      <div v-else-if="docsStore.documents.length === 0" class="text-center py-12 text-gray-400">
+        No documents under this topic yet.
+      </div>
+      <div v-else class="grid gap-3">
+        <DocumentCard v-for="doc in docsStore.documents" :key="doc.id" :doc="doc" />
+      </div>
+    </template>
+  </div>
+</template>
+
+<script setup>
+import { computed, watch, onMounted } from 'vue'
+import { useRoute } from 'vue-router'
+import TopicManager from '../components/topics/TopicManager.vue'
+import DocumentCard from '../components/documents/DocumentCard.vue'
+import { useTopicsStore } from '../stores/topics.js'
+import { useDocumentsStore } from '../stores/documents.js'
+
+const route = useRoute()
+const topicsStore = useTopicsStore()
+const docsStore = useDocumentsStore()
+
+const activeTopic = computed(() => route.params.name || null) // vue-router 4 already decodes params; decoding again throws on names containing '%'
+
+function loadDocs() {
+  if (activeTopic.value) {
+    docsStore.fetchDocuments({ topic: activeTopic.value })
+  }
+}
+
+onMounted(loadDocs)
+watch(activeTopic, loadDocs)
+</script>
@@ -0,0 +1,8 @@
+/** @type {import('tailwindcss').Config} */
+export default {
+  content: ['./index.html', './src/**/*.{vue,js}'],
+  theme: {
+    extend: {},
+  },
+  plugins: [],
+}
@@ -0,0 +1,16 @@
+import { defineConfig } from 'vite'
+import vue from '@vitejs/plugin-vue'
+
+export default defineConfig({
+  plugins: [vue()],
+  server: {
+    host: '0.0.0.0',
+    port: 5173,
+    proxy: {
+      '/api': {
+        target: 'http://backend:8000',
+        changeOrigin: true,
+      },
+    },
+  },
+})
@@ -0,0 +1 @@
+This is an invoice for professional consulting services rendered in April 2026. Total amount due: 5000 EUR.
@@ -0,0 +1 @@
+This document is about accounting and financial reports.