chore: initial commit — existing single-user document scanner codebase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curo1305
2026-05-22 08:53:28 +02:00
parent 6fed5ba531
commit 7a34807fa0
71 changed files with 16408 additions and 0 deletions
+6
@@ -0,0 +1,6 @@
# Copy to .env and fill in as needed.
# Settings are primarily managed through the in-app Settings UI.
# These are NOT required — the app defaults to LM Studio with no API keys.
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
+114
@@ -0,0 +1,114 @@
# ARCHITECTURE — document-scanner
_Last updated: 2026-05-21_
## Summary
Document Scanner is a two-tier web application: a Vue 3 SPA communicates with a FastAPI backend via a Vite dev-proxy (or directly in production). The backend handles document ingestion, text extraction, AI-based classification, and flat-file persistence. AI provider selection is fully runtime-configurable via a provider pattern abstraction.
---
## System Overview
```
Browser (Vue 3 SPA)
│ HTTP/JSON + multipart
FastAPI (port 8000)
├── api/documents.py upload, list, get, delete, reclassify
├── api/topics.py CRUD for topic list
├── api/settings.py AI provider config + system prompt
├── services/
│ ├── extractor.py text extraction dispatch
│ ├── classifier.py orchestrates AI call + topic creation
│ └── storage.py flat-file JSON + filesystem persistence
└── ai/ provider abstraction layer
├── base.py AIProvider ABC + ClassificationResult
├── __init__.py get_provider() factory
├── anthropic_provider.py
├── openai_provider.py
├── ollama_provider.py (subclasses OpenAIProvider)
└── lmstudio_provider.py (subclasses OpenAIProvider)
External AI service (Anthropic API / OpenAI API /
Ollama / LM Studio — host.docker.internal)
```
---
## Request Flow — Document Upload + Classification
1. Frontend POSTs `multipart/form-data` to `POST /api/documents/upload`
2. `documents.py` saves the file to `data/uploads/`, calls `extractor.extract_text()`
3. Extracted text (truncated to 50,000 chars) is stored in `data/metadata/<id>.json`
4. If `auto_classify=true`, `classifier.classify_document()` is called:
a. Loads current settings from `data/settings.json` → calls `get_provider(settings)`
b. Passes document text + existing topics to `provider.classify()`
c. Any suggested new topics are created via `storage.add_topic()`
d. Document metadata is updated with assigned topics
5. Full document metadata JSON is returned to the frontend
---
## AI Provider Abstraction
- `AIProvider` (ABC in `ai/base.py`) defines three async methods:
- `classify(document_text, existing_topics, system_prompt) → ClassificationResult`
- `suggest_topics(document_text, system_prompt) → list[str]`
- `health_check() → bool`
- `get_provider(settings: dict)` factory in `ai/__init__.py` reads `settings.get("active_provider", "lmstudio")` and instantiates the matching class
- `OllamaProvider` and `LMStudioProvider` extend `OpenAIProvider` (both expose OpenAI-compatible endpoints)
- Provider is re-instantiated on every request (stateless; no connection pooling)
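The contract above can be sketched as a self-contained re-declaration of the ABC. This is hypothetical and is not the contents of `ai/base.py`. It assumes `ClassificationResult` carries two lists: `topics` for assigned existing topics and `new_topics` for suggested ones. The real field names may differ. `KeywordProvider` is an invented offline provider, showing what a subclass must implement and what a test double might look like.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ClassificationResult:
    # Hypothetical fields; the real dataclass in ai/base.py may differ.
    topics: list[str] = field(default_factory=list)
    new_topics: list[str] = field(default_factory=list)


class AIProvider(ABC):
    @abstractmethod
    async def classify(
        self, document_text: str, existing_topics: list[str], system_prompt: str
    ) -> ClassificationResult: ...

    @abstractmethod
    async def suggest_topics(self, document_text: str, system_prompt: str) -> list[str]: ...

    @abstractmethod
    async def health_check(self) -> bool: ...


class KeywordProvider(AIProvider):
    """Hypothetical offline provider: assigns any existing topic named in the text."""

    async def classify(self, document_text, existing_topics, system_prompt):
        text = document_text.lower()
        matched = [t for t in existing_topics if t.lower() in text]
        return ClassificationResult(topics=matched)

    async def suggest_topics(self, document_text, system_prompt):
        # Naive suggestion: capitalised words, deduplicated in order of appearance.
        words = [w.strip(".,;:") for w in document_text.split()]
        return list(dict.fromkeys(w for w in words if w[:1].isupper()))

    async def health_check(self):
        return True


result = asyncio.run(
    KeywordProvider().classify("Invoice from ACME for March", ["invoice", "tax"], "")
)
print(result)  # ClassificationResult(topics=['invoice'], new_topics=[])
```

A real provider would replace the keyword matching with the SDK call. `get_provider()` would also need a matching `case`.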
---
## Data Persistence
All state is stored on the local filesystem — no database:
| Store | Path | Format | Access |
|---|---|---|---|
| Uploaded files | `data/uploads/<id>.<ext>` | Original binary | Direct filesystem |
| Document metadata | `data/metadata/<id>.json` | JSON per document | `filelock` protected |
| Topic list | `data/topics.json` | `{"topics": [...]}` | `filelock` protected |
| Settings | `data/settings.json` | JSON object | `filelock` protected |
`filelock` is used to prevent concurrent write corruption on JSON files.
---
## Frontend Architecture
- Vue 3 SPA (Options API), Pinia stores, Vue Router 4
- Three Pinia stores (`documents`, `topics`, `settings`) act as the sole data access layer — components never call the API directly
- `src/api/client.js` is the single HTTP adapter (wraps `fetch`)
- Vite proxies `/api/*` to the backend on port 8000 in dev mode (`http://backend:8000`, the Compose service name)
---
## Key Patterns
- **Provider Pattern** — AI backends are interchangeable at runtime via settings
- **Service Layer** — `extractor`, `classifier`, `storage` are pure Python modules; no FastAPI coupling
- **Pinia-as-Facade** — stores encapsulate all async API calls; views stay declarative
---
## Constraints & Notable Decisions
- All CORS origins allowed (`allow_origins=["*"]`) — suitable for local dev, not production
- No authentication or user model
- Single-worker assumption for file locking (does not scale to multiple uvicorn workers)
- AI provider re-instantiated per request (no connection reuse)
- Data directory is volume-mounted in Docker; no backup or migration strategy
---
## Gaps / Unknowns
- No API versioning strategy visible
- Frontend has no error boundary or global error handling component
- Document list endpoint paginates its response, but the storage layer loads every metadata file first (could be a scaling concern)
+87
@@ -0,0 +1,87 @@
# CONCERNS — document-scanner
_Last updated: 2026-05-21_
## Summary
The codebase is a well-structured local-first prototype. The main concerns are security issues that matter if exposed beyond localhost (open CORS, no file validation, plain-text key storage), several blocking I/O calls in async handlers, and a handful of code duplication issues in the AI provider layer. Overall health is good for a local dev tool; requires hardening before any networked deployment.
---
## Concerns by Severity
### HIGH
**1. File type validation is defined but never enforced**
`ALLOWED_MIME_TYPES` is defined in `backend/api/documents.py` but the upload handler never checks it — any file type is accepted. An attacker could upload executable files or crafted archives.
**2. No file size limit on uploads**
The entire uploaded file is read into memory. The only cap in the pipeline, the 50,000-character truncation of extracted text, applies after that. No `MAX_UPLOAD_SIZE` check exists at the HTTP boundary, so a large file could exhaust memory or disk.
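A sketch of a bounded, chunked read. It works with any object exposing `async read(size)`, such as FastAPI's `UploadFile`. `MAX_UPLOAD_SIZE`, `CHUNK_SIZE` and `UploadTooLarge` are hypothetical names and values.

This bounds what the handler holds in memory. Starlette typically spools the multipart body to a temporary file before the handler runs. Rejecting oversized requests earlier would therefore also need a `Content-Length` check or a proxy-level limit.

```python
import asyncio
import io

MAX_UPLOAD_SIZE = 20 * 1024 * 1024  # hypothetical 20 MiB cap
CHUNK_SIZE = 1024 * 1024


class UploadTooLarge(Exception):
    pass


async def read_limited(upload, limit: int = MAX_UPLOAD_SIZE) -> bytes:
    """Read `upload` in chunks, aborting as soon as more than `limit` bytes arrive."""
    buf = bytearray()
    while chunk := await upload.read(CHUNK_SIZE):
        buf.extend(chunk)
        if len(buf) > limit:
            raise UploadTooLarge(f"upload exceeds {limit} bytes")
    return bytes(buf)


class BytesUpload:
    """Minimal stand-in for UploadFile, used only for this demonstration."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    async def read(self, size: int = -1) -> bytes:
        return self._buf.read(size)


print(len(asyncio.run(read_limited(BytesUpload(b"x" * 10), limit=100))))  # 10
```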
**3. API keys stored in plain-text JSON**
`backend/data/settings.json` stores API keys in plaintext. `docker-compose.yml` bind-mounts `./backend/data:/app/data`, so the file also sits on the host. It is readable by any user or process with access to that directory or to the Docker daemon. Masking only applies to API responses, not to disk.
**4. CORS fully open**
`allow_origins=["*"]` in `main.py` means any website can make cross-origin requests to the API, including with credentials if ever added.
**5. Docker Compose mounts entire backend source as writable volume**
`./backend:/app` gives the container write access to the host source tree. A path traversal or code execution bug in the app could overwrite source files.
---
### MEDIUM
**6. Blocking I/O in async FastAPI handlers**
`storage.py` uses synchronous file reads/writes and `filelock` blocking calls inside `async def` endpoints. This blocks the uvicorn event loop during every request. Should use `asyncio.to_thread()` or `aiofiles` (which is already in requirements but unused).
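A sketch of the `asyncio.to_thread()` option. The blocking function runs on a worker thread, so the event loop keeps serving other requests. `_read_json_blocking` and `read_json` are hypothetical names.

Wrapping the whole synchronous function also covers the `filelock` acquire. `aiofiles` alone would not cover it, because it only makes the file reads and writes async.

```python
import asyncio
import json
import tempfile
from pathlib import Path


def _read_json_blocking(path: Path) -> dict:
    # Plain synchronous read; in storage.py the FileLock would be held around this.
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


async def read_json(path: Path) -> dict:
    # Offload the blocking call so the event loop is not stalled.
    return await asyncio.to_thread(_read_json_blocking, path)


with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "topics.json"
    p.write_text(json.dumps({"topics": ["tax"]}), encoding="utf-8")
    print(asyncio.run(read_json(p)))  # {'topics': ['tax']}
```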
**7. Topic rename does not cascade to documents**
Deleting a topic removes it from document metadata, but renaming is not implemented — there is no rename endpoint. Users have no way to rename a topic without losing document associations.
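A hypothetical rename cascade. It assumes topics are dicts with a `name` key and that each document's metadata stores topic names in a `topics` list, as the loss of associations on rename suggests. Both layouts are assumptions. Persisting the changes under the existing locks is left to the storage layer.

```python
def rename_topic(topics: list[dict], documents: list[dict], old: str, new: str) -> int:
    """Rename `old` to `new` in the topic list and in every document's topics.
    Returns the number of documents changed."""
    names = [t["name"] for t in topics]
    if old not in names:
        raise KeyError(old)
    if new in names:
        raise ValueError(f"topic {new!r} already exists")
    topics[names.index(old)]["name"] = new
    changed = 0
    for doc in documents:
        doc_topics = doc.get("topics", [])
        if old in doc_topics:
            doc["topics"] = [new if t == old else t for t in doc_topics]
            changed += 1
    return changed


topics = [{"id": "a1b2c3d4", "name": "tax"}, {"id": "e5f6a7b8", "name": "bank"}]
docs = [{"topics": ["tax", "bank"]}, {"topics": ["bank"]}, {}]
print(rename_topic(topics, docs, "tax", "taxes"))  # 1
```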
**8. `list_metadata` loads all documents before filtering**
`storage.list_metadata()` reads all metadata JSON files on every list request. No pagination at the storage layer — O(N) disk reads per page request as the document count grows.
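A sketch of storage-level pagination that only opens and parses the JSON files on the requested page. It assumes newest-first ordering by file modification time, which may not match the current sort key. Each file is still `stat()`-ed, which is O(N) but far cheaper than parsing. A topic-filtered listing would still need every document's topics, or an index (not shown).

```python
import json
import os
import tempfile
from pathlib import Path


def list_metadata_page(metadata_dir: Path, page: int, page_size: int) -> tuple[list[dict], int]:
    """Return (documents on this 1-based page, total count), newest first."""
    entries = sorted(metadata_dir.glob("*.json"), key=lambda p: p.stat().st_mtime, reverse=True)
    start = (page - 1) * page_size
    selected = entries[start:start + page_size]
    docs = [json.loads(p.read_text(encoding="utf-8")) for p in selected]
    return docs, len(entries)


with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for i in range(5):
        f = d / f"doc{i}.json"
        f.write_text(json.dumps({"id": f"doc{i}"}), encoding="utf-8")
        os.utime(f, (1_000_000 + i, 1_000_000 + i))  # doc4 is the newest
    docs, total = list_metadata_page(d, page=1, page_size=2)
    print([x["id"] for x in docs], total)  # ['doc4', 'doc3'] 5
```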
**9. `topic_doc_counts()` scans all metadata on every topic request**
Every `GET /api/topics` call triggers a full scan of all metadata files to count documents per topic. Not cached; will degrade linearly.
**10. `MAX_AI_CHARS` duplicated across 3 files**
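The character truncation limit for AI input is repeated as a magic constant in multiple files. The provider-level truncation is not dead code. `extractor.py` caps stored text at `MAX_STORED_CHARS` (50,000), while `MAX_AI_CHARS` (8,000 in `anthropic_provider.py`) is smaller, so providers still cut the text further before sending it. The duplication means the copies can silently drift apart.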
**11. `_parse_classification` / `_parse_suggestions` duplicated between providers**
`anthropic_provider.py` and `openai_provider.py` each define their own JSON parsing helpers for AI responses. `test_classifier.py` only imports from `openai_provider`, meaning the Anthropic variants are untested.
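A sketch of a shared helper that both providers could import. The hypothetical module could be `ai/parsing.py`, with `_parse_classification` mapping the returned dict onto `ClassificationResult`. The module, function names and the prose-fallback behaviour are assumptions, not the current helpers. The fence string is built with `"`" * 3` so it does not clash with Markdown fences.

```python
import json
import re

FENCE = "`" * 3
_FENCE_RE = re.compile(rf"^{FENCE}[a-zA-Z]*\s*(.*?)\s*{FENCE}\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Remove a surrounding Markdown code fence (optionally tagged, e.g. json)."""
    match = _FENCE_RE.match(text.strip())
    return match.group(1).strip() if match else text.strip()


def parse_json_reply(text: str) -> dict:
    """Parse a model reply that should contain exactly one JSON object."""
    cleaned = strip_code_fences(text)
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        # Fallback: take the outermost {...} span if the model wrapped it in prose.
        start, end = cleaned.find("{"), cleaned.rfind("}")
        if start == -1 or end <= start:
            raise
        data = json.loads(cleaned[start:end + 1])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


print(parse_json_reply(FENCE + 'json\n{"topics": ["tax"]}\n' + FENCE))
```

With one implementation, `test_classifier.py` would cover both providers' parsing.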
**12. `health_check()` makes real billed API calls**
The "Test Connection" UI action calls `provider.health_check()`, which makes a real API call to Anthropic/OpenAI — incurring cost and latency every time the user tests connectivity. Should use a cheaper probe (e.g., list models endpoint or a cached status).
---
### LOW
**13. `uvicorn --reload` hardcoded in docker-compose.yml**
Hot-reload is hardcoded in `docker-compose.yml`, the only compose file, so it also applies to any deployment built from it. There is no separate `docker-compose.prod.yml` or build-arg to disable it.
**14. Unused `shutil` import in `storage.py`**
`import shutil` appears in `storage.py` but is never used.
**15. Topic IDs are 8-character UUID prefixes**
`str(uuid.uuid4())[:8]` keeps 32 random bits, i.e. 2^32 (about 4.3 billion) possible IDs. That is a low collision risk for personal use, but not safe at scale or for security-sensitive identifiers.
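The risk can be quantified with the birthday-bound approximation P ≈ 1 − exp(−n(n−1) / (2·2^32)) for n random 32-bit IDs. The first eight hex characters of a version-4 UUID are fully random, so 32 bits applies here. The probability reaches about 50% near 77,000 IDs, so it is negligible for a personal topic list. `collision_probability` is a hypothetical helper.

```python
import math


def collision_probability(n: int, bits: int = 32) -> float:
    """Approximate P(at least one duplicate) among n uniformly random `bits`-bit IDs."""
    space = 2 ** bits
    # -expm1(-x) == 1 - exp(-x), computed accurately for small x.
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (100, 1_000, 10_000, 77_163):
    print(f"{n:>6} IDs: {collision_probability(n):.4%}")
```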
**16. `classify_document` request body uses raw `dict`, not a Pydantic model**
The reclassify endpoint accepts an unvalidated `dict` body. Invalid input causes an unformatted 500 rather than a clean 422 validation error.
**17. No global frontend error handling**
There is no Vue error boundary or global `window.onerror` / `app.config.errorHandler`. Failed API calls in stores may surface as silent failures or unhandled promise rejections.
**18. No document download endpoint**
Uploaded files are stored in `data/uploads/` but there is no `GET /api/documents/:id/file` endpoint to retrieve the original binary. Files are effectively write-only through the UI.
**19. `aiofiles` in requirements but never used**
`aiofiles>=23.2` is listed in `requirements.txt` but no code imports it. The blocking I/O concern (item 6) should use it.
---
## Gaps / Unknowns
- Production deployment path is undefined (no nginx, no TLS, no auth)
- OCR language support for pytesseract is not configured (defaults to English only)
- `suggest_topics` method on all providers is untested — unclear if it is used in the current UI flow
- No backup or recovery strategy for `data/` volume
+94
@@ -0,0 +1,94 @@
# CONVENTIONS — document-scanner
_Last updated: 2026-05-21_
## Summary
The codebase follows standard Python and Vue 3 conventions without heavy tooling enforcement. Backend uses async/await throughout with type hints on public interfaces. Frontend uses Vue Options API with Pinia stores as the data layer. No linter or formatter configuration is committed.
---
## Python Conventions (Backend)
### Naming
- Files: `snake_case.py`
- Classes: `PascalCase` (e.g., `AnthropicProvider`, `ClassificationResult`)
- Functions/variables: `snake_case`
- Constants: `UPPER_SNAKE_CASE` (e.g., `MAX_STORED_CHARS`, `DATA_DIR`)
- Private helpers: leading underscore (e.g., `_extract_pdf`, `_parse_classification`)
### Async
- All API endpoint functions are `async def`
- All `AIProvider` methods are `async def`
- `pytest-asyncio` with `asyncio_mode=auto` (set in `pytest.ini`)
### Type Hints
- Used on public function signatures in `ai/` layer and `services/`
- Dataclass used for `ClassificationResult` (`@dataclass` with `field(default_factory=...)`)
- Not used consistently in `api/` routers (rely on FastAPI/Pydantic implicit validation)
### Error Handling
- `extractor.py` wraps all extraction in `try/except Exception` and returns error strings (never raises)
- AI providers raise on hard failures; caller (`classifier.py`) is responsible for propagating
- No global exception handler registered in `main.py`
### Imports
- Standard library first, then third-party, then local — not enforced by isort
- Heavy library imports (`fitz`, `pytesseract`, `docx`) are deferred inside functions to avoid import-time cost when unused
### Module Docstrings
- Present on `extractor.py` and `test_classifier.py`; absent elsewhere
---
## JavaScript / Vue Conventions (Frontend)
### Naming
- Vue files: `PascalCase.vue` (e.g., `DocumentCard.vue`, `AppSidebar.vue`)
- Pinia stores: `camelCase` filename matching store ID (e.g., `documents.js` → `useDocumentsStore`)
- Views: `<Name>View.vue` suffix
- Components grouped by domain in subdirectories: `documents/`, `topics/`, `upload/`, `layout/`
### Vue Style
- Options API used throughout (not Composition API)
- Props defined with type and default; no `defineProps` (Options API syntax)
- `v-model`, `v-for`, `v-if` used directly in templates
### Pinia Pattern
- Each store encapsulates `state`, `getters`, and `actions`
- Actions call `src/api/client.js` — components never import `client.js` directly
- Stores are the single source of truth; views read from store state
### API Client
- `src/api/client.js` is the sole HTTP adapter
- All paths are prefixed `/api/` (proxied to backend in dev via Vite config)
### Styling
- Tailwind CSS utility classes used directly in templates
- No scoped `<style>` blocks observed in the listed components
- Global styles in `src/style.css`
---
## API Design Conventions (Backend)
- All endpoints prefixed `/api/` (set per router)
- JSON responses; multipart for file upload
- HTTP verbs follow REST: GET list, GET by ID, POST create, PUT/PATCH update, DELETE remove
- No versioning (`/api/v1/`) — flat namespace
---
## Configuration
- Runtime paths controlled entirely by `DATA_DIR` env var (defaults to `/app/data`)
- AI settings persisted in `data/settings.json` — no env var overrides at runtime for provider config (except `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` noted in `.env.example`)
- No `.env` loading in backend code — env vars passed via Docker Compose `environment:` block
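A sketch of how `DATA_DIR` resolution and an `ensure_data_dirs()` helper might look. `ensure_data_dirs()` is named in the source, but its body here is hypothetical. Seeding `settings.json` from `DEFAULT_SETTINGS` is omitted. If `config.py` computes path constants at import time, tests must monkeypatch those constants rather than only the env var, which matches what `conftest.py` does.

```python
import json
import os
import tempfile
from pathlib import Path


def resolve_data_dir() -> Path:
    # DATA_DIR env var with the documented default.
    return Path(os.environ.get("DATA_DIR", "/app/data"))


def ensure_data_dirs(data_dir: Path) -> None:
    """Create uploads/ and metadata/ and seed topics.json if it is missing."""
    (data_dir / "uploads").mkdir(parents=True, exist_ok=True)
    (data_dir / "metadata").mkdir(parents=True, exist_ok=True)
    topics = data_dir / "topics.json"
    if not topics.exists():
        topics.write_text(json.dumps({"topics": []}), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    ensure_data_dirs(root)
    print(sorted(p.name for p in root.iterdir()))  # ['metadata', 'topics.json', 'uploads']
```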
---
## Gaps / Unknowns
- No ESLint, Prettier, Black, or Ruff configuration committed
- No pre-commit hooks
- No consistent JSDoc or Python docstring coverage
+144
@@ -0,0 +1,144 @@
# INTEGRATIONS — document-scanner
_Last updated: 2026-05-21_
## Summary
The backend integrates with four interchangeable AI providers for document classification: Anthropic Claude, OpenAI (and any OpenAI-compatible endpoint), Ollama, and LM Studio. There are no external databases, auth services, or cloud storage integrations — all persistence is local filesystem. The active provider is selected at runtime via settings persisted in `backend/data/settings.json`.
---
## AI Providers
All providers implement the `AIProvider` abstract interface defined in `backend/ai/base.py`. The active provider is resolved at request time in `backend/ai/__init__.py:get_provider()`.
### Anthropic
- **SDK:** `anthropic>=0.26`, used in `backend/ai/anthropic_provider.py`
- **Client:** `anthropic.AsyncAnthropic`
- **API:** Messages API (`client.messages.create`)
- **Default model:** `claude-sonnet-4-6`
- **Auth:** `api_key` stored in `backend/data/settings.json` under `providers.anthropic.api_key`; `ANTHROPIC_API_KEY` is listed in `.env.example` as an optional bootstrap hint
- **Calls made:** `classify` (max_tokens=1024), `suggest_topics` (max_tokens=256), `health_check` (max_tokens=5)
- **Text limit:** 8,000 characters per request (`MAX_AI_CHARS = 8_000`)
### OpenAI
- **SDK:** `openai>=1.30`, used in `backend/ai/openai_provider.py`
- **Client:** `openai.AsyncOpenAI`
- **API:** Chat Completions (`client.chat.completions.create`)
- **Default model:** `gpt-4o`
- **Auth:** `api_key` stored in `backend/data/settings.json` under `providers.openai.api_key`; `OPENAI_API_KEY` is listed in `.env.example` as an optional bootstrap hint
- **Custom base URL:** Supported via `providers.openai.base_url` in settings (allows pointing at any OpenAI-compatible endpoint)
### Ollama
- **Provider file:** `backend/ai/ollama_provider.py`
- **Implementation:** Subclass of `OpenAIProvider` — uses the OpenAI SDK with a custom `base_url`
- **Default base URL:** `http://host.docker.internal:11434/v1`
- **Default model:** `llama3.2`
- **Auth:** Stub key `"ollama"` (no real auth required)
- **Network path:** Reaches the host machine's Ollama daemon via Docker's `host.docker.internal` DNS alias (configured in `docker-compose.yml` via `extra_hosts`)
### LM Studio
- **Provider file:** `backend/ai/lmstudio_provider.py`
- **Implementation:** Subclass of `OpenAIProvider` — uses the OpenAI SDK with a custom `base_url`
- **Default base URL:** `http://host.docker.internal:1234/v1`
- **Default model:** `gemma-4-e4b-it`
- **Auth:** Stub key `"lm-studio"` (no real auth required)
- **Network path:** Reaches the host machine's LM Studio server via `host.docker.internal` (same `extra_hosts` setting)
- **Default active provider** — the app works out of the box with LM Studio and no API keys
---
## Provider Selection & Settings Persistence
- Active provider and all per-provider config (model names, API keys, base URLs) are persisted in `backend/data/settings.json`.
- Settings are loaded fresh on each classification request in `backend/services/classifier.py:classify_document()`.
- API keys returned from the settings API are masked (last 4 chars shown) via `backend/services/storage.py:mask_api_key()`.
- The Settings UI allows switching providers without restart.
---
## Frontend ↔ Backend Communication
- **Protocol:** HTTP REST over JSON (and multipart form for uploads)
- **Client:** Native browser `fetch` API — `frontend/src/api/client.js`
- **Base path:** All requests go to `/api/*` — no hardcoded backend hostname in the frontend
- **Proxy (dev):** Vite dev server proxies `/api` → `http://backend:8000` (configured in `frontend/vite.config.js`)
- **Proxy (prod):** Comment in `frontend/src/api/client.js` notes nginx is expected; no nginx config is present in the repo
### API Endpoints consumed by the frontend
| Method | Path | Purpose |
|---|---|---|
| POST | `/api/documents/upload` | Upload file with optional auto-classify flag |
| GET | `/api/documents` | List documents (paginated, optional topic filter) |
| GET | `/api/documents/:id` | Get single document metadata |
| DELETE | `/api/documents/:id` | Delete document |
| POST | `/api/documents/:id/classify` | (Re)classify document, optional topic list |
| GET | `/api/topics` | List all topics |
| POST | `/api/topics` | Create topic |
| PATCH | `/api/topics/:id` | Update topic |
| DELETE | `/api/topics/:id` | Delete topic |
| POST | `/api/topics/suggest` | AI topic suggestions for a document |
| GET | `/api/settings` | Get settings (keys masked) |
| PATCH | `/api/settings` | Update settings |
| POST | `/api/settings/test-provider` | Health-check the active or named provider |
| GET | `/api/settings/default-prompt` | Retrieve the default classification system prompt |
---
## Docker Services
Defined in `docker-compose.yml`:
| Service | Image | Port | Notes |
|---|---|---|---|
| `backend` | Built from `./backend/Dockerfile` | `8000:8000` | Mounts `./backend/data:/app/data` for persistence; `./backend:/app` for hot-reload |
| `frontend` | Built from `./frontend/Dockerfile` | `5173:5173` | Mounts `./frontend/src` and `index.html` for hot-reload; depends on `backend` |
Ollama and LM Studio are both reached through `extra_hosts: host.docker.internal:host-gateway`. This is set on the backend service so the container can connect to servers running on the host machine.
---
## Environment Variables
| Variable | Required | Where used | Notes |
|---|---|---|---|
| `DATA_DIR` | No | `backend/config.py` | Root path for uploads/metadata/settings; defaults to `/app/data` |
| `ANTHROPIC_API_KEY` | No | `.env.example` | Bootstrap only — app manages keys via settings UI |
| `OPENAI_API_KEY` | No | `.env.example` | Bootstrap only — app manages keys via settings UI |
| `PYTHONDONTWRITEBYTECODE` | No | `docker-compose.yml` | Set to `1` to suppress `.pyc` files in Docker |
---
## Authentication & Identity
- No user authentication. The application has no login system, sessions, or identity provider.
- API keys for AI providers are stored in plain text in `backend/data/settings.json` (masked only when returned via the settings API).
---
## Monitoring & Observability
- No error tracking service (no Sentry, Datadog, etc.).
- No structured logging framework — FastAPI default stdout logging only.
- A `/health` endpoint exists at `backend/main.py` returning `{"status": "ok"}`.
- Provider connectivity tested on demand via `POST /api/settings/test-provider`.
---
## Webhooks & Callbacks
- None — the application makes no outbound webhook calls and exposes no webhook receiver endpoints.
---
## Gaps / Unknowns
- No nginx or reverse-proxy config present for production deployments; the client-side comment references it but no config exists.
- No container registry or CI/CD pipeline configuration detected.
- API keys are stored in a plain JSON file on disk with no encryption at rest.
- The `ANTHROPIC_API_KEY` / `OPENAI_API_KEY` env vars from `.env.example` are noted as bootstrap helpers but no code in the repo reads them directly — they appear to be manual seeding hints only.
+129
@@ -0,0 +1,129 @@
# STACK — document-scanner
_Last updated: 2026-05-21_
## Summary
Document Scanner is a full-stack application with a Python/FastAPI backend and a Vue 3 frontend, containerised with Docker Compose. The backend handles document ingestion, text extraction, and AI-powered topic classification; the frontend is a single-page app served by Vite. No external database is used — all state is persisted to the local filesystem.
---
## Languages
| Language | Version | Where used |
|---|---|---|
| Python | 3.12 (pinned in `backend/Dockerfile`) | Backend API, AI providers, services |
| JavaScript (ES modules) | ES2022+ (`"type": "module"` in `frontend/package.json`) | Frontend SPA |
---
## Runtime
**Backend:**
- CPython 3.12 (Docker image: `python:3.12-slim`)
- ASGI server: Uvicorn `>=0.29` with standard extras (websockets, httptools)
- Entry point: `backend/main.py` → `uvicorn main:app`
**Frontend:**
- Node.js 20 (Docker image: `node:20-alpine`)
- Dev server: Vite 5 on port 5173
- Entry point: `frontend/index.html` → `frontend/src/main.js`
**Package Manager:**
- Backend: `pip` — lockfile: none (ranges only in `backend/requirements.txt`)
- Frontend: `npm`, lockfile: none committed (`frontend/package-lock.json` is generated locally on `npm install`)
---
## Frameworks
### Backend
| Package | Version | Purpose |
|---|---|---|
| `fastapi` | `>=0.111` | REST API framework — `backend/main.py` |
| `uvicorn[standard]` | `>=0.29` | ASGI server |
| `pydantic-settings` | `>=2.2` | Settings/config validation |
| `python-multipart` | unpinned (latest at install time) | Multipart file upload parsing |
### Frontend
| Package | Version | Purpose |
|---|---|---|
| `vue` | `^3.4.0` | UI framework — `frontend/src/App.vue` and all components |
| `vue-router` | `^4.3.0` | Client-side routing — `frontend/src/router/index.js` |
| `pinia` | `^2.1.0` | State management — `frontend/src/stores/` |
### Build / Dev Tooling
| Tool | Version | Purpose |
|---|---|---|
| `vite` | `^5.2.0` | Frontend bundler and dev server — `frontend/vite.config.js` |
| `@vitejs/plugin-vue` | `^5.0.0` | Vue SFC support in Vite |
| `tailwindcss` | `^3.4.0` | Utility-first CSS — `frontend/tailwind.config.js` |
| `postcss` | `^8.4.0` | CSS processing — `frontend/postcss.config.js` |
| `autoprefixer` | `^10.4.0` | CSS vendor prefixing |
---
## Key Backend Dependencies
| Package | Version | Purpose |
|---|---|---|
| `anthropic` | `>=0.26` | Anthropic Claude API client — `backend/ai/anthropic_provider.py` |
| `openai` | `>=1.30` | OpenAI / OpenAI-compatible API client — `backend/ai/openai_provider.py`, also used for Ollama and LM Studio via `base_url` override |
| `PyMuPDF` (`fitz`) | `>=1.24` | PDF text extraction — `backend/services/extractor.py` |
| `python-docx` | `>=1.1` | DOCX text extraction — `backend/services/extractor.py` |
| `pytesseract` | `>=0.3` | OCR for image files — `backend/services/extractor.py` |
| `Pillow` | `>=10.3` | Image handling for OCR — `backend/services/extractor.py` |
| `filelock` | `>=3.14` | File-based concurrency locks — `backend/services/storage.py` |
| `aiofiles` | `>=23.2` | Async file I/O (listed in `requirements.txt` but not imported anywhere) |
| `httpx` | `>=0.27` | Async HTTP client (used internally by `anthropic` and `openai` SDKs) |
---
## Testing
| Tool | Version | Purpose |
|---|---|---|
| `pytest` | `>=8.2` | Test runner — `backend/pytest.ini`, `backend/tests/` |
| `pytest-asyncio` | `>=0.23` | Async test support; `asyncio_mode = auto` set in `backend/pytest.ini` |
No frontend test framework is present.
---
## Storage
- **File system only** — no database engine.
- Upload files stored at `backend/data/uploads/` (UUID-named).
- Document metadata stored as per-document JSON files at `backend/data/metadata/`.
- Topics registry: `backend/data/topics.json`.
- App settings: `backend/data/settings.json`.
- File-level concurrency managed via `filelock` (`backend/services/storage.py`).
---
## System Dependencies (backend Docker image)
Installed via `apt-get` in `backend/Dockerfile`:
- `tesseract-ocr` — OCR binary for `pytesseract`
- `libgl1`, `libglib2.0-0` — shared libraries required by PyMuPDF
---
## Configuration
- Environment variable `DATA_DIR` sets the root data path (default: `/app/data`).
- AI provider settings (models, API keys, base URLs) are stored in `backend/data/settings.json` and managed through the in-app Settings UI.
- Optional bootstrap via `.env` (see `.env.example`): only `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are referenced.
- Default active provider is `lmstudio` (no API key required).
---
## Gaps / Unknowns
- No Python version pinning file (`.python-version`, `pyproject.toml`) outside the Dockerfile — local dev outside Docker may use a different Python version.
- No frontend lockfile committed; exact transitive dependency versions are non-deterministic until `npm install` is run.
- No linter or formatter config detected (no `.eslintrc`, `.prettierrc`, `biome.json`, `ruff.toml`, `mypy.ini`, etc.).
- No production deployment config beyond Docker Compose (no nginx config, no cloud provider manifests).
+144
@@ -0,0 +1,144 @@
# STRUCTURE — document-scanner
_Last updated: 2026-05-21_
## Summary
The project is a monorepo with two top-level service directories (`backend/`, `frontend/`) and Docker Compose at the root. Backend is a Python/FastAPI app; frontend is a Vue 3 SPA built with Vite. All persistent data lives under `backend/data/`.
---
## Top-Level Layout
```
document_scanner/
├── backend/ Python FastAPI service
├── frontend/ Vue 3 SPA
├── docker-compose.yml Two-service compose (backend + frontend)
├── .env.example Optional env vars (API keys)
└── .claude/ Claude Code settings
```
---
## Backend
```
backend/
├── main.py FastAPI app: CORS, lifespan, router registration
├── config.py Path constants, DEFAULT_SETTINGS, ensure_data_dirs()
├── requirements.txt Python dependencies
├── pytest.ini pytest config (asyncio_mode=auto)
├── Dockerfile
├── api/ FastAPI routers (thin HTTP layer)
│ ├── documents.py Upload, list, get, delete, reclassify endpoints
│ ├── topics.py Topic CRUD endpoints
│ └── settings.py AI provider settings endpoints
├── ai/ AI provider abstraction
│ ├── base.py AIProvider ABC + ClassificationResult dataclass
│ ├── __init__.py get_provider() factory
│ ├── anthropic_provider.py
│ ├── openai_provider.py
│ ├── ollama_provider.py extends OpenAIProvider
│ └── lmstudio_provider.py extends OpenAIProvider
├── services/ Business logic (no FastAPI dependency)
│ ├── extractor.py Text extraction: PDF/DOCX/image/text dispatch
│ ├── classifier.py Orchestrates AI call + topic auto-creation
│ └── storage.py Flat-file JSON CRUD + filelock
├── data/ Runtime data (volume-mounted in Docker)
│ ├── uploads/ Uploaded document files
│ ├── metadata/ Per-document JSON metadata files
│ ├── topics.json Global topic list
│ └── settings.json Active AI provider + system prompt config
└── tests/
├── conftest.py Fixtures: isolated tmp data dir, TestClient, sample files
├── test_health.py
├── test_documents.py
├── test_topics.py
├── test_settings.py
├── test_extractor.py
├── test_classifier.py
└── test_lmstudio.py
```
---
## Frontend
```
frontend/
├── index.html Vite entry HTML
├── vite.config.js Vite config (Vue plugin, /api proxy)
├── tailwind.config.js
├── postcss.config.js
├── package.json Vue 3, Vue Router 4, Pinia; no test framework
├── Dockerfile
└── src/
├── main.js App bootstrap: Vue + Pinia + Router
├── App.vue Root component (sidebar layout wrapper)
├── style.css Global Tailwind imports
├── api/
│ └── client.js fetch wrapper; all API calls go through here
├── stores/ Pinia stores (data + actions layer)
│ ├── documents.js Document list, upload, classify state
│ ├── topics.js Topic list CRUD state
│ └── settings.js AI provider settings state
├── router/
│ └── index.js Routes: /, /topics, /topics/:name, /document/:id, /settings
├── views/ Page-level components (one per route)
│ ├── HomeView.vue
│ ├── TopicsView.vue
│ ├── DocumentView.vue
│ └── SettingsView.vue
└── components/ Reusable UI components
├── layout/
│ └── AppSidebar.vue
├── documents/
│ └── DocumentCard.vue
├── topics/
│ ├── TopicBadge.vue
│ └── TopicManager.vue
└── upload/
├── DropZone.vue
└── UploadProgress.vue
```
---
## Key Entry Points
| File | Purpose |
|---|---|
| `backend/main.py` | FastAPI app instantiation, middleware, router registration |
| `backend/config.py` | All path constants and default settings — change storage paths here |
| `backend/ai/__init__.py` | Add a new AI provider here |
| `frontend/src/main.js` | Vue app bootstrap |
| `frontend/src/api/client.js` | All HTTP calls originate here |
---
## Where to Add New Code
- **New API endpoint**: add router in `backend/api/`, register in `backend/main.py`
- **New AI provider**: implement `AIProvider` ABC in `backend/ai/`, add case in `get_provider()`
- **New document type**: add extraction branch in `backend/services/extractor.py`
- **New frontend page**: add view in `src/views/`, add route in `src/router/index.js`
- **New shared UI component**: add to relevant `src/components/<category>/` subdirectory
---
## Gaps / Unknowns
- No `src/components/settings/` subdirectory — settings UI is entirely in `SettingsView.vue`
- No migration or schema versioning for `topics.json` / `settings.json` flat files
+87
@@ -0,0 +1,87 @@
# TESTING — document-scanner
_Last updated: 2026-05-21_
## Summary
The backend has solid integration test coverage across the main API surfaces and services using pytest + FastAPI TestClient; the exceptions are listed under "What Is NOT Tested". Each test runs in a fully isolated temporary data directory, so there is no shared state between tests. The frontend has no test framework configured at all.
---
## Backend Testing
### Framework
- **pytest** + **pytest-asyncio** (`asyncio_mode = auto` in `pytest.ini`)
- **FastAPI TestClient** (Starlette's synchronous ASGI test client, built on `httpx`)
- No dedicated mocking library: AI logic is tested either through the real parsing helpers or by substituting the provider
### Test Isolation Strategy (conftest.py)
- `isolated_data_dir` fixture is `autouse=True` — every test automatically gets:
- A fresh `tmp_path/data/` directory with `uploads/`, `metadata/`
- Clean `topics.json` and `settings.json` initialized from `DEFAULT_SETTINGS`
- Monkeypatched `DATA_DIR` env var and all module-level path constants in `config` and `services.storage`
- New `FileLock` instances pointing to the tmp dir
- `client` fixture wraps FastAPI `TestClient` with the isolated data dir active
### Test Files
| File | What it covers |
|---|---|
| `test_health.py` | `GET /health` returns `{"status": "ok"}` |
| `test_documents.py` | Upload TXT/PDF (no-classify), list, get, delete; extracts text correctly |
| `test_topics.py` | Create, list, delete topics via API |
| `test_settings.py` | Read default settings, update provider config |
| `test_extractor.py` | Unit tests for `extract_text()` on TXT, PDF, DOCX, image paths |
| `test_classifier.py` | Unit tests for JSON parsing helpers (`_parse_classification`, `_parse_suggestions`, `_strip_code_fences`) — no real AI calls |
| `test_lmstudio.py` | LMStudio provider-specific behaviour (likely mocked or uses a local endpoint) |
### Fixtures Available
| Fixture | Provides |
|---|---|
| `isolated_data_dir` | Autouse — clean tmp data dir |
| `client` | FastAPI TestClient with isolated data |
| `sample_txt` | A `.txt` file with test content |
| `sample_pdf` | A minimal valid PDF created with PyMuPDF |
### What Is NOT Tested
- Auto-classification flow end-to-end (requires a live AI provider)
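- Reclassify endpoint (`POST /api/documents/{doc_id}/classify`)
- Direct tests of the Anthropic, OpenAI, and Ollama provider implementations
- Concurrent writes and `FileLock` contention
- File size and type validation edge cases. Note that `ALLOWED_MIME_TYPES` is defined in `api/documents.py` but never checked; the upload endpoint only rejects empty files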
- Frontend — no tests exist
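The end-to-end classification gap could be closed without a live model by substituting a test double for the provider. This is a minimal sketch, assuming a stub only needs the same three async methods as `ai.base.AIProvider`. `StubProvider` is a hypothetical name, and `ClassificationResult` is re-declared locally so the block runs on its own. `services/classifier.py` does `from ai import get_provider`, so a test would patch the name where it is used rather than `ai.get_provider`, e.g. `monkeypatch.setattr("services.classifier.get_provider", lambda settings: stub)`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class ClassificationResult:
    # Local copy of ai.base.ClassificationResult so this sketch runs standalone.
    topics: list[str] = field(default_factory=list)
    suggested_new_topics: list[str] = field(default_factory=list)
    reasoning: str = ""


class StubProvider:
    """Hypothetical test double exposing the three async methods of ai.base.AIProvider."""

    def __init__(self, result: ClassificationResult) -> None:
        self.result = result
        self.calls: list[tuple[str, list[str], str]] = []  # records classify() arguments

    async def classify(
        self, document_text: str, existing_topics: list[str], system_prompt: str
    ) -> ClassificationResult:
        self.calls.append((document_text, existing_topics, system_prompt))
        return self.result

    async def suggest_topics(self, document_text: str, system_prompt: str) -> list[str]:
        return list(self.result.suggested_new_topics)

    async def health_check(self) -> bool:
        return True


stub = StubProvider(ClassificationResult(topics=["Invoice"], suggested_new_topics=["Consulting"]))
res = asyncio.run(stub.classify("invoice text", ["Invoice"], "prompt"))
print(res.topics, res.suggested_new_topics)  # ['Invoice'] ['Consulting']
```

With this stub, `classify_document` should store both `Invoice` and `Consulting` on the document and auto-create `Consulting` in `topics.json`. An end-to-end test would assert exactly that. Compare the topics as a set, because the classifier builds its final list from a `set` and the order is not guaranteed.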
---
## Frontend Testing
- **No test framework installed** — `package.json` has no `vitest`, `jest`, or `@testing-library/vue`
- No test files found under `frontend/src/`
- No Cypress or Playwright configuration
---
## Running Tests
```bash
# From backend/
pytest
# With verbose output
pytest -v
# Single file
pytest tests/test_documents.py
```
---
## Gaps / Unknowns
- No test coverage measurement (no `pytest-cov` in `requirements.txt`)
- `test_lmstudio.py` content not inspected — unclear if it hits a real local endpoint
- No CI configuration (no GitHub Actions, no Dockerfile for a test runner)
- No snapshot or contract tests for API response shapes
- Frontend is completely untested
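A contract test for response shapes can start as a plain key-and-type check. The sketch below encodes the paginated `GET /api/documents` response built by `list_documents` in `api/documents.py` (`items`, `total`, `page`, `per_page`). `PAGE_CONTRACT` and `contract_violations` are hypothetical names. Requiring an exact type match (so `True` is not accepted as an `int`) is a design choice, not an existing project convention.

```python
from typing import Any

# Shape of GET /api/documents as returned by list_documents() in api/documents.py.
PAGE_CONTRACT: dict[str, type] = {"items": list, "total": int, "page": int, "per_page": int}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return a sorted list of problems; an empty list means the payload matches."""
    problems = []
    for key, expected in contract.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif type(payload[key]) is not expected:
            # Exact type match on purpose: bool is a subclass of int in Python.
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(payload[key]).__name__}"
            )
    for key in payload.keys() - contract.keys():
        problems.append(f"unexpected key: {key}")
    return sorted(problems)


print(contract_violations({"items": [], "total": "0", "page": 1}, PAGE_CONTRACT))
# ['missing key: per_page', 'total: expected int, got str']
```

Inside a test that uses the existing `client` fixture, `assert contract_violations(client.get("/api/documents").json(), PAGE_CONTRACT) == []` would pin the response shape. Each entry in `items` could be checked the same way against the metadata keys written by `upload_document`.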
@@ -0,0 +1,17 @@
FROM python:3.12-slim
WORKDIR /app
# System deps for PyMuPDF + OCR
RUN apt-get update && apt-get install -y \
tesseract-ocr \
libgl1 \
libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
@@ -0,0 +1,36 @@
from ai.base import AIProvider, ClassificationResult
from ai.anthropic_provider import AnthropicProvider
from ai.openai_provider import OpenAIProvider
from ai.ollama_provider import OllamaProvider
from ai.lmstudio_provider import LMStudioProvider


def get_provider(settings: dict) -> AIProvider:
    active = settings.get("active_provider", "lmstudio")
    providers = settings.get("providers", {})
    cfg = providers.get(active, {})

    match active:
        case "anthropic":
            return AnthropicProvider(
                api_key=cfg.get("api_key", ""),
                model=cfg.get("model", "claude-sonnet-4-6"),
            )
        case "openai":
            return OpenAIProvider(
                api_key=cfg.get("api_key", ""),
                model=cfg.get("model", "gpt-4o"),
                base_url=cfg.get("base_url") or None,
            )
        case "ollama":
            return OllamaProvider(
                base_url=cfg.get("base_url", "http://host.docker.internal:11434"),
                model=cfg.get("model", "llama3.2"),
            )
        case "lmstudio":
            return LMStudioProvider(
                base_url=cfg.get("base_url", "http://host.docker.internal:1234"),
                model=cfg.get("model", "gemma-4-e4b-it"),
            )
        case _:
            raise ValueError(f"Unknown AI provider: {active}")
@@ -0,0 +1,103 @@
import json
import re

import anthropic

from ai.base import AIProvider, ClassificationResult

MAX_AI_CHARS = 8_000


class AnthropicProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "claude-sonnet-4-6"):
        self._api_key = api_key
        self._model = model

    def _client(self):
        return anthropic.AsyncAnthropic(api_key=self._api_key)

    async def classify(
        self,
        document_text: str,
        existing_topics: list[str],
        system_prompt: str,
    ) -> ClassificationResult:
        topics_str = ", ".join(existing_topics) if existing_topics else "(none yet)"
        user_msg = (
            f"Existing topics: [{topics_str}]\n\n"
            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
        )
        client = self._client()
        response = await client.messages.create(
            model=self._model,
            max_tokens=1024,
            system=system_prompt,
            messages=[{"role": "user", "content": user_msg}],
        )
        raw = response.content[0].text
        return _parse_classification(raw)

    async def suggest_topics(
        self,
        document_text: str,
        system_prompt: str,
    ) -> list[str]:
        user_msg = (
            "Suggest 3-5 topic names for this document. "
            "Return ONLY valid JSON: {\"suggested_topics\": [\"topic1\", \"topic2\"]}\n\n"
            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
        )
        client = self._client()
        response = await client.messages.create(
            model=self._model,
            max_tokens=256,
            system=system_prompt,
            messages=[{"role": "user", "content": user_msg}],
        )
        raw = response.content[0].text
        return _parse_suggestions(raw)

    async def health_check(self) -> bool:
        try:
            client = self._client()
            await client.messages.create(
                model=self._model,
                max_tokens=5,
                messages=[{"role": "user", "content": "ping"}],
            )
            return True
        except Exception:
            return False


def _strip_code_fences(text: str) -> str:
    text = re.sub(r"```(?:json)?\s*", "", text)
    text = re.sub(r"```", "", text)
    return text.strip()


def _parse_classification(raw: str) -> ClassificationResult:
    raw = _strip_code_fences(raw)
    # Try to find JSON object
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group())
            return ClassificationResult(
                topics=data.get("assigned_topics", []),
                suggested_new_topics=data.get("new_topic_suggestions", []),
                reasoning=data.get("reasoning", ""),
            )
        except json.JSONDecodeError:
            pass
    return ClassificationResult()


def _parse_suggestions(raw: str) -> list[str]:
    raw = _strip_code_fences(raw)
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group())
            return data.get("suggested_topics", [])
        except json.JSONDecodeError:
            pass
    return []
@@ -0,0 +1,32 @@
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ClassificationResult:
    topics: list[str] = field(default_factory=list)
    suggested_new_topics: list[str] = field(default_factory=list)
    reasoning: str = ""


class AIProvider(ABC):
    @abstractmethod
    async def classify(
        self,
        document_text: str,
        existing_topics: list[str],
        system_prompt: str,
    ) -> ClassificationResult:
        ...

    @abstractmethod
    async def suggest_topics(
        self,
        document_text: str,
        system_prompt: str,
    ) -> list[str]:
        ...

    @abstractmethod
    async def health_check(self) -> bool:
        ...
@@ -0,0 +1,10 @@
from ai.openai_provider import OpenAIProvider


class LMStudioProvider(OpenAIProvider):
    def __init__(self, base_url: str = "http://host.docker.internal:1234", model: str = "gemma-4-e4b-it"):
        super().__init__(
            api_key="lm-studio",
            model=model,
            base_url=base_url.rstrip("/") + "/v1",
        )
@@ -0,0 +1,10 @@
from ai.openai_provider import OpenAIProvider


class OllamaProvider(OpenAIProvider):
    def __init__(self, base_url: str = "http://host.docker.internal:11434", model: str = "llama3.2"):
        super().__init__(
            api_key="ollama",
            model=model,
            base_url=base_url.rstrip("/") + "/v1",
        )
@@ -0,0 +1,104 @@
import json
import re

from openai import AsyncOpenAI

from ai.base import AIProvider, ClassificationResult

MAX_AI_CHARS = 8_000


class OpenAIProvider(AIProvider):
    def __init__(self, api_key: str, model: str = "gpt-4o", base_url: str | None = None):
        self._api_key = api_key
        self._model = model
        self._base_url = base_url

    def _client(self) -> AsyncOpenAI:
        return AsyncOpenAI(api_key=self._api_key or "placeholder", base_url=self._base_url)

    async def classify(
        self,
        document_text: str,
        existing_topics: list[str],
        system_prompt: str,
    ) -> ClassificationResult:
        topics_str = ", ".join(existing_topics) if existing_topics else "(none yet)"
        user_msg = (
            f"Existing topics: [{topics_str}]\n\n"
            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
        )
        response = await self._client().chat.completions.create(
            model=self._model,
            max_tokens=1024,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_msg},
            ],
        )
        raw = response.choices[0].message.content or ""
        return _parse_classification(raw)

    async def suggest_topics(
        self,
        document_text: str,
        system_prompt: str,
    ) -> list[str]:
        user_msg = (
            "Suggest 3-5 topic names for this document. "
            "Return ONLY valid JSON: {\"suggested_topics\": [\"topic1\", \"topic2\"]}\n\n"
            f"Document text:\n{document_text[:MAX_AI_CHARS]}"
        )
        response = await self._client().chat.completions.create(
            model=self._model,
            max_tokens=256,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_msg},
            ],
        )
        raw = response.choices[0].message.content or ""
        return _parse_suggestions(raw)

    async def health_check(self) -> bool:
        try:
            await self._client().chat.completions.create(
                model=self._model,
                max_tokens=5,
                messages=[{"role": "user", "content": "ping"}],
            )
            return True
        except Exception:
            return False


def _strip_code_fences(text: str) -> str:
    text = re.sub(r"```(?:json)?\s*", "", text)
    text = re.sub(r"```", "", text)
    return text.strip()


def _parse_classification(raw: str) -> ClassificationResult:
    raw = _strip_code_fences(raw)
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group())
            return ClassificationResult(
                topics=data.get("assigned_topics", []),
                suggested_new_topics=data.get("new_topic_suggestions", []),
                reasoning=data.get("reasoning", ""),
            )
        except json.JSONDecodeError:
            pass
    return ClassificationResult()


def _parse_suggestions(raw: str) -> list[str]:
    raw = _strip_code_fences(raw)
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            data = json.loads(match.group())
            return data.get("suggested_topics", [])
        except json.JSONDecodeError:
            pass
    return []
@@ -0,0 +1,101 @@
from datetime import datetime, timezone

from fastapi import APIRouter, UploadFile, File, Form, HTTPException, Query

from services import storage, extractor, classifier

router = APIRouter(prefix="/api/documents", tags=["documents"])

ALLOWED_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/msword",
    "text/plain",
    "text/markdown",
    "image/png",
    "image/jpeg",
    "image/jpg",
    "image/tiff",
    "image/webp",
}


@router.post("/upload")
async def upload_document(
    file: UploadFile = File(...),
    auto_classify: bool = Form(True),
):
    content = await file.read()
    if len(content) == 0:
        raise HTTPException(400, "Empty file")

    mime = file.content_type or "application/octet-stream"
    saved = storage.save_upload(content, file.filename or "upload", mime)
    text = extractor.extract_text(saved["path"], mime)

    now = datetime.now(timezone.utc).isoformat()
    meta = {
        "id": saved["id"],
        "original_name": file.filename or "upload",
        "filename": saved["filename"],
        "mime_type": mime,
        "size_bytes": len(content),
        "extracted_text": text,
        "topics": [],
        "created_at": now,
        "classified_at": None,
    }
    storage.save_metadata(meta)

    if auto_classify:
        try:
            topics = await classifier.classify_document(saved["id"])
            meta["topics"] = topics
            meta["classified_at"] = datetime.now(timezone.utc).isoformat()
        except Exception as e:
            # Classification failure is non-fatal; document is still saved
            meta["classification_error"] = str(e)

    return meta


@router.get("")
async def list_documents(
    topic: str | None = Query(None),
    page: int = Query(1, ge=1),
    per_page: int = Query(20, ge=1, le=100),
):
    docs = storage.list_metadata(topic=topic)
    total = len(docs)
    start = (page - 1) * per_page
    return {"items": docs[start : start + per_page], "total": total, "page": page, "per_page": per_page}


@router.get("/{doc_id}")
async def get_document(doc_id: str):
    meta = storage.get_metadata(doc_id)
    if meta is None:
        raise HTTPException(404, "Document not found")
    return meta


@router.delete("/{doc_id}")
async def delete_document(doc_id: str):
    ok = storage.delete_document(doc_id)
    if not ok:
        raise HTTPException(404, "Document not found")
    return {"success": True}


@router.post("/{doc_id}/classify")
async def classify_document(doc_id: str, body: dict = {}):
    meta = storage.get_metadata(doc_id)
    if meta is None:
        raise HTTPException(404, "Document not found")
    topic_names = body.get("topics") if body else None
    try:
        topics = await classifier.classify_document(doc_id, topic_names)
    except Exception as e:
        raise HTTPException(500, f"Classification failed: {e}")
    return {"topics": topics}
@@ -0,0 +1,84 @@
import time

from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

from services import storage
from config import DEFAULT_SYSTEM_PROMPT
from ai import get_provider

router = APIRouter(prefix="/api/settings", tags=["settings"])


class SettingsPatch(BaseModel):
    system_prompt: str | None = None
    active_provider: str | None = None
    providers: dict | None = None


class TestProviderRequest(BaseModel):
    provider: str


@router.get("")
async def get_settings():
    settings = storage.load_settings()
    return storage.settings_masked(settings)


@router.patch("")
async def patch_settings(body: SettingsPatch):
    settings = storage.load_settings()

    if body.system_prompt is not None:
        settings["system_prompt"] = body.system_prompt

    if body.active_provider is not None:
        valid = {"anthropic", "openai", "ollama", "lmstudio"}
        if body.active_provider not in valid:
            raise HTTPException(400, f"Invalid provider. Must be one of: {valid}")
        settings["active_provider"] = body.active_provider

    if body.providers is not None:
        # Deep merge per-provider config
        for prov_name, prov_cfg in body.providers.items():
            if prov_name not in settings.get("providers", {}):
                settings.setdefault("providers", {})[prov_name] = {}
            existing = settings["providers"][prov_name]
            for key, val in prov_cfg.items():
                # Don't overwrite api_key if it comes in masked (contains ****)
                if key == "api_key" and val and "****" in str(val):
                    continue
                existing[key] = val

    storage.save_settings(settings)
    return storage.settings_masked(settings)


@router.post("/test-provider")
async def test_provider(body: TestProviderRequest):
    settings = storage.load_settings()
    # Temporarily switch active provider for the test
    test_settings = dict(settings)
    test_settings["active_provider"] = body.provider

    try:
        provider = get_provider(test_settings)
    except ValueError as e:
        raise HTTPException(400, str(e))

    start = time.monotonic()
    try:
        ok = await provider.health_check()
    except Exception as e:
        return {"ok": False, "message": str(e), "latency_ms": 0}
    latency_ms = int((time.monotonic() - start) * 1000)

    return {
        "ok": ok,
        "message": "Connection successful" if ok else "Health check failed",
        "latency_ms": latency_ms,
    }


@router.get("/default-prompt")
async def get_default_prompt():
    return {"system_prompt": DEFAULT_SYSTEM_PROMPT}
@@ -0,0 +1,72 @@
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

from services import storage, classifier

router = APIRouter(prefix="/api/topics", tags=["topics"])


class TopicCreate(BaseModel):
    name: str
    description: str = ""
    color: str = "#6366f1"


class TopicUpdate(BaseModel):
    name: str | None = None
    description: str | None = None
    color: str | None = None


class SuggestRequest(BaseModel):
    document_id: str


@router.get("")
async def list_topics():
    topics = storage.load_topics()
    counts = storage.topic_doc_counts()
    for t in topics:
        t["doc_count"] = counts.get(t["name"], 0)
    return {"topics": topics}


@router.post("")
async def create_topic(body: TopicCreate):
    topic = storage.create_topic(body.name, body.description, body.color)
    topic["doc_count"] = 0
    return topic


@router.patch("/{topic_id}")
async def update_topic(topic_id: str, body: TopicUpdate):
    topic = storage.update_topic(
        topic_id,
        name=body.name,
        description=body.description,
        color=body.color,
    )
    if topic is None:
        raise HTTPException(404, "Topic not found")
    counts = storage.topic_doc_counts()
    topic["doc_count"] = counts.get(topic["name"], 0)
    return topic


@router.delete("/{topic_id}")
async def delete_topic(topic_id: str):
    name = storage.delete_topic(topic_id)
    if name is None:
        raise HTTPException(404, "Topic not found")
    return {"success": True, "removed_from_documents": True}


@router.post("/suggest")
async def suggest_topics(body: SuggestRequest):
    meta = storage.get_metadata(body.document_id)
    if meta is None:
        raise HTTPException(404, "Document not found")
    try:
        suggestions = await classifier.suggest_topics_for_document(body.document_id)
    except Exception as e:
        raise HTTPException(500, f"Suggestion failed: {e}")
    return {"suggested": suggestions}
@@ -0,0 +1,51 @@
import json
import os
from pathlib import Path

DATA_DIR = Path(os.environ.get("DATA_DIR", "/app/data"))
UPLOADS_DIR = DATA_DIR / "uploads"
METADATA_DIR = DATA_DIR / "metadata"
TOPICS_FILE = DATA_DIR / "topics.json"
SETTINGS_FILE = DATA_DIR / "settings.json"

DEFAULT_SYSTEM_PROMPT = """You are a document classification assistant. When given a document's text content and a list of existing topics, you must:
1. Assign the document to one or more relevant topics from the list.
2. If no existing topics fit well, suggest new topic names.
Return ONLY valid JSON in this exact format, with no additional text or explanation:
{"assigned_topics": ["topic1"], "new_topic_suggestions": ["new topic name"]}
If the document fits no topics and you have no suggestions, return: {"assigned_topics": [], "new_topic_suggestions": []}"""

DEFAULT_SETTINGS = {
    "system_prompt": DEFAULT_SYSTEM_PROMPT,
    "active_provider": "lmstudio",
    "providers": {
        "anthropic": {
            "api_key": "",
            "model": "claude-sonnet-4-6"
        },
        "openai": {
            "api_key": "",
            "model": "gpt-4o",
            "base_url": None
        },
        "ollama": {
            "base_url": "http://host.docker.internal:11434",
            "model": "llama3.2"
        },
        "lmstudio": {
            "base_url": "http://host.docker.internal:1234",
            "model": "gemma-4-e4b-it"
        }
    }
}


def ensure_data_dirs():
    UPLOADS_DIR.mkdir(parents=True, exist_ok=True)
    METADATA_DIR.mkdir(parents=True, exist_ok=True)
    if not TOPICS_FILE.exists():
        TOPICS_FILE.write_text(json.dumps({"topics": []}, indent=2))
    if not SETTINGS_FILE.exists():
        SETTINGS_FILE.write_text(json.dumps(DEFAULT_SETTINGS, indent=2))
@@ -0,0 +1,14 @@
{
"id": "69eb8545-2e19-4651-903e-6489dbd9f687",
"original_name": "1907-Rechnung.pdf",
"filename": "69eb8545-2e19-4651-903e-6489dbd9f687.pdf",
"mime_type": "application/pdf",
"size_bytes": 38090,
"extracted_text": "mobilcom-debitel GmbH · Geschäftsführung: Ingo Arnold, Antonius Fromme, Rickmann von Platen \nHRB 14826 KI, Amtsgericht Kiel · Vorsitzender des Aufsichtsrats: Stephan Esch · Sitz der Gesellschaft: Büdelsdorf\nBankverbindung: Commerzbank Rendsburg · IBAN DE08214400450844443200 · BIC COBADEFFXXX\nUSt-ID: DE 194 910 634 · Gläubiger-ID: DE43ZZZ00000074855\nHaben Sie Fragen zur Rechnung?\nwww.md.de/faq\nmobilcom-debitel Kundenservice\nHandykurzwahl: 22240\nDer Anruf erfolgt zu einer ortsgebundenen Rufnummer\nTelefon: 040/55 55 41 00 0\nmobilcom-debitel Kundenservice Technik\nTelefon: 0900/10 22 24 0\n€ 2,49/Anruf, nur aus dem dt. Festnetz erreichbar\nwww.md.de\nHerrn\nDominik Ritter\nLeibnizstr. 41\n10629 Berlin\nRechnungsdatum:\nRechnungsnr.:\nKundennummer:\n31.07.2019\nM19046649250\n33040574\nPost: mobilcom-debitel GmbH · 99076 Erfurt\nIhre mobilcom-debitel Rechnung\nRechnungsbetrag netto\n55,4645 €\nUSt.-Betrag (19%)\n10,54 €\nRechnungsbetrag gesamt\n66,00 €\nDie Begleichung der Rechnung erfolgt am 07.08.2019 im Lastschriftverfahren mit der Mandatsreferenz-Nummer\nMC-33040574-00000001 von dem Konto: IBAN DE38100208900615356026.\nKennen Sie schon waipu.tv? 
Das ist Fernsehen wie noch nie: auf Smartphone, Tablet oder Ihrem TV.\nJetzt kostenlos testen: md.de/tv/waipu-tv.\nMobilfunk-Vertragsabrechnungen\nMobilfunk-Rufnummer: 0170 / 4322717\nVertragsnummer:\n217582256\nTeilnehmer: Dominik Ritter\nTarif:\nreal Allnet mit Smartphone 10\nMobilfunknetz: Telekom Mobilfunk\nDie Leistungen im Überblick\nMenge Details\nZeitraum/Datum\nSumme\nBasisleistungen\n1 Grundgebühr\n01.08.2019 - 31.08.2019\n31,0840 €\n1 freenet Hotspot Flat (DLS24M0TB0G0000):\nUnbegrenztes Datenvolumen im größten WLAN-Netzwerk\n01.08.2019 - 31.08.2019\n0,0000 €\n1 T@ke-away Flat Upgrade (+2 GB) - 6M (anteilig)\n03.07.2019 - 31.07.2019\n11,7839 €\n1 T@ke-away Flat Upgrade (+2 GB) - 6M\n01.08.2019 - 31.08.2019\n12,5966 €\n1 Kaspersky Passwort Manager 1 Monat (DLS1M1TB1G0299)\n(anteilig):\nEin Passwort für mehrere Konten!\n03.07.2019 - 31.07.2019\n2,3505 €\n1 Kaspersky Passwort Manager 1 Monat (DLS1M1TB1G0299)\n(anteilig)\n01.08.2019 - 02.08.2019\n0,1621 €\n1 Gutschrift Kaspersky Passwort Manager\n(DLS1M1TB1G0299) (anteilig)\n03.07.2019 - 31.07.2019\n-2,3505 €\n1 Gutschrift Kaspersky Passwort Manager\n(DLS1M1TB1G0299) (anteilig)\n01.08.2019 - 02.08.2019\n-0,1621 €\n1 Smartphone-Option\n01.08.2019 - 31.08.2019\n8,4034 €\nVerbindungen\n3 Verbindungen ins dt. Festnetz (FN)\n01.07.2019 - 03.07.2019\n0,0000 €\n39 Netzexterne Verbindungen (NX)\n28.06.2019 - 30.07.2019\n0,0000 €\n1 Abgehende Roaming Verbindungen (RA)\n17.07.2019 - 17.07.2019\n0,0000 €\n202 Datenverbindungen (DATA)\n27.06.2019 - 30.07.2019\n0,0000 €\n120 Roaming Datenverbindungen (RD)\n14.07.2019 - 20.07.2019\n0,0000 €\nZwischensumme netto\n63,8679 €\nIhre mobilcom-debitel Vorteile\n1 24 x 10 Euro Grundgebührrabatt\n01.08.2019 - 31.08.2019\n-8,4034 €\nNettobetrag für Rufnummer 0170 / 4322717\n55,4645 €\nSofern Sie die Löschung Ihrer Verbindungsdaten sofort, 90 oder 180 Tage nach Rechnungsstellung gewünscht haben, entfällt\nmit der Löschung unsere Nachweispflicht für diese Daten. 
Erfolgt innerhalb von 8 Wochen nach Erhalt der Rechnung kein\nschriftlicher Widerspruch, gilt die Rechnung als genehmigt. Begründete Einwendungen können auch gegen einzelne in der\nRechnung dargestellte Forderungen erhoben werden. Verzug tritt spätestens 30 Tage nach Zugang der Rechnung ein. Dies\nschließt einen frühzeitigeren Verzug nicht aus. Hinweise zum Ablauf eines Anbieterwechsels finden Sie auf der Internetseite\nder Bundesnetzagentur.\nRechnungserklärung\nSeite 1 von 2\n\nmobilcom-debitel GmbH · Geschäftsführung: Ingo Arnold, Antonius Fromme, Rickmann von Platen \nHRB 14826 KI, Amtsgericht Kiel · Vorsitzender des Aufsichtsrats: Stephan Esch · Sitz der Gesellschaft: Büdelsdorf\nBankverbindung: Commerzbank Rendsburg · IBAN DE08214400450844443200 · BIC COBADEFFXXX\nUSt-ID: DE 194 910 634 · Gläubiger-ID: DE43ZZZ00000074855\nRechnungsdatum:\nRechnungsnr.:\nKundennummer:\n31.07.2019\nM19046649250\n33040574\nIhre mobilcom-debitel Rechnung\nInformationen gemäß Telekommunikations-Transparenzverordnung\nMobilfunk-Rufnummer: 0170 / 4322717\nZeitraum Datenverbrauch:\n01.06.2019 - 30.06.2019\nVertragsbeginn:\n20.12.2016 Kündigungsfrist:\n3 Monat(e) Summe vereinbartes Datenvolumen:\n8000 MB\nMindestlaufzeit bis:\n19.12.2020 Kündigungseingang bis:\n19.09.2020 Verbrauchtes Datenvolumen:\n8080 MB\nSeite 2 von 2",
"topics": [
"Telecommunications",
"Billing and Invoicing"
],
"created_at": "2026-04-16T11:08:33.558670+00:00",
"classified_at": "2026-04-16T11:08:40.831347+00:00"
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,13 @@
{
"id": "cf4dd4cf-dcfb-42f1-957d-bcdba640163b",
"original_name": "invoice.txt",
"filename": "cf4dd4cf-dcfb-42f1-957d-bcdba640163b.txt",
"mime_type": "text/plain",
"size_bytes": 108,
"extracted_text": "This is an invoice for professional consulting services rendered in April 2026. Total amount due: 5000 EUR.",
"topics": [
"Invoice"
],
"created_at": "2026-04-16T11:06:08.026326+00:00",
"classified_at": "2026-04-16T11:06:09.636422+00:00"
}
@@ -0,0 +1,11 @@
{
"id": "e71d8a85-09a1-4cd8-b602-65aa9216a724",
"original_name": "test_doc.txt",
"filename": "e71d8a85-09a1-4cd8-b602-65aa9216a724.txt",
"mime_type": "text/plain",
"size_bytes": 57,
"extracted_text": "This document is about accounting and financial reports.",
"topics": [],
"created_at": "2026-04-16T11:05:24.317425+00:00",
"classified_at": null
}
@@ -0,0 +1,23 @@
{
"system_prompt": "You are a document classification assistant. When given a document's text content and a list of existing topics, you must:\n1. Assign the document to one or more relevant topics from the list.\n2. If no existing topics fit well, suggest new topic names.\nReturn ONLY valid JSON in this exact format, with no additional text or explanation:\n{\"assigned_topics\": [\"topic1\"], \"new_topic_suggestions\": [\"new topic name\"]}\nIf the document fits no topics and you have no suggestions, return: {\"assigned_topics\": [], \"new_topic_suggestions\": []}",
"active_provider": "lmstudio",
"providers": {
"anthropic": {
"api_key": "",
"model": "claude-sonnet-4-6"
},
"openai": {
"api_key": "",
"model": "gpt-4o",
"base_url": null
},
"ollama": {
"base_url": "http://host.docker.internal:11434",
"model": "llama3.2"
},
"lmstudio": {
"base_url": "http://host.docker.internal:1234",
"model": "gemma-4-e4b-it"
}
}
}
@@ -0,0 +1,22 @@
{
"topics": [
{
"id": "39ffdadb",
"name": "Test Topic",
"description": "",
"color": "#6366f1"
},
{
"id": "d2e0fbd8",
"name": "Telecommunications",
"description": "",
"color": "#6366f1"
},
{
"id": "d3823fd0",
"name": "Billing and Invoicing",
"description": "",
"color": "#6366f1"
}
]
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1 @@
This is an invoice for professional consulting services rendered in April 2026. Total amount due: 5000 EUR.
@@ -0,0 +1 @@
This document is about accounting and financial reports.
@@ -0,0 +1,33 @@
from contextlib import asynccontextmanager

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from config import ensure_data_dirs
from api.documents import router as documents_router
from api.topics import router as topics_router
from api.settings import router as settings_router


@asynccontextmanager
async def lifespan(app: FastAPI):
    ensure_data_dirs()
    yield


app = FastAPI(title="Document Scanner API", version="1.0.0", lifespan=lifespan)

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/health")
async def health():
    return {"status": "ok"}


app.include_router(documents_router)
app.include_router(topics_router)
app.include_router(settings_router)
@@ -0,0 +1,3 @@
[pytest]
asyncio_mode = auto
testpaths = tests
@@ -0,0 +1,15 @@
fastapi>=0.111
uvicorn[standard]>=0.29
python-multipart
pydantic-settings>=2.2
anthropic>=0.26
openai>=1.30
PyMuPDF>=1.24
python-docx>=1.1
pytesseract>=0.3
Pillow>=10.3
filelock>=3.14
aiofiles>=23.2
httpx>=0.27
pytest>=8.2
pytest-asyncio>=0.23
@@ -0,0 +1,59 @@
"""
Classification orchestrator.
Loads settings, selects AI provider, classifies document, auto-creates suggested topics.
"""
from services import storage
from ai import get_provider

MAX_AI_CHARS = 8_000


async def classify_document(doc_id: str, topic_names: list[str] | None = None) -> list[str]:
    """
    Classify a document by its ID. Returns the list of assigned topic names.
    If topic_names is provided, restrict classification to those topics.
    Auto-creates any newly suggested topics.
    """
    meta = storage.get_metadata(doc_id)
    if meta is None:
        raise ValueError(f"Document {doc_id} not found")

    settings = storage.load_settings()
    system_prompt = settings.get("system_prompt", "")
    provider = get_provider(settings)

    # Use all known topics if not specified
    if topic_names is None:
        all_topics = storage.load_topics()
        topic_names = [t["name"] for t in all_topics]

    text = meta.get("extracted_text", "")
    result = await provider.classify(text[:MAX_AI_CHARS], topic_names, system_prompt)

    # Collect all topic names to persist (assigned + suggested)
    all_new_names = set(result.suggested_new_topics) | set(result.topics)

    # Auto-create any topic not already in the registry
    existing_names = {t.lower() for t in topic_names}
    for name in all_new_names:
        if name.strip() and name.lower() not in existing_names:
            storage.create_topic(name.strip())

    # Final list: everything the AI assigned or suggested
    final_topics = [t for t in list(set(result.topics + result.suggested_new_topics)) if t.strip()]
    storage.update_document_topics(doc_id, final_topics)
    return final_topics


async def suggest_topics_for_document(doc_id: str) -> list[str]:
    """Return AI-suggested topic names without modifying the document."""
    meta = storage.get_metadata(doc_id)
    if meta is None:
        raise ValueError(f"Document {doc_id} not found")

    settings = storage.load_settings()
    system_prompt = settings.get("system_prompt", "")
    provider = get_provider(settings)

    text = meta.get("extracted_text", "")
    return await provider.suggest_topics(text[:MAX_AI_CHARS], system_prompt)
@@ -0,0 +1,71 @@
"""
Text extraction dispatcher.
Supports: PDF (PyMuPDF), DOCX (python-docx), plain text, images (pytesseract).
"""
from pathlib import Path

MAX_STORED_CHARS = 50_000


def extract_text(file_path: str, mime_type: str) -> str:
    path = Path(file_path)
    try:
        if mime_type == "application/pdf" or path.suffix.lower() == ".pdf":
            return _extract_pdf(path)
        elif mime_type in (
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/msword",
        ) or path.suffix.lower() in (".docx", ".doc"):
            return _extract_docx(path)
        elif mime_type and mime_type.startswith("image/"):
            return _extract_image(path)
        else:
            return _extract_text_file(path)
    except Exception as e:
        return f"[Extraction error: {e}]"


def _extract_pdf(path: Path) -> str:
    import fitz  # PyMuPDF
    doc = fitz.open(str(path))
    pages = []
    for page in doc:
        pages.append(page.get_text())
    doc.close()
    return _truncate("\n".join(pages))


def _extract_docx(path: Path) -> str:
    from docx import Document
    doc = Document(str(path))
    paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
    return _truncate("\n".join(paragraphs))


def _extract_image(path: Path) -> str:
    try:
        from PIL import Image
        import pytesseract
        img = Image.open(str(path))
        text = pytesseract.image_to_string(img)
        return _truncate(text)
    except ImportError:
        return "[OCR unavailable: pytesseract or Pillow not installed]"
    except Exception as e:
        return f"[OCR error: {e}]"


def _extract_text_file(path: Path) -> str:
    # Try strict encodings first: latin-1 maps every byte and never raises,
    # so it must come last or the cp1252 attempt would be unreachable.
    for enc in ("utf-8", "cp1252", "latin-1"):
        try:
            return _truncate(path.read_text(encoding=enc))
        except UnicodeDecodeError:
            continue
    return "[Could not decode text file]"


def _truncate(text: str) -> str:
    text = text.strip()
    if len(text) > MAX_STORED_CHARS:
        text = text[:MAX_STORED_CHARS] + "\n[... truncated ...]"
    return text
@@ -0,0 +1,187 @@
import json
import uuid
import shutil
from datetime import datetime, timezone
from pathlib import Path
from filelock import FileLock
from config import UPLOADS_DIR, METADATA_DIR, TOPICS_FILE, SETTINGS_FILE, DEFAULT_SETTINGS
# ── File locks ────────────────────────────────────────────────────────────────
_topics_lock = FileLock(str(TOPICS_FILE) + ".lock")
_settings_lock = FileLock(str(SETTINGS_FILE) + ".lock")
# ── Documents ─────────────────────────────────────────────────────────────────
def save_upload(file_bytes: bytes, original_name: str, mime_type: str) -> dict:
doc_id = str(uuid.uuid4())
suffix = Path(original_name).suffix.lower()
filename = f"{doc_id}{suffix}"
dest = UPLOADS_DIR / filename
dest.write_bytes(file_bytes)
return {"id": doc_id, "filename": filename, "path": str(dest)}
def save_metadata(meta: dict) -> None:
path = METADATA_DIR / f"{meta['id']}.json"
lock = FileLock(str(path) + ".lock")
with lock:
path.write_text(json.dumps(meta, indent=2, ensure_ascii=False))
def get_metadata(doc_id: str) -> dict | None:
path = METADATA_DIR / f"{doc_id}.json"
if not path.exists():
return None
return json.loads(path.read_text())
def list_metadata(topic: str | None = None) -> list[dict]:
docs = []
for p in sorted(METADATA_DIR.glob("*.json"), key=lambda x: x.stat().st_mtime, reverse=True):
try:
meta = json.loads(p.read_text())
except Exception:
continue
if topic and topic not in meta.get("topics", []):
continue
docs.append(meta)
return docs
def delete_document(doc_id: str) -> bool:
meta_path = METADATA_DIR / f"{doc_id}.json"
if not meta_path.exists():
return False
meta = json.loads(meta_path.read_text())
upload_path = UPLOADS_DIR / meta.get("filename", "")
if upload_path.exists():
upload_path.unlink()
meta_path.unlink()
lock_path = Path(str(meta_path) + ".lock")
if lock_path.exists():
lock_path.unlink()
return True
def update_document_topics(doc_id: str, topics: list[str]) -> dict | None:
meta = get_metadata(doc_id)
if meta is None:
return None
meta["topics"] = topics
meta["classified_at"] = datetime.now(timezone.utc).isoformat()
save_metadata(meta)
return meta
def remove_topic_from_all_documents(topic_name: str) -> int:
"""Remove a topic name from all documents. Returns number of docs updated."""
count = 0
for p in METADATA_DIR.glob("*.json"):
try:
meta = json.loads(p.read_text())
except Exception:
continue
if topic_name in meta.get("topics", []):
meta["topics"] = [t for t in meta["topics"] if t != topic_name]
lock = FileLock(str(p) + ".lock")
with lock:
p.write_text(json.dumps(meta, indent=2, ensure_ascii=False))
count += 1
return count
# ── Topics ────────────────────────────────────────────────────────────────────
def load_topics() -> list[dict]:
with _topics_lock:
data = json.loads(TOPICS_FILE.read_text())
return data.get("topics", [])
def save_topics(topics: list[dict]) -> None:
with _topics_lock:
TOPICS_FILE.write_text(json.dumps({"topics": topics}, indent=2))
def get_topic(topic_id: str) -> dict | None:
return next((t for t in load_topics() if t["id"] == topic_id), None)
def create_topic(name: str, description: str = "", color: str = "#6366f1") -> dict:
    # Hold the topics lock across load + save so two concurrent creates cannot
    # drop each other's write (a FileLock object is re-entrant, so the nested
    # acquire inside load_topics/save_topics is safe).
    with _topics_lock:
        topics = load_topics()
        # Deduplicate by name (case-insensitive): return the existing topic
        existing = next((t for t in topics if t["name"].lower() == name.lower()), None)
        if existing is not None:
            return existing
        topic = {
            "id": str(uuid.uuid4())[:8],
            "name": name,
            "description": description,
            "color": color,
        }
        topics.append(topic)
        save_topics(topics)
        return topic
def update_topic(topic_id: str, **kwargs) -> dict | None:
    # Documents store topic *names*, so a rename here does not touch documents
    # that already carry the old name.
    with _topics_lock:
        topics = load_topics()
        for t in topics:
            if t["id"] == topic_id:
                t.update({k: v for k, v in kwargs.items() if v is not None})
                save_topics(topics)
                return t
    return None
def delete_topic(topic_id: str) -> str | None:
    with _topics_lock:
        topics = load_topics()
        topic = next((t for t in topics if t["id"] == topic_id), None)
        if not topic:
            return None
        name = topic["name"]
        save_topics([t for t in topics if t["id"] != topic_id])
    # The cascade only takes per-document locks, so run it outside the topics lock.
    remove_topic_from_all_documents(name)
    return name
def topic_doc_counts() -> dict[str, int]:
counts: dict[str, int] = {}
for p in METADATA_DIR.glob("*.json"):
try:
meta = json.loads(p.read_text())
except Exception:
continue
for t in meta.get("topics", []):
counts[t] = counts.get(t, 0) + 1
return counts
# ── Settings ──────────────────────────────────────────────────────────────────
def load_settings() -> dict:
with _settings_lock:
return json.loads(SETTINGS_FILE.read_text())
def save_settings(settings: dict) -> None:
with _settings_lock:
SETTINGS_FILE.write_text(json.dumps(settings, indent=2))
def mask_api_key(key: str) -> str:
if not key or len(key) <= 4:
return "****"
return "****" + key[-4:]
def settings_masked(settings: dict) -> dict:
import copy
s = copy.deepcopy(settings)
for prov in ("anthropic", "openai"):
key = s.get("providers", {}).get(prov, {}).get("api_key", "")
if key:
s["providers"][prov]["api_key"] = mask_api_key(key)
return s
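`save_metadata` and `save_topics` write straight into the target file with `Path.write_text`. The `FileLock` serializes writers, but a crash or full disk mid-write can still leave a truncated JSON file, which `list_metadata` would then skip silently and `load_topics` would fail on. A common remedy is to write to a temporary file in the same directory and rename it into place. The following is a stdlib-only sketch of that pattern, not code from this repository; `atomic_write_json` is a hypothetical helper that would still be called inside the existing lock, and the rename is only atomic when both paths are on the same filesystem (ensured here by creating the temp file next to the target).

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Hypothetical helper: write `data` as JSON to `path` atomically."""
    path = Path(path)
    # Create the temp file in the target's directory so os.replace() is a
    # same-filesystem rename. The ".tmp" suffix keeps it out of glob("*.json").
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        # Readers now see either the old file or the complete new one.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = Path(d) / "abc123.json"
        atomic_write_json(target, {"id": "abc123", "topics": ["Finance"]})
        print(json.loads(target.read_text(encoding="utf-8")))
```

If serialization fails partway (for example on a non-JSON-serializable value), the previous file is left untouched and no temp file remains.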
@@ -0,0 +1,70 @@
"""
pytest configuration: isolate each test with a temporary data directory.
"""
import json
import pytest
from fastapi.testclient import TestClient
@pytest.fixture(autouse=True)
def isolated_data_dir(monkeypatch, tmp_path):
"""Each test gets its own clean data directory."""
data_dir = tmp_path / "data"
(data_dir / "uploads").mkdir(parents=True)
(data_dir / "metadata").mkdir(parents=True)
(data_dir / "topics.json").write_text(json.dumps({"topics": []}))
from config import DEFAULT_SETTINGS
(data_dir / "settings.json").write_text(json.dumps(DEFAULT_SETTINGS))
monkeypatch.setenv("DATA_DIR", str(data_dir))
# Patch the module-level path constants so the running app sees the temp dir
import config
monkeypatch.setattr(config, "DATA_DIR", data_dir)
monkeypatch.setattr(config, "UPLOADS_DIR", data_dir / "uploads")
monkeypatch.setattr(config, "METADATA_DIR", data_dir / "metadata")
monkeypatch.setattr(config, "TOPICS_FILE", data_dir / "topics.json")
monkeypatch.setattr(config, "SETTINGS_FILE", data_dir / "settings.json")
import services.storage as st
from filelock import FileLock
monkeypatch.setattr(st, "UPLOADS_DIR", data_dir / "uploads")
monkeypatch.setattr(st, "METADATA_DIR", data_dir / "metadata")
monkeypatch.setattr(st, "TOPICS_FILE", data_dir / "topics.json")
monkeypatch.setattr(st, "SETTINGS_FILE", data_dir / "settings.json")
monkeypatch.setattr(st, "_topics_lock", FileLock(str(data_dir / "topics.json") + ".lock"))
monkeypatch.setattr(st, "_settings_lock", FileLock(str(data_dir / "settings.json") + ".lock"))
yield data_dir
@pytest.fixture
def client(isolated_data_dir):
from main import app
with TestClient(app) as c:
yield c
@pytest.fixture
def sample_txt(tmp_path):
p = tmp_path / "sample.txt"
p.write_text("This is a test document about invoices and finance.")
return p
@pytest.fixture
def sample_pdf(tmp_path):
"""Create a minimal valid PDF for testing."""
import fitz
doc = fitz.open()
page = doc.new_page()
page.insert_text((50, 50), "Test PDF document about contracts and legal matters.")
pdf_path = tmp_path / "sample.pdf"
doc.save(str(pdf_path))
doc.close()
return pdf_path
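The fixture patches the same path constants twice, once on `config` and once on `services.storage`, and rebuilds the two `FileLock` objects. That is necessary if `storage.py` pulls the constants in with `from config import ...` (its import lines are not in this view, so that is an assumption): a `from` import copies the object into the importing module's namespace at import time, so rebinding `config.METADATA_DIR` later does not change what `storage` sees. The locks were likewise built from the original paths when the module loaded. The toy modules below, built with `types.ModuleType`, are a hypothetical stand-in that reproduces the effect without touching the real app.

```python
import types
from pathlib import Path

# Stand-in for config.py, which owns the path constants.
config = types.ModuleType("config")
config.METADATA_DIR = Path("/app/data/metadata")

# Stand-in for services/storage.py: `from config import METADATA_DIR`
# copies the current object into storage's own namespace at import time.
storage = types.ModuleType("storage")
storage.METADATA_DIR = config.METADATA_DIR


def storage_metadata_path(doc_id: str) -> Path:
    # Code in storage.py resolves METADATA_DIR in storage's namespace.
    return storage.METADATA_DIR / f"{doc_id}.json"


def demo(tmp: Path) -> tuple[Path, Path]:
    original = config.METADATA_DIR
    try:
        config.METADATA_DIR = tmp                    # patch config only ...
        only_config = storage_metadata_path("doc1")  # ... storage still uses its copy
        storage.METADATA_DIR = tmp                   # patch storage as well
        both = storage_metadata_path("doc1")
    finally:
        # Undo both patches, as monkeypatch does at fixture teardown.
        config.METADATA_DIR = original
        storage.METADATA_DIR = original
    return only_config, both
```

Calling `demo(Path("/tmp/isolated"))` returns the production path for the config-only patch and the temp path only once `storage` is patched too, which is why the fixture touches both modules.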
@@ -0,0 +1,110 @@
"""
Unit tests for AI provider JSON parsing robustness and classifier orchestration.
Uses a mock provider — no real AI calls made.
"""
import pytest
from ai.openai_provider import _parse_classification, _parse_suggestions, _strip_code_fences
from ai.base import ClassificationResult
def test_parse_clean_json():
raw = '{"assigned_topics": ["finance", "invoices"], "new_topic_suggestions": []}'
result = _parse_classification(raw)
assert result.topics == ["finance", "invoices"]
assert result.suggested_new_topics == []
def test_parse_with_code_fence():
raw = '```json\n{"assigned_topics": ["legal"], "new_topic_suggestions": ["contracts"]}\n```'
result = _parse_classification(raw)
assert result.topics == ["legal"]
assert result.suggested_new_topics == ["contracts"]
def test_parse_with_preamble():
raw = 'Here is the classification:\n{"assigned_topics": ["hr"], "new_topic_suggestions": []}\nDone.'
result = _parse_classification(raw)
assert result.topics == ["hr"]
def test_parse_malformed_returns_empty():
raw = "I cannot classify this document."
result = _parse_classification(raw)
assert result.topics == []
assert result.suggested_new_topics == []
def test_strip_code_fences():
raw = "```json\n{}\n```"
assert _strip_code_fences(raw) == "{}"
def test_parse_suggestions_clean():
raw = '{"suggested_topics": ["Human Resources", "Onboarding"]}'
result = _parse_suggestions(raw)
assert "Human Resources" in result
assert "Onboarding" in result
def test_parse_suggestions_with_fence():
raw = "```\n{\"suggested_topics\": [\"Finance\"]}\n```"
result = _parse_suggestions(raw)
assert result == ["Finance"]
def test_parse_suggestions_malformed():
raw = "No suggestions available."
result = _parse_suggestions(raw)
assert result == []
@pytest.mark.asyncio
async def test_classifier_with_mock_provider(isolated_data_dir):
"""Test classifier orchestration with a mock provider."""
from unittest.mock import AsyncMock, patch
import services.storage as st
# Create a document
doc_id = "test-doc-1"
st.save_metadata({
"id": doc_id,
"original_name": "test.txt",
"filename": "test-doc-1.txt",
"mime_type": "text/plain",
"size_bytes": 50,
"extracted_text": "Invoice for services rendered in March 2026.",
"topics": [],
"created_at": "2026-01-01T00:00:00Z",
"classified_at": None,
})
# Create some topics
st.create_topic("Finance")
st.create_topic("Legal")
mock_result = ClassificationResult(
topics=["Finance"],
suggested_new_topics=["Invoices"],
reasoning="Document is about financial invoicing.",
)
with patch("services.classifier.get_provider") as mock_get_provider:
mock_provider = AsyncMock()
mock_provider.classify = AsyncMock(return_value=mock_result)
mock_get_provider.return_value = mock_provider
from services.classifier import classify_document
topics = await classify_document(doc_id)
assert "Finance" in topics
assert "Invoices" in topics
# Verify new topic was auto-created
all_topics = st.load_topics()
assert any(t["name"] == "Invoices" for t in all_topics)
# Verify document was updated
meta = st.get_metadata(doc_id)
assert "Finance" in meta["topics"]
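The parsing tests above pin down three tolerances: code fences around the reply, prose before or after the JSON, and a safe empty result for unusable output. `ai/openai_provider.py` itself is not part of this view, so the following is an independent sketch of a parser that satisfies the same cases, not the project's `_parse_classification`. `Classification` is a hypothetical stand-in for `ClassificationResult`; the JSON keys `assigned_topics` and `new_topic_suggestions` are taken from the tests.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical stand-in for ai.base.ClassificationResult."""
    topics: list[str] = field(default_factory=list)
    suggested_new_topics: list[str] = field(default_factory=list)


# Matches a whole reply wrapped in ``` or ```json fences.
_FENCE_RE = re.compile(r"^```[A-Za-z]*\s*(.*?)\s*```$", re.DOTALL)


def strip_code_fences(raw: str) -> str:
    text = raw.strip()
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text


def _str_list(value) -> list[str]:
    # Models occasionally return null or a bare string; accept only string lists.
    if not isinstance(value, list):
        return []
    return [v.strip() for v in value if isinstance(v, str) and v.strip()]


def parse_classification(raw: str) -> Classification:
    text = strip_code_fences(raw)
    # Tolerate preamble/epilogue prose: keep the outermost {...} span.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return Classification()
    try:
        data = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return Classification()
    return Classification(
        topics=_str_list(data.get("assigned_topics")),
        suggested_new_topics=_str_list(data.get("new_topic_suggestions")),
    )
```

Taking the span from the first `{` to the last `}` is a heuristic: a reply that contains two separate JSON objects fails to decode and falls back to the empty result rather than guessing which one is meant.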
@@ -0,0 +1,107 @@
def test_upload_txt_no_classify(client, sample_txt):
with open(sample_txt, "rb") as f:
resp = client.post(
"/api/documents/upload",
files={"file": ("sample.txt", f, "text/plain")},
data={"auto_classify": "false"},
)
assert resp.status_code == 200
data = resp.json()
assert data["original_name"] == "sample.txt"
assert "extracted_text" in data
    assert "invoices" in data["extracted_text"].lower()
assert data["topics"] == []
assert "id" in data
def test_upload_pdf_no_classify(client, sample_pdf):
with open(sample_pdf, "rb") as f:
resp = client.post(
"/api/documents/upload",
files={"file": ("sample.pdf", f, "application/pdf")},
data={"auto_classify": "false"},
)
assert resp.status_code == 200
data = resp.json()
assert data["mime_type"] == "application/pdf"
assert len(data["extracted_text"]) > 0
def test_list_documents(client, sample_txt):
with open(sample_txt, "rb") as f:
client.post(
"/api/documents/upload",
files={"file": ("a.txt", f, "text/plain")},
data={"auto_classify": "false"},
)
resp = client.get("/api/documents")
assert resp.status_code == 200
data = resp.json()
assert data["total"] == 1
assert len(data["items"]) == 1
def test_list_documents_filter_by_topic(client, sample_txt):
with open(sample_txt, "rb") as f:
upload = client.post(
"/api/documents/upload",
files={"file": ("a.txt", f, "text/plain")},
data={"auto_classify": "false"},
).json()
import services.storage as st
st.update_document_topics(upload["id"], ["finance"])
resp = client.get("/api/documents?topic=finance")
assert resp.json()["total"] == 1
resp2 = client.get("/api/documents?topic=legal")
assert resp2.json()["total"] == 0
def test_get_document(client, sample_txt):
with open(sample_txt, "rb") as f:
upload = client.post(
"/api/documents/upload",
files={"file": ("a.txt", f, "text/plain")},
data={"auto_classify": "false"},
).json()
resp = client.get(f"/api/documents/{upload['id']}")
assert resp.status_code == 200
assert resp.json()["id"] == upload["id"]
def test_get_document_not_found(client):
resp = client.get("/api/documents/nonexistent")
assert resp.status_code == 404
def test_delete_document(client, sample_txt):
with open(sample_txt, "rb") as f:
upload = client.post(
"/api/documents/upload",
files={"file": ("a.txt", f, "text/plain")},
data={"auto_classify": "false"},
).json()
resp = client.delete(f"/api/documents/{upload['id']}")
assert resp.status_code == 200
assert resp.json()["success"] is True
resp2 = client.get(f"/api/documents/{upload['id']}")
assert resp2.status_code == 404
def test_delete_document_not_found(client):
resp = client.delete("/api/documents/nonexistent")
assert resp.status_code == 404
def test_upload_empty_file(client):
resp = client.post(
"/api/documents/upload",
files={"file": ("empty.txt", b"", "text/plain")},
data={"auto_classify": "false"},
)
assert resp.status_code == 400
@@ -0,0 +1,52 @@
from services.extractor import extract_text
def test_extract_txt(tmp_path):
p = tmp_path / "test.txt"
p.write_text("Hello world this is a test document.", encoding="utf-8")
text = extract_text(str(p), "text/plain")
assert "Hello world" in text
def test_extract_pdf(tmp_path):
import fitz
doc = fitz.open()
page = doc.new_page()
page.insert_text((50, 50), "PDF content about legal contracts.")
pdf_path = tmp_path / "test.pdf"
doc.save(str(pdf_path))
doc.close()
text = extract_text(str(pdf_path), "application/pdf")
assert "PDF content" in text
def test_extract_docx(tmp_path):
from docx import Document
doc = Document()
doc.add_paragraph("DOCX paragraph about financial reports.")
docx_path = tmp_path / "test.docx"
doc.save(str(docx_path))
text = extract_text(
str(docx_path),
"application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)
assert "DOCX paragraph" in text
def test_extract_unknown_falls_back_to_text(tmp_path):
p = tmp_path / "test.csv"
p.write_text("col1,col2\nval1,val2", encoding="utf-8")
text = extract_text(str(p), "text/csv")
assert "col1" in text
def test_extract_truncation(tmp_path):
p = tmp_path / "big.txt"
p.write_text("A" * 60_000, encoding="utf-8")
text = extract_text(str(p), "text/plain")
assert len(text) <= 50_100 # 50k + truncation marker
assert "truncated" in text
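The extractor tests fix two behaviors: an unrecognized MIME type such as `text/csv` falls back to reading the file as plain text, and output longer than 50,000 characters is cut and tagged with a marker containing the word "truncated" (the test allows up to 100 extra characters for it). `services/extractor.py` is not in this view, so what follows is a hypothetical reconstruction of that dispatch-and-truncate shape. The registry name, the marker wording and the `errors="replace"` decoding choice are assumptions, and the PDF and DOCX handlers (PyMuPDF and python-docx in the tests) are left out.

```python
from pathlib import Path
from typing import Callable

MAX_CHARS = 50_000  # limit stated in the architecture notes


def _read_plain(path: str) -> str:
    # Undecodable bytes become U+FFFD instead of aborting the upload.
    return Path(path).read_text(encoding="utf-8", errors="replace")


# Hypothetical registry: MIME type -> extractor. PDF/DOCX entries omitted.
EXTRACTORS: dict[str, Callable[[str], str]] = {
    "text/plain": _read_plain,
    "text/markdown": _read_plain,
}


def truncate(text: str, limit: int = MAX_CHARS) -> str:
    if len(text) <= limit:
        return text
    # Short marker so downstream code (and the AI prompt) can tell the text was cut.
    return text[:limit] + f"\n\n[... truncated {len(text) - limit:,} characters]"


def extract_text_sketch(path: str, mime_type: str) -> str:
    # Unknown MIME types fall back to plain-text reading.
    extractor = EXTRACTORS.get(mime_type, _read_plain)
    return truncate(extractor(path))
```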
@@ -0,0 +1,4 @@
def test_health(client):
resp = client.get("/health")
assert resp.status_code == 200
assert resp.json() == {"status": "ok"}
@@ -0,0 +1,46 @@
"""
Integration test against a live LM Studio instance.
Skipped automatically if LM Studio is not reachable.
"""
import pytest
import httpx
def lmstudio_available() -> bool:
    try:
        r = httpx.get("http://host.docker.internal:1234/v1/models", timeout=3)
        return r.status_code == 200
    except Exception:
        return False
# Probe once at collection time and reuse the marker, instead of making a
# separate network call (with its own 3 s timeout) for every skipif decorator.
requires_lmstudio = pytest.mark.skipif(
    not lmstudio_available(), reason="LM Studio not reachable at host.docker.internal:1234"
)
@requires_lmstudio
@pytest.mark.asyncio
async def test_lmstudio_health_check():
    from ai.lmstudio_provider import LMStudioProvider
    provider = LMStudioProvider(
        base_url="http://host.docker.internal:1234",
        model="gemma-4-e4b-it",
    )
    ok = await provider.health_check()
    assert ok, "LM Studio health check failed"
@requires_lmstudio
@pytest.mark.asyncio
async def test_lmstudio_classify():
    from ai.lmstudio_provider import LMStudioProvider
    from config import DEFAULT_SYSTEM_PROMPT
    provider = LMStudioProvider(
        base_url="http://host.docker.internal:1234",
        model="gemma-4-e4b-it",
    )
    result = await provider.classify(
        document_text="This document is an invoice for software development services.",
        existing_topics=["Finance", "Legal", "HR"],
        system_prompt=DEFAULT_SYSTEM_PROMPT,
    )
    # The content depends on the loaded model, so only check that both lists are well-formed
    assert isinstance(result.topics, list)
    assert isinstance(result.suggested_new_topics, list)
@@ -0,0 +1,60 @@
def test_get_settings_defaults(client):
resp = client.get("/api/settings")
assert resp.status_code == 200
data = resp.json()
assert data["active_provider"] == "lmstudio"
assert "system_prompt" in data
assert "providers" in data
    # API keys must come back masked ("****" + last 4 chars) or empty
    for prov in ("anthropic", "openai"):
        key = data["providers"][prov].get("api_key", "")
        assert key == "" or (key.startswith("****") and len(key) <= 8)
def test_patch_system_prompt(client):
new_prompt = "Custom system prompt for testing."
resp = client.patch("/api/settings", json={"system_prompt": new_prompt})
assert resp.status_code == 200
resp2 = client.get("/api/settings")
assert resp2.json()["system_prompt"] == new_prompt
def test_patch_active_provider(client):
resp = client.patch("/api/settings", json={"active_provider": "ollama"})
assert resp.status_code == 200
assert resp.json()["active_provider"] == "ollama"
def test_patch_invalid_provider(client):
resp = client.patch("/api/settings", json={"active_provider": "unknown"})
assert resp.status_code == 400
def test_patch_provider_config(client):
resp = client.patch("/api/settings", json={
"providers": {
"ollama": {"model": "mistral", "base_url": "http://host.docker.internal:11434"}
}
})
assert resp.status_code == 200
assert resp.json()["providers"]["ollama"]["model"] == "mistral"
def test_masked_api_key_not_overwritten(client):
"""Patching with a masked key should not overwrite the real stored key."""
# First set a real key
client.patch("/api/settings", json={"providers": {"anthropic": {"api_key": "sk-ant-realkey"}}})
# Then patch with masked key (simulating frontend re-submitting)
    client.patch("/api/settings", json={"providers": {"anthropic": {"api_key": "****lkey"}}})
# The stored key should still be the real one
import services.storage as st
settings = st.load_settings()
assert settings["providers"]["anthropic"]["api_key"] == "sk-ant-realkey"
def test_get_default_prompt(client):
resp = client.get("/api/settings/default-prompt")
assert resp.status_code == 200
assert "system_prompt" in resp.json()
assert len(resp.json()["system_prompt"]) > 0
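`test_masked_api_key_not_overwritten` describes a round trip: `GET /api/settings` returns keys passed through `mask_api_key` (`"****"` plus the last four characters), the Settings UI may send that masked value back unchanged, and the stored key must survive. The PATCH handler in `api/settings.py` is not shown; below is a hypothetical `merge_provider_patch` illustrating one way to get that behavior, assuming masked values are recognized purely by their `****` prefix. One consequence of that assumption: a genuine key that begins with `****` could never be saved.

```python
import copy

MASK_PREFIX = "****"


def mask_api_key(key: str) -> str:
    # Same rule as services/storage.py: keep only the last four characters.
    if not key or len(key) <= 4:
        return MASK_PREFIX
    return MASK_PREFIX + key[-4:]


def merge_provider_patch(stored: dict, patch: dict) -> dict:
    """Hypothetical: apply patch["providers"] to a copy of `stored`."""
    merged = copy.deepcopy(stored)  # never mutate the caller's settings
    providers = merged.setdefault("providers", {})
    for name, fields in patch.get("providers", {}).items():
        target = providers.setdefault(name, {})
        for field_name, value in fields.items():
            if (
                field_name == "api_key"
                and isinstance(value, str)
                and value.startswith(MASK_PREFIX)
            ):
                continue  # the UI echoed back a masked key; keep the stored one
            target[field_name] = value
    return merged
```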
@@ -0,0 +1,72 @@
def test_list_topics_empty(client):
resp = client.get("/api/topics")
assert resp.status_code == 200
assert resp.json()["topics"] == []
def test_create_topic(client):
resp = client.post("/api/topics", json={"name": "Finance", "description": "Financial docs", "color": "#ff0000"})
assert resp.status_code == 200
data = resp.json()
assert data["name"] == "Finance"
assert data["color"] == "#ff0000"
assert "id" in data
def test_create_topic_deduplication(client):
client.post("/api/topics", json={"name": "Finance"})
resp = client.post("/api/topics", json={"name": "finance"}) # case-insensitive
assert resp.status_code == 200
topics = client.get("/api/topics").json()["topics"]
assert len(topics) == 1
def test_update_topic(client):
create = client.post("/api/topics", json={"name": "Old Name"}).json()
resp = client.patch(f"/api/topics/{create['id']}", json={"name": "New Name"})
assert resp.status_code == 200
assert resp.json()["name"] == "New Name"
def test_update_topic_not_found(client):
resp = client.patch("/api/topics/nonexistent", json={"name": "X"})
assert resp.status_code == 404
def test_delete_topic(client):
create = client.post("/api/topics", json={"name": "ToDelete"}).json()
resp = client.delete(f"/api/topics/{create['id']}")
assert resp.status_code == 200
assert resp.json()["success"] is True
topics = client.get("/api/topics").json()["topics"]
assert not any(t["name"] == "ToDelete" for t in topics)
def test_delete_topic_cascades_to_documents(client, sample_txt):
# Create a topic
topic = client.post("/api/topics", json={"name": "Legal"}).json()
# Upload doc (no auto classify to control topics manually)
with open(sample_txt, "rb") as f:
upload = client.post(
"/api/documents/upload",
files={"file": ("sample.txt", f, "text/plain")},
data={"auto_classify": "false"},
).json()
    # Set the topic on the document directly through the storage layer
import services.storage as st
st.update_document_topics(upload["id"], ["Legal"])
# Delete topic
client.delete(f"/api/topics/{topic['id']}")
# Verify document no longer has the topic
doc = client.get(f"/api/documents/{upload['id']}").json()
assert "Legal" not in doc["topics"]
def test_delete_topic_not_found(client):
resp = client.delete("/api/topics/nonexistent")
assert resp.status_code == 404
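`test_create_topic_deduplication` shows that topic names are matched case-insensitively, and `test_classifier_with_mock_provider` shows the classifier attaching both assigned and suggested topics to a document while auto-creating the suggested ones. `services/classifier.py` is not part of this view, so the function below is a hypothetical sketch of that merge step, not its actual code. It uses `str.casefold()` rather than the `.lower()` that `create_topic` uses (they differ only for a few non-ASCII characters such as `ß`), and it assumes that an assigned topic the model invents, one not in the existing list, is dropped rather than created.

```python
def merge_topics(
    existing: list[str], assigned: list[str], suggested: list[str]
) -> tuple[list[str], list[str]]:
    """Hypothetical: return (topics for the document, new topic names to create)."""
    # casefold() key -> canonical spelling already stored in topics.json
    canonical = {name.casefold(): name for name in existing}
    doc_topics: list[str] = []
    to_create: list[str] = []
    seen: set[str] = set()

    def attach(name: str) -> None:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            doc_topics.append(canonical.get(key, name))

    # Assumption: assigned topics must already exist; unknown ones are dropped.
    for name in assigned:
        name = name.strip()
        if name.casefold() in canonical:
            attach(name)
    # Suggested topics are created when missing, then attached as well.
    for name in suggested:
        name = name.strip()
        if not name:
            continue
        if name.casefold() not in canonical:
            canonical[name.casefold()] = name
            to_create.append(name)
        attach(name)
    return doc_topics, to_create
```

Resolving every name to its stored spelling keeps documents pointing at exactly the strings in `topics.json`, which matters because `remove_topic_from_all_documents` and `topic_doc_counts` compare names with plain equality.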
@@ -0,0 +1,25 @@
services:
backend:
build: ./backend
ports:
- "8000:8000"
volumes:
- ./backend/data:/app/data
- ./backend:/app
environment:
- DATA_DIR=/app/data
- PYTHONDONTWRITEBYTECODE=1
extra_hosts:
- "host.docker.internal:host-gateway"
command: uvicorn main:app --host 0.0.0.0 --port 8000 --reload
frontend:
build: ./frontend
ports:
- "5173:5173"
volumes:
- ./frontend/src:/app/src
- ./frontend/index.html:/app/index.html
depends_on:
- backend
command: npm run dev -- --host 0.0.0.0
@@ -0,0 +1,10 @@
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 5173
@@ -0,0 +1,12 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Document Scanner</title>
</head>
<body>
<div id="app"></div>
<script type="module" src="/src/main.js"></script>
</body>
</html>
@@ -0,0 +1,22 @@
{
"name": "document-scanner-frontend",
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"dependencies": {
"vue": "^3.4.0",
"vue-router": "^4.3.0",
"pinia": "^2.1.0"
},
"devDependencies": {
"@vitejs/plugin-vue": "^5.0.0",
"vite": "^5.2.0",
"tailwindcss": "^3.4.0",
"postcss": "^8.4.0",
"autoprefixer": "^10.4.0"
}
}
@@ -0,0 +1,6 @@
export default {
plugins: {
tailwindcss: {},
autoprefixer: {},
},
}
@@ -0,0 +1,17 @@
<template>
<div class="flex h-screen overflow-hidden">
<AppSidebar />
<main class="flex-1 overflow-y-auto">
<router-view />
</main>
</div>
</template>
<script setup>
import AppSidebar from './components/layout/AppSidebar.vue'
import { useTopicsStore } from './stores/topics.js'
import { onMounted } from 'vue'
const topicsStore = useTopicsStore()
onMounted(() => topicsStore.fetchTopics())
</script>
@@ -0,0 +1,105 @@
/**
* API client using native Fetch API.
* All requests go to /api (proxied to backend by Vite in dev, or nginx in prod).
*/
async function request(path, options = {}) {
const res = await fetch(path, options)
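  if (!res.ok) {
    let msg = `HTTP ${res.status}`
    try {
      const { detail } = await res.json()
      // FastAPI returns a string for HTTPException, but a list of
      // { loc, msg, type } objects for request-validation errors (422).
      if (typeof detail === 'string' && detail) msg = detail
      else if (Array.isArray(detail)) msg = detail.map(e => e.msg).join('; ') || msg
    } catch {}
    throw new Error(msg)
  }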
return res.json()
}
// ── Documents ────────────────────────────────────────────────────────────────
export function uploadDocument(file, autoClassify = true) {
const form = new FormData()
form.append('file', file)
form.append('auto_classify', autoClassify ? 'true' : 'false')
return request('/api/documents/upload', { method: 'POST', body: form })
}
export function listDocuments({ topic, page = 1, perPage = 20 } = {}) {
const params = new URLSearchParams({ page, per_page: perPage })
if (topic) params.set('topic', topic)
return request(`/api/documents?${params}`)
}
export function getDocument(id) {
return request(`/api/documents/${id}`)
}
export function deleteDocument(id) {
return request(`/api/documents/${id}`, { method: 'DELETE' })
}
export function classifyDocument(id, topics = null) {
return request(`/api/documents/${id}/classify`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(topics ? { topics } : {}),
})
}
// ── Topics ───────────────────────────────────────────────────────────────────
export function listTopics() {
return request('/api/topics')
}
export function createTopic({ name, description = '', color = '#6366f1' }) {
return request('/api/topics', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ name, description, color }),
})
}
export function updateTopic(id, patch) {
return request(`/api/topics/${id}`, {
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(patch),
})
}
export function deleteTopic(id) {
return request(`/api/topics/${id}`, { method: 'DELETE' })
}
export function suggestTopics(documentId) {
return request('/api/topics/suggest', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ document_id: documentId }),
})
}
// ── Settings ─────────────────────────────────────────────────────────────────
export function getSettings() {
return request('/api/settings')
}
export function patchSettings(patch) {
return request('/api/settings', {
method: 'PATCH',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(patch),
})
}
export function testProvider(provider) {
return request('/api/settings/test-provider', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ provider }),
})
}
export function getDefaultPrompt() {
return request('/api/settings/default-prompt')
}
@@ -0,0 +1,59 @@
<template>
<div
class="bg-white border border-gray-200 rounded-xl p-4 hover:border-indigo-300 hover:shadow-sm transition-all cursor-pointer"
@click="$router.push(`/document/${doc.id}`)"
>
<div class="flex items-start gap-3">
<!-- Icon -->
<div class="w-9 h-9 rounded-lg bg-indigo-50 flex items-center justify-center shrink-0 mt-0.5">
<svg class="w-5 h-5 text-indigo-500" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M9 12h6m-6 4h6m2 5H7a2 2 0 01-2-2V5a2 2 0 012-2h5.586a1 1 0 01.707.293l5.414 5.414a1 1 0 01.293.707V19a2 2 0 01-2 2z" />
</svg>
</div>
<div class="flex-1 min-w-0">
<p class="font-medium text-gray-900 text-sm truncate">{{ doc.original_name }}</p>
<p class="text-xs text-gray-400 mt-0.5">{{ formatDate(doc.created_at) }} · {{ formatSize(doc.size_bytes) }}</p>
<!-- Topics -->
<div class="flex flex-wrap gap-1 mt-2">
<TopicBadge
v-for="topicName in doc.topics"
:key="topicName"
:name="topicName"
:color="topicColor(topicName)"
/>
<span v-if="!doc.topics?.length" class="text-xs text-gray-300 italic">unclassified</span>
</div>
</div>
</div>
</div>
</template>
<script setup>
import { useTopicsStore } from '../../stores/topics.js'
import TopicBadge from '../topics/TopicBadge.vue'
const props = defineProps({
doc: Object,
})
const topicsStore = useTopicsStore()
function topicColor(name) {
return topicsStore.topics.find(t => t.name === name)?.color ?? '#6366f1'
}
function formatDate(iso) {
if (!iso) return ''
return new Date(iso).toLocaleDateString(undefined, { month: 'short', day: 'numeric', year: 'numeric' })
}
function formatSize(bytes) {
if (!bytes) return ''
if (bytes < 1024) return bytes + ' B'
if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + ' KB'
return (bytes / (1024 * 1024)).toFixed(1) + ' MB'
}
</script>
@@ -0,0 +1,87 @@
<template>
<aside class="w-64 bg-white border-r border-gray-200 flex flex-col h-full shrink-0">
<!-- Logo -->
<div class="px-6 py-5 border-b border-gray-100">
<h1 class="text-lg font-bold text-indigo-600 tracking-tight">DocScanner</h1>
<p class="text-xs text-gray-400 mt-0.5">AI Document Classifier</p>
</div>
<!-- Nav -->
<nav class="flex-1 px-3 py-4 overflow-y-auto">
<router-link
to="/"
class="nav-link"
:class="{ 'nav-link-active': $route.path === '/' }"
>
<svg class="w-4 h-4 mr-2 shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M3 12l2-2m0 0l7-7 7 7M5 10v10a1 1 0 001 1h3m10-11l2 2m-2-2v10a1 1 0 01-1 1h-3m-6 0a1 1 0 001-1v-4a1 1 0 011-1h2a1 1 0 011 1v4a1 1 0 001 1m-6 0h6" />
</svg>
Home
</router-link>
<router-link
to="/topics"
class="nav-link"
:class="{ 'nav-link-active': $route.path.startsWith('/topics') }"
>
<svg class="w-4 h-4 mr-2 shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M7 7h.01M7 3h5c.512 0 1.024.195 1.414.586l7 7a2 2 0 010 2.828l-7 7a2 2 0 01-2.828 0l-7-7A1.994 1.994 0 013 12V7a4 4 0 014-4z" />
</svg>
All Topics
</router-link>
<!-- Topics list -->
<div class="mt-3">
<p class="px-3 text-xs font-semibold text-gray-400 uppercase tracking-wider mb-1">Topics</p>
<div v-if="topicsStore.loading" class="px-3 py-1 text-xs text-gray-400">Loading</div>
<div v-else-if="topicsStore.topics.length === 0" class="px-3 py-1 text-xs text-gray-400">No topics yet</div>
<router-link
v-for="topic in topicsStore.topics"
:key="topic.id"
:to="`/topics/${encodeURIComponent(topic.name)}`"
class="nav-link text-sm"
:class="{ 'nav-link-active': $route.params.name === topic.name }"
>
<span
class="w-2.5 h-2.5 rounded-full mr-2 shrink-0"
:style="{ backgroundColor: topic.color }"
></span>
<span class="truncate">{{ topic.name }}</span>
<span class="ml-auto text-xs text-gray-400">{{ topic.doc_count }}</span>
</router-link>
</div>
</nav>
<!-- Settings link -->
<div class="px-3 py-4 border-t border-gray-100">
<router-link
to="/settings"
class="nav-link"
:class="{ 'nav-link-active': $route.path === '/settings' }"
>
<svg class="w-4 h-4 mr-2 shrink-0" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2"
d="M10.325 4.317c.426-1.756 2.924-1.756 3.35 0a1.724 1.724 0 002.573 1.066c1.543-.94 3.31.826 2.37 2.37a1.724 1.724 0 001.065 2.572c1.756.426 1.756 2.924 0 3.35a1.724 1.724 0 00-1.066 2.573c.94 1.543-.826 3.31-2.37 2.37a1.724 1.724 0 00-2.572 1.065c-.426 1.756-2.924 1.756-3.35 0a1.724 1.724 0 00-2.573-1.066c-1.543.94-3.31-.826-2.37-2.37a1.724 1.724 0 00-1.065-2.572c-1.756-.426-1.756-2.924 0-3.35a1.724 1.724 0 001.066-2.573c-.94-1.543.826-3.31 2.37-2.37.996.608 2.296.07 2.572-1.065z" />
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M15 12a3 3 0 11-6 0 3 3 0 016 0z" />
</svg>
Settings
</router-link>
</div>
</aside>
</template>
<script setup>
import { useTopicsStore } from '../../stores/topics.js'
const topicsStore = useTopicsStore()
</script>
<style scoped>
.nav-link {
@apply flex items-center px-3 py-2 rounded-lg text-gray-600 hover:bg-gray-100 hover:text-gray-900 transition-colors text-sm font-medium;
}
.nav-link-active {
@apply bg-indigo-50 text-indigo-700;
}
</style>
@@ -0,0 +1,15 @@
<template>
<span
class="inline-flex items-center px-2 py-0.5 rounded-full text-xs font-medium"
:style="{ backgroundColor: color + '22', color }"
>
{{ name }}
</span>
</template>
<script setup>
defineProps({
name: String,
color: { type: String, default: '#6366f1' },
})
</script>
@@ -0,0 +1,124 @@
<template>
<div>
<!-- Add form -->
<form @submit.prevent="submit" class="flex gap-2 mb-6">
<input
v-model="form.name"
type="text"
placeholder="New topic name…"
class="flex-1 border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
required
/>
<input
v-model="form.color"
type="color"
class="w-10 h-10 rounded-lg border border-gray-300 cursor-pointer p-0.5"
title="Pick color"
/>
<button
type="submit"
class="px-4 py-2 bg-indigo-600 text-white rounded-lg text-sm font-medium hover:bg-indigo-700 transition-colors"
:disabled="saving"
>
{{ saving ? 'Adding…' : 'Add' }}
</button>
</form>
<!-- Error -->
<p v-if="error" class="text-red-500 text-sm mb-4">{{ error }}</p>
<!-- Topic list -->
<div class="space-y-2">
<div
v-for="topic in topicsStore.topics"
:key="topic.id"
class="flex items-center gap-3 bg-white border border-gray-200 rounded-lg px-4 py-3"
>
<span
class="w-3 h-3 rounded-full shrink-0"
:style="{ backgroundColor: topic.color }"
></span>
<div v-if="editing === topic.id" class="flex-1 flex gap-2">
<input
v-model="editForm.name"
class="flex-1 border border-gray-300 rounded px-2 py-1 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<input v-model="editForm.color" type="color" class="w-8 h-8 rounded border border-gray-300 p-0.5" />
<input
v-model="editForm.description"
placeholder="Description"
class="flex-1 border border-gray-300 rounded px-2 py-1 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<button @click="saveEdit(topic.id)" class="text-xs text-indigo-600 font-medium">Save</button>
<button @click="editing = null" class="text-xs text-gray-400">Cancel</button>
</div>
<div v-else class="flex-1 min-w-0">
<div class="flex items-center gap-2">
<span class="font-medium text-gray-800 text-sm">{{ topic.name }}</span>
<span class="text-xs text-gray-400">({{ topic.doc_count }} docs)</span>
</div>
<p v-if="topic.description" class="text-xs text-gray-500 mt-0.5">{{ topic.description }}</p>
</div>
<div class="flex gap-2 shrink-0">
<button @click="startEdit(topic)" class="text-xs text-gray-500 hover:text-indigo-600">Edit</button>
<button @click="remove(topic)" class="text-xs text-gray-500 hover:text-red-500">Delete</button>
</div>
</div>
<div v-if="!topicsStore.topics.length" class="text-center py-8 text-gray-400 text-sm">
No topics yet. Add one above.
</div>
</div>
</div>
</template>
<script setup>
import { ref, reactive } from 'vue'
import { useTopicsStore } from '../../stores/topics.js'
const topicsStore = useTopicsStore()
const saving = ref(false)
const error = ref(null)
const editing = ref(null)
const form = reactive({ name: '', color: '#6366f1' })
const editForm = reactive({ name: '', description: '', color: '' })
async function submit() {
saving.value = true
error.value = null
try {
await topicsStore.addTopic({ name: form.name, color: form.color })
form.name = ''
form.color = '#6366f1'
} catch (e) {
error.value = e.message
} finally {
saving.value = false
}
}
function startEdit(topic) {
editing.value = topic.id
editForm.name = topic.name
editForm.description = topic.description || ''
editForm.color = topic.color
}
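async function saveEdit(id) {
  error.value = null
  try {
    await topicsStore.editTopic(id, {
      name: editForm.name,
      description: editForm.description,
      color: editForm.color,
    })
    editing.value = null
  } catch (e) {
    // Keep the row in edit mode so the user can correct the input.
    error.value = e.message
  }
}
async function remove(topic) {
  if (!confirm(`Delete topic "${topic.name}"? It will be removed from all documents.`)) return
  error.value = null
  try {
    await topicsStore.removeTopic(topic.id)
  } catch (e) {
    error.value = e.message
  }
}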
</script>
@@ -0,0 +1,62 @@
<template>
<div
class="relative border-2 border-dashed rounded-xl p-10 text-center transition-colors"
:class="dragging
? 'border-indigo-400 bg-indigo-50'
: 'border-gray-300 bg-white hover:border-indigo-300 hover:bg-gray-50'"
@dragover.prevent="dragging = true"
@dragleave.prevent="dragging = false"
@drop.prevent="onDrop"
@click="triggerInput"
>
<input
ref="inputRef"
type="file"
class="hidden"
multiple
accept=".pdf,.docx,.doc,.txt,.md,.png,.jpg,.jpeg,.tiff,.webp"
@change="onFileChange"
/>
<div class="flex flex-col items-center gap-3">
<svg class="w-12 h-12 text-gray-300" fill="none" stroke="currentColor" viewBox="0 0 24 24">
<path stroke-linecap="round" stroke-linejoin="round" stroke-width="1.5"
d="M7 16a4 4 0 01-.88-7.903A5 5 0 1115.9 6L16 6a5 5 0 011 9.9M15 13l-3-3m0 0l-3 3m3-3v12" />
</svg>
<div>
<p class="text-sm font-medium text-gray-700">Drop files here or <span class="text-indigo-600 underline cursor-pointer">browse</span></p>
<p class="text-xs text-gray-400 mt-1">PDF, DOC, DOCX, TXT, MD, PNG, JPG, TIFF, WebP supported</p>
</div>
<label class="flex items-center gap-2 mt-2 cursor-pointer" @click.stop>
<input type="checkbox" v-model="autoClassify" class="rounded border-gray-300 text-indigo-600" />
<span class="text-sm text-gray-600">Auto-classify with AI after upload</span>
</label>
</div>
</div>
</template>
<script setup>
import { ref } from 'vue'
const emit = defineEmits(['files-selected'])
const dragging = ref(false)
const inputRef = ref(null)
const autoClassify = ref(true)
function triggerInput() {
inputRef.value?.click()
}
function onDrop(e) {
dragging.value = false
const files = Array.from(e.dataTransfer?.files || [])
if (files.length) emit('files-selected', { files, autoClassify: autoClassify.value })
}
function onFileChange(e) {
const files = Array.from(e.target.files || [])
if (files.length) emit('files-selected', { files, autoClassify: autoClassify.value })
e.target.value = ''
}
</script>
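The `accept` attribute on the hidden `<input>` only filters what the browser's file picker offers; files delivered to `onDrop` through `e.dataTransfer.files` are not filtered at all, so unsupported types can still reach the upload path. A minimal sketch of a client-side extension check follows. `filterAccepted` is a hypothetical helper, not part of this codebase, and it assumes that matching on the filename extension is sufficient (MIME-type entries such as `image/*` are not handled).

```javascript
// Hypothetical helper: split files into accepted/rejected using an
// `accept`-style, comma-separated list of extensions.
// Assumption: the filename extension is a good-enough type check here.
function filterAccepted(files, accept) {
  const allowed = new Set(
    accept.split(',').map(ext => ext.trim().toLowerCase()).filter(Boolean)
  )
  const accepted = []
  const rejected = []
  for (const file of files) {
    const dot = file.name.lastIndexOf('.')
    // Files without a dot get an empty extension, which is never allowed
    const ext = dot === -1 ? '' : file.name.slice(dot).toLowerCase()
    if (allowed.has(ext)) accepted.push(file)
    else rejected.push(file)
  }
  return { accepted, rejected }
}

// Same list as the DropZone's accept attribute
const ACCEPT = '.pdf,.docx,.doc,.txt,.md,.png,.jpg,.jpeg,.tiff,.webp'
const { accepted, rejected } = filterAccepted(
  [{ name: 'scan.PDF' }, { name: 'notes.md' }, { name: 'archive.zip' }],
  ACCEPT
)
console.log(accepted.map(f => f.name), rejected.map(f => f.name))
```

Rejected files could be reported to the user instead of being sent to the backend; the backend's extractor remains the authoritative check.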
@@ -0,0 +1,36 @@
<template>
<div v-if="items.length" class="space-y-2 mt-4">
<div
v-for="item in items"
:key="item.name"
class="flex items-center gap-3 bg-white border border-gray-200 rounded-lg px-4 py-2.5"
>
<div class="flex-1 min-w-0">
<p class="text-sm font-medium text-gray-800 truncate">{{ item.name }}</p>
<p v-if="item.error" class="text-xs text-red-500 mt-0.5">{{ item.error }}</p>
<p v-else-if="item.done" class="text-xs text-green-600 mt-0.5">
Done{{ item.topics?.length ? ` (classified as: ${item.topics.join(', ')})` : ' (no topics assigned)' }}
</p>
<p v-else class="text-xs text-gray-400 mt-0.5">Uploading…</p>
</div>
<div class="shrink-0">
<svg v-if="item.error" class="w-5 h-5 text-red-400" fill="currentColor" viewBox="0 0 20 20">
<path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zM8.707 7.293a1 1 0 00-1.414 1.414L8.586 10l-1.293 1.293a1 1 0 101.414 1.414L10 11.414l1.293 1.293a1 1 0 001.414-1.414L11.414 10l1.293-1.293a1 1 0 00-1.414-1.414L10 8.586 8.707 7.293z" clip-rule="evenodd"/>
</svg>
<svg v-else-if="item.done" class="w-5 h-5 text-green-500" fill="currentColor" viewBox="0 0 20 20">
<path fill-rule="evenodd" d="M10 18a8 8 0 100-16 8 8 0 000 16zm3.707-9.293a1 1 0 00-1.414-1.414L9 10.586 7.707 9.293a1 1 0 00-1.414 1.414l2 2a1 1 0 001.414 0l4-4z" clip-rule="evenodd"/>
</svg>
<svg v-else class="w-5 h-5 text-indigo-400 animate-spin" fill="none" viewBox="0 0 24 24">
<circle class="opacity-25" cx="12" cy="12" r="10" stroke="currentColor" stroke-width="4"/>
<path class="opacity-75" fill="currentColor" d="M4 12a8 8 0 018-8V0C5.373 0 0 5.373 0 12h4z"/>
</svg>
</div>
</div>
</div>
</template>
<script setup>
defineProps({
items: { type: Array, default: () => [] },
})
</script>
@@ -0,0 +1,10 @@
import { createApp } from 'vue'
import { createPinia } from 'pinia'
import App from './App.vue'
import router from './router/index.js'
import './style.css'
const app = createApp(App)
app.use(createPinia())
app.use(router)
app.mount('#app')
@@ -0,0 +1,18 @@
import { createRouter, createWebHistory } from 'vue-router'
import HomeView from '../views/HomeView.vue'
import TopicsView from '../views/TopicsView.vue'
import DocumentView from '../views/DocumentView.vue'
import SettingsView from '../views/SettingsView.vue'
const routes = [
{ path: '/', component: HomeView },
{ path: '/topics', component: TopicsView },
{ path: '/topics/:name', component: TopicsView },
{ path: '/document/:id', component: DocumentView },
{ path: '/settings', component: SettingsView },
]
export default createRouter({
history: createWebHistory(),
routes,
})
@@ -0,0 +1,46 @@
import { defineStore } from 'pinia'
import { ref } from 'vue'
import * as api from '../api/client.js'
export const useDocumentsStore = defineStore('documents', () => {
const documents = ref([])
const total = ref(0)
const loading = ref(false)
const error = ref(null)
async function fetchDocuments({ topic, page = 1, perPage = 20 } = {}) {
loading.value = true
error.value = null
try {
const data = await api.listDocuments({ topic, page, perPage })
documents.value = data.items
total.value = data.total
} catch (e) {
error.value = e.message
} finally {
loading.value = false
}
}
async function upload(file, autoClassify = true) {
const doc = await api.uploadDocument(file, autoClassify)
documents.value.unshift(doc)
total.value++
return doc
}
async function remove(id) {
await api.deleteDocument(id)
documents.value = documents.value.filter(d => d.id !== id)
total.value--
}
async function reclassify(id, topics = null) {
const result = await api.classifyDocument(id, topics)
const idx = documents.value.findIndex(d => d.id === id)
if (idx !== -1) documents.value[idx].topics = result.topics
return result.topics
}
return { documents, total, loading, error, fetchDocuments, upload, remove, reclassify }
})
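`fetchDocuments` accepts `page` and `perPage` and stores the server's `total`, but nothing in these views pages through results yet. The hypothetical `paginate` helper below sketches how a pager could derive its state from those values, assuming 1-based pages as implied by the store's `page = 1` default.

```javascript
// Hypothetical helper: derive pager state from the store's `total` and the
// `page` / `perPage` arguments of fetchDocuments().
// Assumption: pages are 1-based; out-of-range pages are clamped.
function paginate(total, page = 1, perPage = 20) {
  // An empty result set still has one (empty) page
  const pageCount = Math.max(1, Math.ceil(total / perPage))
  const current = Math.min(Math.max(1, page), pageCount)
  return {
    pageCount,
    page: current,
    offset: (current - 1) * perPage, // index of the first item on this page
    hasPrev: current > 1,
    hasNext: current < pageCount,
  }
}

console.log(paginate(45, 3))
```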
@@ -0,0 +1,38 @@
import { defineStore } from 'pinia'
import { ref } from 'vue'
import * as api from '../api/client.js'
export const useSettingsStore = defineStore('settings', () => {
const settings = ref(null)
const loading = ref(false)
const error = ref(null)
async function fetchSettings() {
loading.value = true
error.value = null
try {
settings.value = await api.getSettings()
} catch (e) {
error.value = e.message
} finally {
loading.value = false
}
}
async function save(patch) {
const updated = await api.patchSettings(patch)
settings.value = updated
return updated
}
async function testConnection(provider) {
return api.testProvider(provider)
}
async function resetPrompt() {
const data = await api.getDefaultPrompt()
return data.system_prompt
}
return { settings, loading, error, fetchSettings, save, testConnection, resetPrompt }
})
@@ -0,0 +1,42 @@
import { defineStore } from 'pinia'
import { ref } from 'vue'
import * as api from '../api/client.js'
export const useTopicsStore = defineStore('topics', () => {
const topics = ref([])
const loading = ref(false)
const error = ref(null)
async function fetchTopics() {
loading.value = true
error.value = null
try {
const data = await api.listTopics()
topics.value = data.topics
} catch (e) {
error.value = e.message
} finally {
loading.value = false
}
}
async function addTopic(payload) {
const topic = await api.createTopic(payload)
topics.value.push(topic)
return topic
}
async function editTopic(id, patch) {
const updated = await api.updateTopic(id, patch)
const idx = topics.value.findIndex(t => t.id === id)
if (idx !== -1) topics.value[idx] = updated
return updated
}
async function removeTopic(id) {
await api.deleteTopic(id)
topics.value = topics.value.filter(t => t.id !== id)
}
return { topics, loading, error, fetchTopics, addTopic, editTopic, removeTopic }
})
@@ -0,0 +1,9 @@
@tailwind base;
@tailwind components;
@tailwind utilities;
@layer base {
body {
@apply bg-gray-50 text-gray-900;
}
}
@@ -0,0 +1,184 @@
<template>
<div class="p-8 max-w-4xl mx-auto">
<!-- Back -->
<button @click="$router.back()" class="text-sm text-indigo-600 hover:underline mb-6 flex items-center gap-1">
← Back
</button>
<div v-if="loading" class="text-gray-400 text-sm">Loading…</div>
<div v-else-if="!doc" class="text-gray-400 text-sm">Document not found.</div>
<template v-else>
<!-- Header -->
<div class="flex items-start justify-between gap-4 mb-6">
<div>
<h2 class="text-2xl font-bold text-gray-900 break-all">{{ doc.original_name }}</h2>
<p class="text-sm text-gray-400 mt-1">
Uploaded {{ formatDate(doc.created_at) }} · {{ formatSize(doc.size_bytes) }} · {{ doc.mime_type }}
</p>
</div>
<button
@click="confirmDelete"
class="text-sm text-red-500 hover:text-red-700 shrink-0"
>Delete</button>
</div>
<!-- Topics -->
<div class="bg-white border border-gray-200 rounded-xl p-5 mb-5">
<div class="flex items-center justify-between mb-3">
<h3 class="font-semibold text-gray-800">Topics</h3>
<div class="flex gap-2">
<button
@click="reclassify"
:disabled="classifying"
class="text-xs px-3 py-1.5 bg-indigo-600 text-white rounded-lg hover:bg-indigo-700 transition-colors disabled:opacity-50"
>
{{ classifying ? 'Classifying…' : 'Re-classify' }}
</button>
<button
@click="suggestTopics"
:disabled="suggesting"
class="text-xs px-3 py-1.5 border border-gray-300 text-gray-700 rounded-lg hover:bg-gray-50 transition-colors disabled:opacity-50"
>
{{ suggesting ? 'Suggesting…' : 'Suggest Topics' }}
</button>
</div>
</div>
<div class="flex flex-wrap gap-2">
<TopicBadge
v-for="name in doc.topics"
:key="name"
:name="name"
:color="topicColor(name)"
/>
<span v-if="!doc.topics?.length" class="text-sm text-gray-400 italic">No topics assigned yet.</span>
</div>
<p v-if="classifyError" class="text-red-500 text-xs mt-2">{{ classifyError }}</p>
<!-- Suggestions modal inline -->
<div v-if="suggestions.length" class="mt-4 border-t border-gray-100 pt-4">
<p class="text-sm font-medium text-gray-700 mb-2">AI Suggestions (select to create):</p>
<div class="flex flex-wrap gap-2 mb-3">
<label
v-for="s in suggestions"
:key="s"
class="flex items-center gap-1.5 cursor-pointer text-sm"
>
<input type="checkbox" v-model="selectedSuggestions" :value="s" class="rounded border-gray-300 text-indigo-600" />
{{ s }}
</label>
</div>
<div class="flex gap-2">
<button
@click="createSelectedTopics"
:disabled="!selectedSuggestions.length"
class="text-xs px-3 py-1.5 bg-indigo-600 text-white rounded-lg hover:bg-indigo-700 disabled:opacity-50"
>
Create Selected
</button>
<button @click="suggestions = []; selectedSuggestions = []" class="text-xs text-gray-500 hover:text-gray-700">
Dismiss
</button>
</div>
</div>
</div>
<!-- Extracted text -->
<div class="bg-white border border-gray-200 rounded-xl p-5">
<h3 class="font-semibold text-gray-800 mb-3">Extracted Text</h3>
<pre class="text-xs text-gray-600 whitespace-pre-wrap font-mono bg-gray-50 rounded-lg p-4 max-h-96 overflow-y-auto">{{ doc.extracted_text || '(no text extracted)' }}</pre>
</div>
</template>
</div>
</template>
<script setup>
import { ref, onMounted } from 'vue'
import { useRoute, useRouter } from 'vue-router'
import TopicBadge from '../components/topics/TopicBadge.vue'
import { useDocumentsStore } from '../stores/documents.js'
import { useTopicsStore } from '../stores/topics.js'
import * as api from '../api/client.js'
const route = useRoute()
const router = useRouter()
const docsStore = useDocumentsStore()
const topicsStore = useTopicsStore()
const doc = ref(null)
const loading = ref(true)
const classifying = ref(false)
const suggesting = ref(false)
const classifyError = ref(null)
const suggestions = ref([])
const selectedSuggestions = ref([])
onMounted(async () => {
try {
doc.value = await api.getDocument(route.params.id)
} catch {
// Leave doc as null so the template shows "Document not found."
doc.value = null
} finally {
loading.value = false
}
})
function topicColor(name) {
return topicsStore.topics.find(t => t.name === name)?.color ?? '#6366f1'
}
async function reclassify() {
classifying.value = true
classifyError.value = null
try {
const result = await api.classifyDocument(doc.value.id)
doc.value.topics = result.topics
await topicsStore.fetchTopics()
} catch (e) {
classifyError.value = e.message
} finally {
classifying.value = false
}
}
async function suggestTopics() {
suggesting.value = true
classifyError.value = null
try {
const result = await api.suggestTopics(doc.value.id)
suggestions.value = result.suggested
selectedSuggestions.value = []
} catch (e) {
classifyError.value = e.message
} finally {
suggesting.value = false
}
}
async function createSelectedTopics() {
classifyError.value = null
try {
for (const name of selectedSuggestions.value) {
await topicsStore.addTopic({ name })
}
} catch (e) {
// Keep the suggestions visible so the user can see what failed
classifyError.value = e.message
return
}
suggestions.value = []
selectedSuggestions.value = []
// Re-classify now that the new topics exist
await reclassify()
}
async function confirmDelete() {
if (!confirm(`Delete "${doc.value.original_name}"?`)) return
await api.deleteDocument(doc.value.id)
router.push('/')
}
function formatDate(iso) {
if (!iso) return ''
return new Date(iso).toLocaleDateString(undefined, { month: 'short', day: 'numeric', year: 'numeric' })
}
function formatSize(bytes) {
if (!bytes) return ''
if (bytes < 1024) return bytes + ' B'
if (bytes < 1024 * 1024) return (bytes / 1024).toFixed(1) + ' KB'
return (bytes / (1024 * 1024)).toFixed(1) + ' MB'
}
</script>
@@ -0,0 +1,63 @@
<template>
<div class="p-8 max-w-4xl mx-auto">
<h2 class="text-2xl font-bold text-gray-900 mb-1">Upload Documents</h2>
<p class="text-gray-500 text-sm mb-6">Drop files to extract text and classify them with AI.</p>
<DropZone @files-selected="onFilesSelected" />
<UploadProgress :items="uploadQueue" />
<!-- Recent documents -->
<div class="mt-10">
<div class="flex items-center justify-between mb-4">
<h3 class="text-lg font-semibold text-gray-800">Recent Documents</h3>
<span class="text-sm text-gray-400">{{ docsStore.total }} total</span>
</div>
<div v-if="docsStore.loading" class="text-sm text-gray-400">Loading…</div>
<div v-else-if="docsStore.documents.length === 0" class="text-center py-12 text-gray-400">
<p class="text-sm">No documents yet. Upload one above.</p>
</div>
<div v-else class="grid gap-3">
<DocumentCard v-for="doc in docsStore.documents" :key="doc.id" :doc="doc" />
</div>
</div>
</div>
</template>
<script setup>
import { ref, onMounted } from 'vue'
import DropZone from '../components/upload/DropZone.vue'
import UploadProgress from '../components/upload/UploadProgress.vue'
import DocumentCard from '../components/documents/DocumentCard.vue'
import { useDocumentsStore } from '../stores/documents.js'
import { useTopicsStore } from '../stores/topics.js'
const docsStore = useDocumentsStore()
const topicsStore = useTopicsStore()
const uploadQueue = ref([])
onMounted(() => docsStore.fetchDocuments())
async function onFilesSelected({ files, autoClassify }) {
// Build one queue entry per file and prepend the batch to the queue
const items = files.map(f => ({ name: f.name, done: false, error: null, topics: null }))
uploadQueue.value = [...items, ...uploadQueue.value]
// Grab the reactive proxies for this batch so in-place updates re-render,
// and so files with the same name do not overwrite each other's status
const queued = uploadQueue.value.slice(0, items.length)
for (const [i, file] of files.entries()) {
try {
const doc = await docsStore.upload(file, autoClassify)
queued[i].done = true
queued[i].topics = doc.topics
} catch (e) {
queued[i].error = e.message
}
}
// Refresh topics (classification may have created new ones)
await topicsStore.fetchTopics()
}
</script>
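`onFilesSelected` uploads files one at a time and records an outcome per file, so a single failure does not abort the rest of the batch. The pattern is sketched below without Vue so it runs on its own; `uploadSequentially` and `fakeUpload` are hypothetical stand-ins for the view logic and for `docsStore.upload(file, autoClassify)`. Tracking entries by index rather than by filename keeps two files with the same name from updating each other's status.

```javascript
// Hypothetical sketch: sequential uploads with a per-file result entry.
// `uploadFn` stands in for docsStore.upload(file, autoClassify).
async function uploadSequentially(files, uploadFn) {
  const results = files.map(f => ({ name: f.name, done: false, error: null, topics: null }))
  for (const [i, file] of files.entries()) {
    try {
      const doc = await uploadFn(file)
      results[i].done = true
      results[i].topics = doc.topics
    } catch (e) {
      // Record the failure and continue with the next file
      results[i].error = e.message
    }
  }
  return results
}

// Fake uploader (assumption): rejects one file, classifies the rest
const fakeUpload = async file => {
  if (file.name === 'broken.pdf') throw new Error('Unsupported file')
  return { topics: ['Invoices'] }
}

uploadSequentially([{ name: 'a.pdf' }, { name: 'broken.pdf' }, { name: 'a.pdf' }], fakeUpload)
  .then(results => console.log(results))
```

Sequential uploads keep load on a local model (LM Studio, Ollama) predictable; a parallel `Promise.allSettled` version would finish sooner but would fire several classification calls at once.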
@@ -0,0 +1,223 @@
<template>
<div class="p-8 max-w-3xl mx-auto">
<h2 class="text-2xl font-bold text-gray-900 mb-1">Settings</h2>
<p class="text-gray-500 text-sm mb-8">Configure AI provider and the system prompt.</p>
<div v-if="settingsStore.loading" class="text-gray-400 text-sm">Loading…</div>
<div v-else-if="!settingsStore.settings" class="text-red-500 text-sm">Failed to load settings.</div>
<template v-else>
<!-- AI Provider -->
<section class="bg-white border border-gray-200 rounded-xl p-6 mb-5">
<h3 class="font-semibold text-gray-800 mb-4">AI Provider</h3>
<div class="flex flex-wrap gap-2 mb-6">
<button
v-for="prov in providers"
:key="prov.id"
@click="activeProvider = prov.id"
class="px-4 py-2 rounded-lg text-sm font-medium border transition-colors"
:class="activeProvider === prov.id
? 'bg-indigo-600 text-white border-indigo-600'
: 'border-gray-300 text-gray-600 hover:bg-gray-50'"
>
{{ prov.label }}
</button>
</div>
<!-- Anthropic config -->
<div v-if="activeProvider === 'anthropic'" class="space-y-3">
<label class="block text-sm font-medium text-gray-700">API Key</label>
<input
v-model="providerCfg.anthropic.api_key"
type="password"
placeholder="sk-ant-…"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
<input
v-model="providerCfg.anthropic.model"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
</div>
<!-- OpenAI config -->
<div v-else-if="activeProvider === 'openai'" class="space-y-3">
<label class="block text-sm font-medium text-gray-700">API Key</label>
<input
v-model="providerCfg.openai.api_key"
type="password"
placeholder="sk-…"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
<input
v-model="providerCfg.openai.model"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<label class="block text-sm font-medium text-gray-700 mt-3">Base URL (optional)</label>
<input
v-model="providerCfg.openai.base_url"
placeholder="https://api.openai.com/v1"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
</div>
<!-- Ollama config -->
<div v-else-if="activeProvider === 'ollama'" class="space-y-3">
<label class="block text-sm font-medium text-gray-700">Base URL</label>
<input
v-model="providerCfg.ollama.base_url"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
<input
v-model="providerCfg.ollama.model"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<p class="text-xs text-gray-400 mt-1">
Ollama must be started with <code class="bg-gray-100 px-1 rounded">OLLAMA_HOST=0.0.0.0 ollama serve</code>
</p>
</div>
<!-- LM Studio config -->
<div v-else-if="activeProvider === 'lmstudio'" class="space-y-3">
<label class="block text-sm font-medium text-gray-700">Base URL</label>
<input
v-model="providerCfg.lmstudio.base_url"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<label class="block text-sm font-medium text-gray-700 mt-3">Model</label>
<input
v-model="providerCfg.lmstudio.model"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400"
/>
<p class="text-xs text-gray-400 mt-1">
LM Studio server must be bound to <code class="bg-gray-100 px-1 rounded">0.0.0.0</code> in LM Studio settings.
</p>
</div>
<!-- Test connection -->
<div class="flex items-center gap-3 mt-5">
<button
@click="testConn"
:disabled="testing"
class="text-sm px-4 py-2 border border-gray-300 rounded-lg hover:bg-gray-50 transition-colors disabled:opacity-50"
>
{{ testing ? 'Testing…' : 'Test Connection' }}
</button>
<span v-if="testResult" :class="testResult.ok ? 'text-green-600' : 'text-red-500'" class="text-sm">
{{ testResult.ok ? '✓' : '✗' }} {{ testResult.message }}
<span v-if="testResult.ok && testResult.latency_ms" class="text-gray-400">({{ testResult.latency_ms }}ms)</span>
</span>
</div>
</section>
<!-- System Prompt -->
<section class="bg-white border border-gray-200 rounded-xl p-6 mb-5">
<div class="flex items-center justify-between mb-3">
<h3 class="font-semibold text-gray-800">System Prompt</h3>
<button @click="resetPrompt" class="text-xs text-indigo-600 hover:underline">Reset to default</button>
</div>
<textarea
v-model="systemPrompt"
rows="8"
class="w-full border border-gray-300 rounded-lg px-3 py-2 text-sm font-mono focus:outline-none focus:ring-2 focus:ring-indigo-400 resize-y"
></textarea>
</section>
<!-- Save -->
<div class="flex items-center gap-3">
<button
@click="save"
:disabled="saving"
class="px-6 py-2.5 bg-indigo-600 text-white rounded-lg text-sm font-medium hover:bg-indigo-700 transition-colors disabled:opacity-50"
>
{{ saving ? 'Saving…' : 'Save Settings' }}
</button>
<span v-if="saveMsg" :class="saveError ? 'text-red-500' : 'text-green-600'" class="text-sm">
{{ saveMsg }}
</span>
</div>
</template>
</div>
</template>
<script setup>
import { ref, reactive, onMounted } from 'vue'
import { useSettingsStore } from '../stores/settings.js'
const settingsStore = useSettingsStore()
const saving = ref(false)
const testing = ref(false)
const testResult = ref(null)
const saveMsg = ref('')
const saveError = ref(false)
const providers = [
{ id: 'lmstudio', label: 'LM Studio' },
{ id: 'ollama', label: 'Ollama' },
{ id: 'openai', label: 'OpenAI' },
{ id: 'anthropic', label: 'Anthropic' },
]
const activeProvider = ref('lmstudio')
const systemPrompt = ref('')
const providerCfg = reactive({
anthropic: { api_key: '', model: 'claude-sonnet-4-6' },
openai: { api_key: '', model: 'gpt-4o', base_url: '' },
ollama: { base_url: 'http://host.docker.internal:11434', model: 'llama3.2' },
lmstudio: { base_url: 'http://host.docker.internal:1234', model: 'gemma-4-e4b-it' },
})
onMounted(async () => {
await settingsStore.fetchSettings()
populateForm()
})
function populateForm() {
const s = settingsStore.settings
if (!s) return
activeProvider.value = s.active_provider
systemPrompt.value = s.system_prompt
for (const [k, v] of Object.entries(s.providers || {})) {
if (providerCfg[k]) Object.assign(providerCfg[k], v)
}
}
async function testConn() {
testing.value = true
testResult.value = null
try {
testResult.value = await settingsStore.testConnection(activeProvider.value)
} catch (e) {
testResult.value = { ok: false, message: e.message, latency_ms: 0 }
} finally {
testing.value = false
}
}
async function resetPrompt() {
systemPrompt.value = await settingsStore.resetPrompt()
}
async function save() {
saving.value = true
saveMsg.value = ''
saveError.value = false
try {
await settingsStore.save({
system_prompt: systemPrompt.value,
active_provider: activeProvider.value,
providers: providerCfg,
})
saveMsg.value = 'Settings saved.'
} catch (e) {
saveMsg.value = e.message
saveError.value = true
} finally {
saving.value = false
setTimeout(() => saveMsg.value = '', 3000)
}
}
</script>
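`populateForm()` merges the saved settings into the form defaults with a per-provider, shallow `Object.assign`: saved values win, fields the server omits keep their defaults, and providers missing from `providerCfg` are skipped. A non-mutating sketch of that merge follows; `mergeProviderConfig` is a hypothetical name, and the `'mistral'` value is only an example.

```javascript
// Hypothetical sketch of the populateForm() merge, without Vue.
// Server values overwrite defaults per provider; unknown providers are ignored.
function mergeProviderConfig(defaults, serverProviders = {}) {
  // Copy the defaults so the caller's object is not mutated
  const merged = {}
  for (const [name, cfg] of Object.entries(defaults)) merged[name] = { ...cfg }
  for (const [name, cfg] of Object.entries(serverProviders)) {
    // Shallow merge is enough: each provider config is a flat object of strings
    if (merged[name]) Object.assign(merged[name], cfg)
  }
  return merged
}

const defaults = {
  ollama: { base_url: 'http://host.docker.internal:11434', model: 'llama3.2' },
  lmstudio: { base_url: 'http://host.docker.internal:1234', model: 'gemma-4-e4b-it' },
}
const merged = mergeProviderConfig(defaults, {
  ollama: { model: 'mistral' },
  unknown: { model: 'x' },
})
console.log(merged.ollama)
```

The component itself mutates the `reactive` `providerCfg` in place so the existing `v-model` bindings stay connected; copying first, as here, only makes the example self-contained.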
@@ -0,0 +1,82 @@
<template>
<div class="p-8 max-w-4xl mx-auto">
<!-- Header -->
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-2xl font-bold text-gray-900">
{{ activeTopic ? activeTopic : 'All Topics' }}
</h2>
<p class="text-gray-500 text-sm mt-0.5">
{{ activeTopic ? `Documents classified under "${activeTopic}"` : 'Manage topics and browse documents by topic' }}
</p>
</div>
<button
v-if="activeTopic"
@click="$router.push('/topics')"
class="text-sm text-indigo-600 hover:underline"
>
← All Topics
</button>
</div>
<!-- No filter: show topic manager + topic grid -->
<template v-if="!activeTopic">
<TopicManager />
<div class="mt-8">
<h3 class="text-lg font-semibold text-gray-800 mb-4">Browse by Topic</h3>
<div v-if="topicsStore.topics.length === 0" class="text-sm text-gray-400">No topics yet.</div>
<div v-else class="grid grid-cols-2 sm:grid-cols-3 gap-3">
<router-link
v-for="topic in topicsStore.topics"
:key="topic.id"
:to="`/topics/${encodeURIComponent(topic.name)}`"
class="bg-white border border-gray-200 rounded-xl p-4 hover:border-indigo-300 hover:shadow-sm transition-all"
>
<div class="flex items-center gap-2 mb-2">
<span class="w-3 h-3 rounded-full" :style="{ backgroundColor: topic.color }"></span>
<span class="font-medium text-gray-800 text-sm">{{ topic.name }}</span>
</div>
<p class="text-2xl font-bold text-gray-900">{{ topic.doc_count }}</p>
<p class="text-xs text-gray-400">document{{ topic.doc_count !== 1 ? 's' : '' }}</p>
</router-link>
</div>
</div>
</template>
<!-- Filtered by topic: document list -->
<template v-else>
<div v-if="docsStore.loading" class="text-sm text-gray-400">Loading…</div>
<div v-else-if="docsStore.documents.length === 0" class="text-center py-12 text-gray-400">
No documents under this topic yet.
</div>
<div v-else class="grid gap-3">
<DocumentCard v-for="doc in docsStore.documents" :key="doc.id" :doc="doc" />
</div>
</template>
</div>
</template>
<script setup>
import { computed, watch, onMounted } from 'vue'
import { useRoute } from 'vue-router'
import TopicManager from '../components/topics/TopicManager.vue'
import DocumentCard from '../components/documents/DocumentCard.vue'
import { useTopicsStore } from '../stores/topics.js'
import { useDocumentsStore } from '../stores/documents.js'
const route = useRoute()
const topicsStore = useTopicsStore()
const docsStore = useDocumentsStore()
// vue-router already decodes route params; decoding again would throw on names containing "%"
const activeTopic = computed(() => route.params.name || null)
function loadDocs() {
if (activeTopic.value) {
docsStore.fetchDocuments({ topic: activeTopic.value })
}
}
onMounted(loadDocs)
watch(activeTopic, loadDocs)
</script>
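Vue Router 4 decodes path params before exposing them on `route.params`, so a topic name that went through `encodeURIComponent` in the topic grid's `router-link` is already plain text by the time `TopicsView` reads it. Decoding it a second time is harmless for most names but throws for names containing `%`. The snippet below uses only the standard `encodeURIComponent`/`decodeURIComponent` globals; the topic name is a made-up example.

```javascript
// Demonstrates why a route param should be decoded at most once.
const topicName = '100% Tax' // hypothetical topic name containing "%"
const segment = encodeURIComponent(topicName) // what the link puts in the URL
const decodedOnce = decodeURIComponent(segment) // what the router exposes as a param

let decodedTwice
try {
  decodedTwice = decodeURIComponent(decodedOnce) // a redundant second decode
} catch (e) {
  // "% T" is not a valid percent-escape, so this throws a URIError
  decodedTwice = e.name
}
console.log(segment, decodedOnce, decodedTwice)
```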
@@ -0,0 +1,8 @@
/** @type {import('tailwindcss').Config} */
export default {
content: ['./index.html', './src/**/*.{vue,js}'],
theme: {
extend: {},
},
plugins: [],
}
@@ -0,0 +1,16 @@
import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
export default defineConfig({
plugins: [vue()],
server: {
host: '0.0.0.0',
port: 5173,
proxy: {
'/api': {
target: 'http://backend:8000',
changeOrigin: true,
},
},
},
})