chore: initial commit — existing single-user document scanner codebase

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:53:28 +02:00
parent 6fed5ba531
commit 7a34807fa0
71 changed files with 16408 additions and 0 deletions
@@ -0,0 +1,87 @@
+# TESTING — document-scanner
+
+_Last updated: 2026-05-21_
+
+## Summary
+
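+The backend has integration tests for the main API surfaces (health, documents, topics, settings) and unit tests for the text extraction and classification parsing helpers, using pytest and FastAPI's TestClient. Each test runs in its own temporary data directory, so no state is shared between tests. The reclassify endpoint and the AI provider implementations are not covered. The frontend has no test framework configured.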
+
+---
+
+## Backend Testing
+
+### Framework
+- **pytest** + **pytest-asyncio** (`asyncio_mode = auto` in `pytest.ini`)
+- **FastAPI TestClient** (synchronous test client re-exported from Starlette and built on `httpx`)
+- No dedicated mocking library: classifier tests exercise the real parsing helpers without calling an AI provider, and where a provider is needed the AI layer is swapped out at the provider level
+
+### Test Isolation Strategy (conftest.py)
+- `isolated_data_dir` fixture is `autouse=True` — every test automatically gets:
+  - A fresh `tmp_path/data/` directory with `uploads/` and `metadata/` subdirectories
+  - Clean `topics.json` and `settings.json` initialized from `DEFAULT_SETTINGS`
+  - Monkeypatched `DATA_DIR` env var and all module-level path constants in `config` and `services.storage`
+  - New `FileLock` instances pointing to the tmp dir
+- `client` fixture wraps FastAPI `TestClient` with the isolated data dir active
+
+### Test Files
+
+| File | What it covers |
+|---|---|
+| `test_health.py` | `GET /health` returns `{"status": "ok"}` |
+| `test_documents.py` | Upload TXT/PDF (no-classify), list, get, delete; verifies that text is extracted correctly |
+| `test_topics.py` | Create, list, delete topics via API |
+| `test_settings.py` | Read default settings, update provider config |
+| `test_extractor.py` | Unit tests for `extract_text()` on TXT, PDF, DOCX, image paths |
+| `test_classifier.py` | Unit tests for JSON parsing helpers (`_parse_classification`, `_parse_suggestions`, `_strip_code_fences`) — no real AI calls |
+| `test_lmstudio.py` | LMStudio provider-specific behaviour (likely mocked or uses a local endpoint) |
+
+### Fixtures Available
+
+| Fixture | Provides |
+|---|---|
+| `isolated_data_dir` | Autouse — clean tmp data dir |
+| `client` | FastAPI TestClient with isolated data |
+| `sample_txt` | A `.txt` file with test content |
+| `sample_pdf` | A minimal valid PDF created with PyMuPDF |
+
+### What Is NOT Tested
+
+- Auto-classification flow end-to-end (requires a live AI provider)
+- Document reclassify endpoint
+- Anthropic, OpenAI, Ollama provider implementations directly
+- Any concurrent write / filelock contention scenarios
+- File size / type validation edge cases
+- Frontend — no tests exist
+
+---
+
+## Frontend Testing
+
+- **No test framework installed** — `package.json` has no `vitest`, `jest`, or `@testing-library/vue`
+- No test files found under `frontend/src/`
+- No Cypress or Playwright configuration
+
+---
+
+## Running Tests
+
+```bash
+# From backend/
+pytest
+
+# With verbose output
+pytest -v
+
+# Single file
+pytest tests/test_documents.py
+```
+
+---
+
+## Gaps / Unknowns
+
+- No test coverage measurement (no `pytest-cov` in `requirements.txt`)
+- `test_lmstudio.py` content not inspected — unclear if it hits a real local endpoint
+- No CI configuration (no GitHub Actions, no Dockerfile for test runner)
+- No snapshot or contract tests for API response shapes
+- Frontend is completely untested
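
The isolation strategy described under "Test Isolation Strategy (conftest.py)" can be illustrated with a minimal, stdlib-only sketch. This is not the project's actual `conftest.py`: the helper name `build_isolated_data_dir`, the contents of `DEFAULT_SETTINGS`, and the choice of an empty list as the "clean" `topics.json` are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for the project's DEFAULT_SETTINGS;
# the real keys and values are not shown in the source.
DEFAULT_SETTINGS = {"provider": "lmstudio"}


def build_isolated_data_dir(base: Path, default_settings: dict) -> Path:
    """Create a fresh data tree under `base` and return its path."""
    data_dir = base / "data"
    # Subdirectories named in the isolation strategy.
    (data_dir / "uploads").mkdir(parents=True)
    (data_dir / "metadata").mkdir()
    # Assumption: a "clean" topics file is an empty JSON list.
    (data_dir / "topics.json").write_text(json.dumps([]), encoding="utf-8")
    (data_dir / "settings.json").write_text(
        json.dumps(default_settings), encoding="utf-8"
    )
    return data_dir


def demo() -> tuple[Path, Path]:
    """Build two independent trees, as two separate tests would each receive."""
    first = build_isolated_data_dir(Path(tempfile.mkdtemp()), DEFAULT_SETTINGS)
    second = build_isolated_data_dir(Path(tempfile.mkdtemp()), DEFAULT_SETTINGS)
    # A write in one tree is not visible in the other.
    (first / "uploads" / "a.txt").write_text("hello", encoding="utf-8")
    return first, second
```

In a real `conftest.py`, an `autouse=True` fixture would presumably call a helper like this with pytest's `tmp_path`, then point the application at the result with `monkeypatch.setenv("DATA_DIR", str(data_dir))` and `monkeypatch.setattr(...)` for the module-level path constants. Because each test receives its own `tmp_path`, files written by one test never leak into another.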