kite/.planning/codebase/TESTING.md

# TESTING — document-scanner

_Last updated: 2026-05-21_

## Summary

The backend has integration tests for most API surfaces and unit tests for key services, using pytest and the FastAPI TestClient. Each test runs in its own temporary data directory, so no state is shared between tests. Known gaps are listed under "What Is NOT Tested". The frontend has no test framework configured.

---

## Backend Testing

### Framework
- **pytest** + **pytest-asyncio** (`asyncio_mode = auto` in `pytest.ini`)
- **FastAPI TestClient** (synchronous test client provided by Starlette and built on `httpx`)
- No third-party mocking library. AI-dependent code is tested either through its response-parsing helpers directly or by swapping out the provider layer

### Test Isolation Strategy (conftest.py)
- `isolated_data_dir` fixture is `autouse=True` — every test automatically gets:
  - A fresh `tmp_path/data/` directory containing `uploads/` and `metadata/` subdirectories
  - Clean `topics.json` and `settings.json` initialized from `DEFAULT_SETTINGS`
  - Monkeypatched `DATA_DIR` env var and all module-level path constants in `config` and `services.storage`
  - New `FileLock` instances pointing to the tmp dir
- `client` fixture wraps FastAPI `TestClient` with the isolated data dir active
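
As a rough illustration of the isolation steps above, the directory setup can be factored into a plain helper that a fixture then calls. This is a sketch, not the project's actual `conftest.py`: the helper name `build_isolated_data_dir`, the contents of `DEFAULT_SETTINGS`, the empty-list format of `topics.json`, and the attribute names on `config` are all assumptions.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Placeholder values; the real DEFAULT_SETTINGS comes from the project's config.
DEFAULT_SETTINGS = {"example_key": "example_value"}


def build_isolated_data_dir(root: Path) -> SimpleNamespace:
    """Create the data layout described above under `root` and return its paths."""
    data_dir = root / "data"
    uploads = data_dir / "uploads"
    metadata = data_dir / "metadata"
    uploads.mkdir(parents=True)
    metadata.mkdir()

    topics_file = data_dir / "topics.json"
    settings_file = data_dir / "settings.json"
    topics_file.write_text("[]", encoding="utf-8")  # assumed: empty topic list
    settings_file.write_text(json.dumps(DEFAULT_SETTINGS), encoding="utf-8")

    return SimpleNamespace(
        data_dir=data_dir,
        uploads=uploads,
        metadata=metadata,
        topics_file=topics_file,
        settings_file=settings_file,
    )


# Inside conftest.py, an autouse fixture could wrap the helper roughly like this
# (attribute names on `config` are assumed):
#
#   @pytest.fixture(autouse=True)
#   def isolated_data_dir(tmp_path, monkeypatch):
#       paths = build_isolated_data_dir(tmp_path)
#       monkeypatch.setenv("DATA_DIR", str(paths.data_dir))
#       monkeypatch.setattr(config, "DATA_DIR", paths.data_dir)
#       return paths.data_dir

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = build_isolated_data_dir(Path(tmp))
        print(sorted(p.name for p in paths.data_dir.iterdir()))
```

Keeping the layout logic in a helper means it can be exercised without pytest, while the fixture stays a thin wrapper around `tmp_path` and `monkeypatch`.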

### Test Files

| File | What it covers |
|---|---|
| `test_health.py` | `GET /health` returns `{"status": "ok"}` |
| `test_documents.py` | Upload TXT/PDF (no-classify), list, get, delete; checks that text is extracted correctly |
| `test_topics.py` | Create, list, delete topics via API |
| `test_settings.py` | Read default settings, update provider config |
| `test_extractor.py` | Unit tests for `extract_text()` on TXT, PDF, DOCX, image paths |
| `test_classifier.py` | Unit tests for JSON parsing helpers (`_parse_classification`, `_parse_suggestions`, `_strip_code_fences`) — no real AI calls |
| `test_lmstudio.py` | LMStudio provider-specific behaviour (file not inspected; likely mocked or pointed at a local endpoint) |
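
To show the style of test that `test_classifier.py` is described as containing, here is a minimal sketch of a fence-stripping helper with plain assertions and no AI call. The function below is a hypothetical stand-in written for illustration; the project's real `_strip_code_fences` may behave differently.

```python
import json
import re

# Build the fence marker programmatically to keep this listing easy to embed.
FENCE = "`" * 3

# Matches a reply wrapped in a single fenced block, with an optional language tag.
_FENCED = re.compile(
    r"^" + re.escape(FENCE) + r"[\w-]*\s*\n(.*?)\n?" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return the body of a fenced reply, or the trimmed text if it is not fenced."""
    stripped = text.strip()
    match = _FENCED.match(stripped)
    return match.group(1).strip() if match else stripped


def test_strips_fence_with_language_tag():
    reply = FENCE + 'json\n{"topic": "invoices"}\n' + FENCE
    assert json.loads(strip_code_fences(reply)) == {"topic": "invoices"}


def test_leaves_unfenced_text_alone():
    assert strip_code_fences('  {"topic": "invoices"}  ') == '{"topic": "invoices"}'


if __name__ == "__main__":
    test_strips_fence_with_language_tag()
    test_leaves_unfenced_text_alone()
    print("ok")
```

Tests of this kind only need canned strings as input, which is why they can run without any provider configured.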

### Fixtures Available

| Fixture | Provides |
|---|---|
| `isolated_data_dir` | Autouse — clean tmp data dir |
| `client` | FastAPI TestClient with isolated data |
| `sample_txt` | A `.txt` file with test content |
| `sample_pdf` | A minimal valid PDF created with PyMuPDF |

### What Is NOT Tested

- Auto-classification flow end-to-end (requires a live AI provider)
- Document reclassify endpoint
- Anthropic, OpenAI, Ollama provider implementations directly
- Any concurrent write / filelock contention scenarios
- File size / type validation edge cases
- Frontend — no tests exist
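
One possible way to close the first gap without a live AI provider is to inject a fake provider that returns a canned reply. The sketch below is illustrative only: `FakeProvider`, its `complete` method, the `classify` function, and the `{"topic": ...}` reply format are hypothetical and are not taken from the codebase.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeProvider:
    """Hypothetical provider stand-in: returns a canned reply and records prompts."""

    reply: str
    prompts: List[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def classify(text: str, topics: List[str], provider) -> Optional[str]:
    """Hypothetical flow: prompt the provider, parse its JSON reply, validate the topic."""
    prompt = "Topics: " + ", ".join(topics) + "\nDocument:\n" + text
    raw = provider.complete(prompt)
    try:
        topic = json.loads(raw).get("topic")
    except (json.JSONDecodeError, AttributeError):
        # Reply was not valid JSON, or was valid JSON but not an object.
        return None
    # Reject topics the provider invented.
    return topic if topic in topics else None


if __name__ == "__main__":
    provider = FakeProvider(reply='{"topic": "invoices"}')
    print(classify("Invoice 1", ["invoices", "contracts"], provider))
```

Because the fake records every prompt it receives, a test can assert both on the returned topic and on what was sent to the provider.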

---

## Frontend Testing

- **No test framework installed** — `package.json` has no `vitest`, `jest`, or `@testing-library/vue`
- No test files found under `frontend/src/`
- No Cypress or Playwright configuration

---

## Running Tests

```bash
# From backend/
pytest

# With verbose output
pytest -v

# Single file
pytest tests/test_documents.py
```

---

## Gaps / Unknowns

- No test coverage measurement (no `pytest-cov` in `requirements.txt`)
- `test_lmstudio.py` content not inspected — unclear if it hits a real local endpoint
- No CI configuration (no GitHub Actions workflow, no Dockerfile for a test runner)
- No snapshot or contract tests for API response shapes
- Frontend is completely untested
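
As one possible starting point for the missing contract tests, a small shape checker can compare a JSON payload against an expected structure. This is a sketch under assumptions: `check_shape` is a hypothetical helper, and only the `/health` payload `{"status": "ok"}` comes from the test descriptions above; `DOCUMENT_SHAPE` is an invented example.

```python
def check_shape(payload, shape, path="$"):
    """Return a list of mismatches between `payload` and the expected `shape`.

    `shape` may be a dict (recursed into), a type (checked with isinstance),
    or any other value (compared for equality).
    """
    if isinstance(shape, dict):
        if not isinstance(payload, dict):
            return [f"{path}: expected object, got {type(payload).__name__}"]
        errors = []
        for key, sub_shape in shape.items():
            if key not in payload:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(check_shape(payload[key], sub_shape, f"{path}.{key}"))
        for key in sorted(payload.keys() - shape.keys()):
            errors.append(f"{path}.{key}: unexpected field")
        return errors
    if isinstance(shape, type):
        if isinstance(payload, shape):
            return []
        return [f"{path}: expected {shape.__name__}, got {type(payload).__name__}"]
    return [] if payload == shape else [f"{path}: expected {shape!r}, got {payload!r}"]


# Taken from the description of test_health.py.
HEALTH_SHAPE = {"status": "ok"}

# Invented example; the real document response fields are not documented here.
DOCUMENT_SHAPE = {"id": str, "filename": str}

if __name__ == "__main__":
    print(check_shape({"status": "ok"}, HEALTH_SHAPE))
    print(check_shape({"id": 7, "extra": True}, DOCUMENT_SHAPE))
```

In a real test the payload would come from `client.get(...).json()`, and the test would assert that `check_shape` returns an empty list.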