Business-Management/changelog/2026-04-14_doc-service.md
curo1305 0d34867a69 Add PDF document service with AI extraction and per-app settings
- New `features/doc-service` FastAPI microservice: PDF upload, async
  text extraction (pdfplumber), AI classification via Anthropic/Ollama/
  LM Studio, per-user categories, file download
- Alembic migration isolated with `alembic_version_doc_service` table
- Main backend: httpx proxy routers for /api/documents/* and
  /api/documents/categories/*, admin settings API at /api/settings/*
- Runtime config in /config/doc_service_config.json (shared Docker
  volume); api_key masking on reads; atomic write with os.replace()
- Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage
  launcher hub, simplified Nav (removed Settings link), new routes
- docker-compose: doc-service service, doc_data + app_config volumes,
  removed internal:true from backend-net for outbound AI API calls
- Fix pre-commit hook: probe Docker socket path so git subprocess picks
  up Docker Desktop on macOS
- Fix security_check.py: use sys.executable for bandit so venv python
  is used instead of system python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 05:28:11 +02:00

# 2026-04-14 — PDF Document Service
**Timestamp:** 2026-04-14T00:00:00+00:00
## Summary
Added `features/doc-service` — a FastAPI microservice that accepts PDF uploads, extracts text with pdfplumber, and uses a pluggable AI provider (Anthropic, Ollama, or LM Studio) to classify and extract structured data. Integrated it into the main backend via httpx proxy routers. Added an admin settings UI at `/apps/documents/settings/admin`. Updated the frontend route tree, Nav, and AppsPage.
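
The pluggable provider arrangement can be pictured roughly as follows. This is a minimal sketch, not the service's actual code: the class names `AIProvider`, `AnthropicProvider`, `OpenAICompatProvider` and the factory name `get_provider()` come from the file list in this changelog, while the `classify()` method, the constructor arguments, and the config keys (`provider`, `api_key`, `base_url`, `model`) are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Common interface for all providers (the method name is hypothetical)."""

    @abstractmethod
    def classify(self, text: str) -> dict:
        """Return classification data for the extracted PDF text."""


class AnthropicProvider(AIProvider):
    def __init__(self, api_key: str, model: str) -> None:
        self.api_key = api_key
        self.model = model

    def classify(self, text: str) -> dict:
        # A real implementation would call the Anthropic API here.
        raise NotImplementedError


class OpenAICompatProvider(AIProvider):
    """One class for Ollama and LM Studio, as the changelog groups them."""

    def __init__(self, base_url: str, model: str) -> None:
        self.base_url = base_url
        self.model = model

    def classify(self, text: str) -> dict:
        # A real implementation would call an OpenAI-compatible endpoint here.
        raise NotImplementedError


def get_provider(config: dict) -> AIProvider:
    """Pick a provider from runtime config (the key names are assumptions)."""
    name = config["provider"]
    if name == "anthropic":
        return AnthropicProvider(config["api_key"], config["model"])
    if name in ("ollama", "lm_studio"):
        return OpenAICompatProvider(config["base_url"], config["model"])
    raise ValueError(f"Unknown AI provider: {name!r}")
```

With this shape, callers depend only on `AIProvider`, so switching providers in the admin settings would not require changes in the extraction code.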
## Files Added
- `features/doc-service/Dockerfile` — UID 1001, pre-creates `/data/documents` and `/config`
- `features/doc-service/pyproject.toml` — service dependencies
- `features/doc-service/alembic.ini` — separate `alembic_version_doc_service` table
- `features/doc-service/.env.example`
- `features/doc-service/scripts/start.sh` — migrations + uvicorn
- `features/doc-service/scripts/start_dev.sh` — migrations + uvicorn --reload
- `features/doc-service/alembic/env.py` — async migrations, VERSION_TABLE isolation
- `features/doc-service/alembic/versions/0001_create_doc_tables.py` — documents, document_categories, document_category_assignments
- `features/doc-service/app/main.py` — no CORS (internal service)
- `features/doc-service/app/core/config.py` — DATABASE_URL, DATA_DIR, CONFIG_PATH settings
- `features/doc-service/app/database.py` — async engine, AsyncSessionLocal, Base
- `features/doc-service/app/deps.py` — get_user_id from X-User-Id header
- `features/doc-service/app/models/document.py` — Document ORM model
- `features/doc-service/app/models/category.py` — DocumentCategory ORM model
- `features/doc-service/app/models/category_assignment.py` — CategoryAssignment composite PK
- `features/doc-service/app/models/__init__.py`
- `features/doc-service/app/schemas/document.py` — DocumentOut, DocumentStatusOut, DocumentTypeUpdate, CategoryOut
- `features/doc-service/app/schemas/category.py` — CategoryCreate, CategoryOut, CategoryUpdate
- `features/doc-service/app/routers/documents.py` — upload, list, get, status, patch type, delete, file download, category assignment
- `features/doc-service/app/routers/categories.py` — CRUD for DocumentCategory
- `features/doc-service/app/services/storage.py` — aiofiles write, path helpers, delete
- `features/doc-service/app/services/config_reader.py` — load_doc_config() with 30s TTL cache
- `features/doc-service/app/services/ai/__init__.py` — get_provider() factory
- `features/doc-service/app/services/ai/base.py` — AIProvider ABC, shared prompts
- `features/doc-service/app/services/ai/anthropic_provider.py` — AnthropicProvider
- `features/doc-service/app/services/ai/openai_compat.py` — OpenAICompatProvider (Ollama + LM Studio)
- `backend/app/core/app_config.py` — DocServiceConfig Pydantic model, load/save with atomic write, api_key masking
- `backend/app/routers/settings.py` — GET/PATCH /api/settings/documents/*, admin only
- `backend/app/routers/documents_proxy.py` — httpx proxy to doc-service /documents/*
- `backend/app/routers/categories_proxy.py` — httpx proxy to doc-service /categories/*
- `frontend/src/pages/DocumentsPage.tsx` — upload, list, status polling, categories, file download
- `frontend/src/pages/DocumentAdminSettingsPage.tsx` — AI provider config, connection test, upload limits
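
The runtime config handling listed above (atomic write with `os.replace()`, `api_key` masking on reads, and a 30s TTL cache in the reader) could look roughly like the sketch below. It uses only the standard library and is an illustration under assumptions: the function names `save_config`, `mask_api_key`, and `load_config`, the `MASK` placeholder, and the injectable `now` clock are all hypothetical and do not reflect the real `app_config.py` or `config_reader.py`.

```python
import json
import os
import tempfile
import time
from pathlib import Path

MASK = "****"             # hypothetical placeholder shown instead of the key
CACHE_TTL_SECONDS = 30.0  # matches the 30s TTL mentioned in the changelog

# Maps a config path to (time loaded, parsed config).
_cache: dict = {}


def save_config(path: Path, config: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic when both paths are on the same filesystem: readers see
        # either the old file or the new one, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


def mask_api_key(config: dict) -> dict:
    """Return a copy that is safe to send to the admin UI."""
    masked = dict(config)
    if masked.get("api_key"):
        masked["api_key"] = MASK
    return masked


def load_config(path: Path, now=time.monotonic) -> dict:
    """Read the config, reusing a cached copy for up to CACHE_TTL_SECONDS."""
    current = now()
    cached = _cache.get(path)
    if cached is not None and current - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]
    value = json.loads(path.read_text(encoding="utf-8"))
    _cache[path] = (current, value)
    return value
```

One consequence of the TTL cache in a design like this is that a settings change saved by the backend may take up to 30 seconds to be picked up by the reader.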
## Files Modified
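- `backend/app/main.py` — registered settings_router, then categories_proxy before documents_proxy (order matters: the more specific `/api/documents/categories/*` routes must be matched before `/api/documents/*`)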
- `backend/pyproject.toml` — moved httpx to main deps, added anthropic>=0.28, openai>=1.0
- `frontend/src/App.tsx` — added /apps/documents and /apps/documents/settings/admin routes, removed /settings
- `frontend/src/components/Nav.tsx` — removed Settings link, added Profile link, logo links to /
- `frontend/src/pages/AppsPage.tsx` — replaced stub with app launcher card grid
- `frontend/src/api/client.ts` — added documents, categories, and settings API functions
- `docker-compose.yml` — added doc-service service, doc_data + app_config volumes, removed internal:true from backend-net, added app_config volume to backend
- `docker-compose.dev.yml` — added doc-service dev override with --reload
- `TODO.md` — added PDF Documents app section
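
The registration order noted for `backend/app/main.py` matters because FastAPI tries routes in the order they were added. The sketch below imitates that first-match behaviour with plain regular expressions rather than FastAPI itself. The two route patterns are assumptions chosen for illustration; the real proxy routers may define different paths.

```python
import re
from typing import Optional


def first_match(routes: list, path: str) -> Optional[str]:
    """Return the name of the first route whose pattern matches the path."""
    for pattern, name in routes:
        if re.fullmatch(pattern, path):
            return name
    return None


# Hypothetical patterns: a per-document route with a path parameter,
# and the fixed categories route that lives under the same prefix.
DOCUMENT_BY_ID = (r"/api/documents/[^/]+", "documents_proxy")
CATEGORIES = (r"/api/documents/categories", "categories_proxy")

wrong_order = [DOCUMENT_BY_ID, CATEGORIES]
right_order = [CATEGORIES, DOCUMENT_BY_ID]

# With the generic route first, "categories" is treated as a document id.
print(first_match(wrong_order, "/api/documents/categories"))  # documents_proxy
print(first_match(right_order, "/api/documents/categories"))  # categories_proxy
```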
## Files Deleted
- `frontend/src/pages/SettingsPage.tsx` — stub replaced by per-app settings pages