Business-Management/changelog/2026-04-14_doc-service.md
curo1305 0d34867a69 Add PDF document service with AI extraction and per-app settings
- New `features/doc-service` FastAPI microservice: PDF upload, async
  text extraction (pdfplumber), AI classification via Anthropic/Ollama/
  LM Studio, per-user categories, file download
- Alembic migration isolated with `alembic_version_doc_service` table
- Main backend: httpx proxy routers for /api/documents/* and
  /api/documents/categories/*, admin settings API at /api/settings/*
- Runtime config in /config/doc_service_config.json (shared Docker
  volume); api_key masking on reads; atomic write with os.replace()
- Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage
  launcher hub, simplified Nav (removed Settings link), new routes
- docker-compose: doc-service service, doc_data + app_config volumes,
  removed internal:true from backend-net for outbound AI API calls
- Fix pre-commit hook: probe Docker socket path so git subprocess picks
  up Docker Desktop on macOS
- Fix security_check.py: use sys.executable for bandit so venv python
  is used instead of system python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 05:28:11 +02:00

2026-04-14 — PDF Document Service

Timestamp: 2026-04-14T00:00:00+00:00

Summary

Added features/doc-service, a FastAPI microservice that accepts PDF uploads, extracts text asynchronously with pdfplumber, and uses a pluggable AI provider (Anthropic, Ollama, or LM Studio) to classify documents and extract structured data. The main backend reaches it through httpx proxy routers. Also added an admin settings UI at /apps/documents/settings/admin and updated the frontend route tree, Nav, and AppsPage.
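
A minimal sketch of how the pluggable provider selection could look. Only the names AIProvider, AnthropicProvider, OpenAICompatProvider, and get_provider() come from this changelog; the `classify` method, the config keys (`provider`, `api_key`, `model`, `base_url`), and the default endpoints are assumptions made for illustration, and the provider bodies are stubs that make no network calls.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class AIProvider(ABC):
    """Common interface for all providers (method name `classify` is assumed)."""

    @abstractmethod
    def classify(self, text: str) -> dict:
        ...


@dataclass
class AnthropicProvider(AIProvider):
    api_key: str
    model: str

    def classify(self, text: str) -> dict:
        # Stub: a real provider would call the Anthropic API here.
        raise NotImplementedError


@dataclass
class OpenAICompatProvider(AIProvider):
    base_url: str
    model: str

    def classify(self, text: str) -> dict:
        # Stub: a real provider would call an OpenAI-compatible endpoint here.
        raise NotImplementedError


# Assumed default endpoints for local OpenAI-compatible servers.
DEFAULT_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
}


def get_provider(config: dict) -> AIProvider:
    """Pick a provider from a config dict (hypothetical keys)."""
    name = config.get("provider", "")
    if name == "anthropic":
        return AnthropicProvider(api_key=config["api_key"], model=config["model"])
    if name in DEFAULT_BASE_URLS:
        base_url = config.get("base_url") or DEFAULT_BASE_URLS[name]
        return OpenAICompatProvider(base_url=base_url, model=config["model"])
    raise ValueError(f"unknown AI provider: {name!r}")
```

Routing Ollama and LM Studio through one OpenAI-compatible class keeps the factory small: only the base URL differs between the two.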

Files Added

  • features/doc-service/Dockerfile — UID 1001, pre-creates /data/documents and /config
  • features/doc-service/pyproject.toml — service dependencies
  • features/doc-service/alembic.ini — separate alembic_version_doc_service table
  • features/doc-service/.env.example
  • features/doc-service/scripts/start.sh — migrations + uvicorn
  • features/doc-service/scripts/start_dev.sh — migrations + uvicorn --reload
  • features/doc-service/alembic/env.py — async migrations, VERSION_TABLE isolation
  • features/doc-service/alembic/versions/0001_create_doc_tables.py — documents, document_categories, document_category_assignments
  • features/doc-service/app/main.py — no CORS (internal service)
  • features/doc-service/app/core/config.py — DATABASE_URL, DATA_DIR, CONFIG_PATH settings
  • features/doc-service/app/database.py — async engine, AsyncSessionLocal, Base
  • features/doc-service/app/deps.py — get_user_id from X-User-Id header
  • features/doc-service/app/models/document.py — Document ORM model
  • features/doc-service/app/models/category.py — DocumentCategory ORM model
  • features/doc-service/app/models/category_assignment.py — CategoryAssignment ORM model with a composite primary key
  • features/doc-service/app/models/__init__.py
  • features/doc-service/app/schemas/document.py — DocumentOut, DocumentStatusOut, DocumentTypeUpdate, CategoryOut
  • features/doc-service/app/schemas/category.py — CategoryCreate, CategoryOut, CategoryUpdate
  • features/doc-service/app/routers/documents.py — upload, list, get, status, patch type, delete, file download, category assignment
  • features/doc-service/app/routers/categories.py — CRUD for DocumentCategory
  • features/doc-service/app/services/storage.py — aiofiles write, path helpers, delete
  • features/doc-service/app/services/config_reader.py — load_doc_config() with 30s TTL cache
  • features/doc-service/app/services/ai/__init__.py — get_provider() factory
  • features/doc-service/app/services/ai/base.py — AIProvider ABC, shared prompts
  • features/doc-service/app/services/ai/anthropic_provider.py — AnthropicProvider
  • features/doc-service/app/services/ai/openai_compat.py — OpenAICompatProvider (Ollama + LM Studio)
  • backend/app/core/app_config.py — DocServiceConfig Pydantic model, load/save with atomic write, api_key masking
  • backend/app/routers/settings.py — GET/PATCH /api/settings/documents/*, admin only
  • backend/app/routers/documents_proxy.py — httpx proxy to doc-service /documents/*
  • backend/app/routers/categories_proxy.py — httpx proxy to doc-service /categories/*
  • frontend/src/pages/DocumentsPage.tsx — upload, list, status polling, categories, file download
  • frontend/src/pages/DocumentAdminSettingsPage.tsx — AI provider config, connection test, upload limits
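
The shared runtime config is written by the backend and read by doc-service. The sketch below shows one way the three behaviours listed above could fit together: an atomic write via os.replace(), api_key masking on reads, and a 30-second TTL cache on the reader side. The function signatures, the masking format, and the injectable clock are assumptions for illustration, not the actual implementation.

```python
import contextlib
import json
import os
import tempfile
import time
from pathlib import Path

CACHE_TTL_SECONDS = 30.0
_cache: dict = {}  # path -> (loaded_at, config)


def save_config(path: Path, config: dict) -> None:
    """Write JSON to a temp file in the same directory, then swap it in."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(config, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # Atomic when source and target are on the same filesystem,
        # so readers never observe a half-written file.
        os.replace(tmp_name, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise


def mask_config(config: dict) -> dict:
    """Return a copy that is safe to send to the UI (masking format is assumed)."""
    masked = dict(config)
    key = masked.get("api_key")
    if key:
        masked["api_key"] = "****" + key[-4:] if len(key) > 8 else "****"
    return masked


def load_doc_config(path: Path, now=time.monotonic) -> dict:
    """Read the config, reusing a cached copy younger than the TTL."""
    current = now()
    cached = _cache.get(path)
    if cached and current - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    _cache[path] = (current, config)
    return config
```

With a cache like this, a settings change made in the admin UI can take up to 30 seconds to reach doc-service, which is the trade-off for not re-reading the file on every request.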

Files Modified

  • backend/app/main.py — registered settings_router, categories_proxy, and documents_proxy (categories_proxy must be registered before documents_proxy)
  • backend/pyproject.toml — moved httpx to main deps, added anthropic>=0.28, openai>=1.0
  • frontend/src/App.tsx — added /apps/documents and /apps/documents/settings/admin routes, removed /settings
  • frontend/src/components/Nav.tsx — removed Settings link, added Profile link, logo links to /
  • frontend/src/pages/AppsPage.tsx — replaced stub with app launcher card grid
  • frontend/src/api/client.ts — added documents, categories, and settings API functions
  • docker-compose.yml — added doc-service service and doc_data + app_config volumes, mounted app_config in backend, removed internal:true from backend-net to allow outbound AI API calls
  • docker-compose.dev.yml — added doc-service dev override with --reload
  • TODO.md — added PDF Documents app section
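
The registration order of the two proxy routers in backend/app/main.py matters, presumably because /api/documents/categories/* also falls under /api/documents/*, and FastAPI (via Starlette) tries routes in the order they were registered. The sketch below imitates that first-match behaviour with plain regular expressions; the patterns and the `first_match` helper are hypothetical stand-ins, not the real routers.

```python
import re


def first_match(routes, path):
    """Return the handler of the first pattern that matches, or None."""
    for pattern, handler in routes:
        if re.fullmatch(pattern, path):
            return handler
    return None


# Hypothetical patterns standing in for the two proxy routers.
CATEGORIES = (r"/api/documents/categories(/.*)?", "categories_proxy")
DOCUMENTS = (r"/api/documents(/.*)?", "documents_proxy")

path = "/api/documents/categories/42"
print(first_match([CATEGORIES, DOCUMENTS], path))  # categories_proxy
print(first_match([DOCUMENTS, CATEGORIES], path))  # documents_proxy
```

With the broader documents pattern first, category requests would be swallowed by the documents proxy and forwarded to the wrong doc-service path.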

Files Deleted

  • frontend/src/pages/SettingsPage.tsx — stub replaced by per-app settings pages