Add PDF document service with AI extraction and per-app settings

- New `features/doc-service` FastAPI microservice: PDF upload, async
  text extraction (pdfplumber), AI classification via Anthropic/Ollama/
  LM Studio, per-user categories, file download
- Alembic migration isolated with `alembic_version_doc_service` table
- Main backend: httpx proxy routers for /api/documents/* and
  /api/documents/categories/*, admin settings API at /api/settings/*
- Runtime config in /config/doc_service_config.json (shared Docker
  volume); api_key masking on reads; atomic write with os.replace()
- Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage
  launcher hub, simplified Nav (removed Settings link), new routes
- docker-compose: doc-service service, doc_data + app_config volumes,
  removed internal:true from backend-net for outbound AI API calls
- Fix pre-commit hook: probe Docker socket path so git subprocess picks
  up Docker Desktop on macOS
- Fix security_check.py: use sys.executable for bandit so venv python
  is used instead of system python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
curo1305
2026-04-14 05:28:11 +02:00
parent d423bea134
commit 0d34867a69
52 changed files with 2500 additions and 28 deletions
+28 -2
View File
@@ -30,6 +30,30 @@ services:
env_file: ./backend/.env
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-password}@db:5432/${POSTGRES_DB:-destroying_sap}
DOC_SERVICE_URL: http://doc-service:8001
volumes:
- app_config:/config
depends_on:
db:
condition: service_healthy
networks:
- backend-net
# ── Doc service (PDF extraction) ────────────────────────────────────────────
doc-service:
build:
context: ./features/doc-service
dockerfile: Dockerfile
network: host
user: "1001:1001"
restart: unless-stopped
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-password}@db:5432/${POSTGRES_DB:-destroying_sap}
DATA_DIR: /data/documents
CONFIG_PATH: /config/doc_service_config.json
volumes:
- doc_data:/data/documents
- app_config:/config
depends_on:
db:
condition: service_healthy
@@ -54,10 +78,12 @@ services:
volumes:
postgres_data:
doc_data: # PDF files persisted across restarts
app_config: # Per-service runtime config JSON files
networks:
# Internal-only: db ↔ backend ↔ frontend reverse proxy. No host routing.
# backend-net: db ↔ backend ↔ doc-service. No host ports bound.
# internal:true removed — doc-service needs outbound access for cloud AI providers.
backend-net:
internal: true
# External-facing: only the frontend binds a host port through this network.
frontend-net: