Add PDF document service with AI extraction and per-app settings
- New `features/doc-service` FastAPI microservice: PDF upload, async text extraction (pdfplumber), AI classification via Anthropic/Ollama/ LM Studio, per-user categories, file download - Alembic migration isolated with `alembic_version_doc_service` table - Main backend: httpx proxy routers for /api/documents/* and /api/documents/categories/*, admin settings API at /api/settings/* - Runtime config in /config/doc_service_config.json (shared Docker volume); api_key masking on reads; atomic write with os.replace() - Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage launcher hub, simplified Nav (removed Settings link), new routes - docker-compose: doc-service service, doc_data + app_config volumes, removed internal:true from backend-net for outbound AI API calls - Fix pre-commit hook: probe Docker socket path so git subprocess picks up Docker Desktop on macOS - Fix security_check.py: use sys.executable for bandit so venv python is used instead of system python Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,58 @@
|
||||
# 2026-04-14 — PDF Document Service
|
||||
|
||||
**Timestamp:** 2026-04-14T00:00:00+00:00
|
||||
|
||||
## Summary
|
||||
|
||||
Added `features/doc-service` — a FastAPI microservice that accepts PDF uploads, extracts text with pdfplumber, and uses a pluggable AI provider (Anthropic, Ollama, or LM Studio) to classify and extract structured data. Integrated it into the main backend via httpx proxy routers. Added an admin settings UI at `/apps/documents/settings/admin`. Updated the frontend route tree, Nav, and AppsPage.
|
||||
|
||||
## Files Added
|
||||
|
||||
- `features/doc-service/Dockerfile` — UID 1001, pre-creates `/data/documents` and `/config`
|
||||
- `features/doc-service/pyproject.toml` — service dependencies
|
||||
- `features/doc-service/alembic.ini` — separate `alembic_version_doc_service` table
|
||||
- `features/doc-service/.env.example`
|
||||
- `features/doc-service/scripts/start.sh` — migrations + uvicorn
|
||||
- `features/doc-service/scripts/start_dev.sh` — migrations + uvicorn --reload
|
||||
- `features/doc-service/alembic/env.py` — async migrations, VERSION_TABLE isolation
|
||||
- `features/doc-service/alembic/versions/0001_create_doc_tables.py` — documents, document_categories, document_category_assignments
|
||||
- `features/doc-service/app/main.py` — no CORS (internal service)
|
||||
- `features/doc-service/app/core/config.py` — DATABASE_URL, DATA_DIR, CONFIG_PATH settings
|
||||
- `features/doc-service/app/database.py` — async engine, AsyncSessionLocal, Base
|
||||
- `features/doc-service/app/deps.py` — get_user_id from X-User-Id header
|
||||
- `features/doc-service/app/models/document.py` — Document ORM model
|
||||
- `features/doc-service/app/models/category.py` — DocumentCategory ORM model
|
||||
- `features/doc-service/app/models/category_assignment.py` — CategoryAssignment composite PK
|
||||
- `features/doc-service/app/models/__init__.py`
|
||||
- `features/doc-service/app/schemas/document.py` — DocumentOut, DocumentStatusOut, DocumentTypeUpdate, CategoryOut
|
||||
- `features/doc-service/app/schemas/category.py` — CategoryCreate, CategoryOut, CategoryUpdate
|
||||
- `features/doc-service/app/routers/documents.py` — upload, list, get, status, patch type, delete, file download, category assignment
|
||||
- `features/doc-service/app/routers/categories.py` — CRUD for DocumentCategory
|
||||
- `features/doc-service/app/services/storage.py` — aiofiles write, path helpers, delete
|
||||
- `features/doc-service/app/services/config_reader.py` — load_doc_config() with 30s TTL cache
|
||||
- `features/doc-service/app/services/ai/__init__.py` — get_provider() factory
|
||||
- `features/doc-service/app/services/ai/base.py` — AIProvider ABC, shared prompts
|
||||
- `features/doc-service/app/services/ai/anthropic_provider.py` — AnthropicProvider
|
||||
- `features/doc-service/app/services/ai/openai_compat.py` — OpenAICompatProvider (Ollama + LM Studio)
|
||||
- `backend/app/core/app_config.py` — DocServiceConfig Pydantic model, load/save with atomic write, api_key masking
|
||||
- `backend/app/routers/settings.py` — GET/PATCH /api/settings/documents/*, admin only
|
||||
- `backend/app/routers/documents_proxy.py` — httpx proxy to doc-service /documents/*
|
||||
- `backend/app/routers/categories_proxy.py` — httpx proxy to doc-service /categories/*
|
||||
- `frontend/src/pages/DocumentsPage.tsx` — upload, list, status polling, categories, file download
|
||||
- `frontend/src/pages/DocumentAdminSettingsPage.tsx` — AI provider config, connection test, upload limits
|
||||
|
||||
## Files Modified
|
||||
|
||||
- `backend/app/main.py` — registered settings_router, categories_proxy (before!), documents_proxy
|
||||
- `backend/pyproject.toml` — moved httpx to main deps, added anthropic>=0.28, openai>=1.0
|
||||
- `frontend/src/App.tsx` — added /apps/documents and /apps/documents/settings/admin routes, removed /settings
|
||||
- `frontend/src/components/Nav.tsx` — removed Settings link, added Profile link, logo links to /
|
||||
- `frontend/src/pages/AppsPage.tsx` — replaced stub with app launcher card grid
|
||||
- `frontend/src/api/client.ts` — added documents, categories, and settings API functions
|
||||
- `docker-compose.yml` — added doc-service service, doc_data + app_config volumes, removed internal:true from backend-net, added app_config volume to backend
|
||||
- `docker-compose.dev.yml` — added doc-service dev override with --reload
|
||||
- `TODO.md` — added PDF Documents app section
|
||||
|
||||
## Files Deleted
|
||||
|
||||
- `frontend/src/pages/SettingsPage.tsx` — stub replaced by per-app settings pages
|
||||
Reference in New Issue
Block a user