Add PDF document service with AI extraction and per-app settings
- New `features/doc-service` FastAPI microservice: PDF upload, async text extraction (pdfplumber), AI classification via Anthropic/Ollama/LM Studio, per-user categories, file download
- Alembic migration isolated with `alembic_version_doc_service` table
- Main backend: httpx proxy routers for /api/documents/* and /api/documents/categories/*, admin settings API at /api/settings/*
- Runtime config in /config/doc_service_config.json (shared Docker volume); api_key masking on reads; atomic write with os.replace()
- Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage launcher hub, simplified Nav (removed Settings link), new routes
- docker-compose: doc-service service, doc_data + app_config volumes, removed internal:true from backend-net for outbound AI API calls
- Fix pre-commit hook: probe Docker socket path so git subprocess picks up Docker Desktop on macOS
- Fix security_check.py: use sys.executable for bandit so venv python is used instead of system python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
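The runtime-config bullet (atomic write with os.replace(), api_key masking on reads) could look roughly like the sketch below. This is not the code from this commit: the helper names `write_config_atomic` and `mask_api_key`, the `provider` key, and the `"****"` mask string are assumptions made for illustration. Only `api_key`, the JSON file, and `os.replace()` come from the message above.

```python
import json
import os
import tempfile
from pathlib import Path


def write_config_atomic(path: Path, config: dict) -> None:
    """Write JSON to a temp file next to `path`, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory (same filesystem)
    # so that os.replace() is an atomic rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(config, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the swap.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def mask_api_key(config: dict, mask: str = "****") -> dict:
    """Return a copy of the config that is safe to send to a client."""
    masked = dict(config)
    if masked.get("api_key"):
        masked["api_key"] = mask
    return masked
```

Readers of the file see either the old or the new contents, never a half-written one, because the rename is the only step that touches the final path.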
@@ -0,0 +1,31 @@
from abc import ABC, abstractmethod

SYSTEM_PROMPT = (
    "You are a financial document analysis assistant. "
    "Given the text extracted from a PDF document, return ONLY a JSON object "
    "with no markdown, no code fences, and no explanation."
)

USER_PROMPT_TEMPLATE = """Analyze the following document text and return a JSON object with exactly these keys:
document_type (one of: invoice, bill, receipt, order, expense, revenue, unknown),
total_amount (string or null),
currency (string or null),
vendor_name (string or null),
customer_name (string or null),
billing_address (string or null),
customer_address (string or null),
invoice_number (string or null),
invoice_date (string or null),
due_date (string or null),
tags (array of strings),
line_items (array of objects, each with keys: description, amount).

Document text:
{text}"""


class AIProvider(ABC):
    @abstractmethod
    async def classify_document(self, text: str) -> dict:
        """Return structured extraction dict from document text."""
        ...
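To illustrate how a concrete backend might plug into the abstract class above, here is a minimal sketch. It is an assumption, not code from this commit: `CannedProvider`, `parse_model_reply`, `classify_sync`, `EXPECTED_KEYS`, and `ALLOWED_TYPES` are hypothetical names, and the provider returns a fixed reply instead of calling a model. `AIProvider` is restated so the block runs on its own. The key list and the allowed document types are taken from the user prompt.

```python
import asyncio
import json
from abc import ABC, abstractmethod


# Restated from the diff above so the sketch is self-contained.
class AIProvider(ABC):
    @abstractmethod
    async def classify_document(self, text: str) -> dict:
        """Return structured extraction dict from document text."""
        ...


# Keys and document types that the user prompt asks the model for.
EXPECTED_KEYS = (
    "document_type", "total_amount", "currency", "vendor_name",
    "customer_name", "billing_address", "customer_address",
    "invoice_number", "invoice_date", "due_date", "tags", "line_items",
)
ALLOWED_TYPES = {
    "invoice", "bill", "receipt", "order", "expense", "revenue", "unknown",
}


def parse_model_reply(raw: str) -> dict:
    """Hypothetical helper: normalize a model reply to the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {key: data.get(key) for key in EXPECTED_KEYS}
    doc_type = result["document_type"]
    if not isinstance(doc_type, str) or doc_type not in ALLOWED_TYPES:
        result["document_type"] = "unknown"
    # The prompt asks for arrays here, so fall back to empty lists.
    for key in ("tags", "line_items"):
        if not isinstance(result[key], list):
            result[key] = []
    return result


class CannedProvider(AIProvider):
    """Hypothetical stand-in for a real backend; returns a fixed reply."""

    def __init__(self, reply: str) -> None:
        self._reply = reply

    async def classify_document(self, text: str) -> dict:
        # A real provider would send SYSTEM_PROMPT plus the formatted
        # USER_PROMPT_TEMPLATE to its model here and parse the answer.
        return parse_model_reply(self._reply)


def classify_sync(provider: AIProvider, text: str) -> dict:
    """Convenience wrapper for calling the async method from sync code."""
    return asyncio.run(provider.classify_document(text))
```

Normalizing the reply in one place means callers always get every key, even when the model returns malformed JSON or omits fields.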