Add PDF document service with AI extraction and per-app settings

- New `features/doc-service` FastAPI microservice: PDF upload, async
  text extraction (pdfplumber), AI classification via Anthropic/Ollama/
  LM Studio, per-user categories, file download
- Alembic migration isolated with `alembic_version_doc_service` table
- Main backend: httpx proxy routers for /api/documents/* and
  /api/documents/categories/*, admin settings API at /api/settings/*
- Runtime config in /config/doc_service_config.json (shared Docker
  volume); api_key masking on reads; atomic write with os.replace()
- Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage
  launcher hub, simplified Nav (removed Settings link), new routes
- docker-compose: doc-service service, doc_data + app_config volumes,
  removed internal:true from backend-net for outbound AI API calls
- Fix pre-commit hook: probe Docker socket path so git subprocess picks
  up Docker Desktop on macOS
- Fix security_check.py: use sys.executable for bandit so venv python
  is used instead of system python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
curo1305
2026-04-14 05:28:11 +02:00
parent d423bea134
commit 0d34867a69
52 changed files with 2500 additions and 28 deletions
+8
View File
@@ -4,6 +4,14 @@
REPO_ROOT="$(git rev-parse --show-toplevel)"
# Resolve Docker socket — the git hook environment may not inherit the active
# Docker context, so we probe common socket paths explicitly.
if [ -S "/Users/$USER/.docker/run/docker.sock" ]; then
export DOCKER_HOST="unix:///Users/$USER/.docker/run/docker.sock"
elif [ -S "/var/run/docker.sock" ]; then
export DOCKER_HOST="unix:///var/run/docker.sock"
fi
# Collect staged files on the host and pass them into the container as arguments
STAGED=$(git diff --cached --name-only --diff-filter=ACM)
+1
View File
@@ -17,3 +17,4 @@ frontend/dist/
# OS
.DS_Store
resume.txt
+12
View File
@@ -19,6 +19,18 @@
- [ ] **Permissions registry** — admin-managed table that controls which apps each user can access. Schema: `user_app_permissions (user_id FK, app_key)`. Admin UI lets the admin grant/revoke per-app access per user. The Apps page only shows apps the current user has been granted access to.
## PDF Documents app (`features/doc-service`)
- [x] **doc-service container** — FastAPI microservice on `backend-net`; never exposed to host or frontend directly
- [x] **PDF upload + async extraction** — background task with pdfplumber + pluggable AI (Anthropic / Ollama / LM Studio)
- [x] **Per-app settings page**`/apps/documents/settings/admin`; AI provider config, max file size; admin only
- [x] **Per-user categories** — create/rename/delete categories; assign multiple categories per document
- [x] **Alembic isolation**`alembic_version_doc_service` version table; no collision with main backend migrations
- [x] **Runtime config file**`/config/doc_service_config.json` on shared Docker volume; editable from frontend; 30s TTL cache in doc-service
- [ ] **Re-process document** — UI button to re-trigger AI extraction on an existing document (after changing AI provider/model)
- [ ] **Bulk category operations** — assign/remove a category from multiple documents at once
- [ ] **Search / filter documents** — filter by status, document type, category, date range
## Frontend features
- [x] **Logout button** — visible when logged in, clears token and redirects to `/login`
+117
View File
@@ -0,0 +1,117 @@
"""
Per-service runtime config helpers.
Config files live on the shared `app_config` Docker volume at /config/.
Each service has its own JSON file, e.g. /config/doc_service_config.json.
Atomic write pattern: write to .tmp in same dir, then os.replace() so
doc-service never reads a partial file.
"""
import json
import os
from pathlib import Path
from typing import Any
from pydantic import BaseModel
_CONFIG_DIR = Path(os.environ.get("APP_CONFIG_DIR", "/config"))
# ── Config schemas ─────────────────────────────────────────────────────────────
class AnthropicConfig(BaseModel):
api_key: str = ""
model: str = "claude-haiku-4-5-20251001"
class OllamaConfig(BaseModel):
base_url: str = "http://192.168.1.x:11434/v1"
model: str = "llama3.2"
api_key: str = "ollama"
class LMStudioConfig(BaseModel):
base_url: str = "http://192.168.1.x:1234/v1"
model: str = "local-model"
api_key: str = ""
class AIConfig(BaseModel):
provider: str = "anthropic"
anthropic: AnthropicConfig = AnthropicConfig()
ollama: OllamaConfig = OllamaConfig()
lmstudio: LMStudioConfig = LMStudioConfig()
class DocumentsConfig(BaseModel):
max_pdf_bytes: int = 20 * 1024 * 1024
class DocServiceConfig(BaseModel):
ai: AIConfig = AIConfig()
documents: DocumentsConfig = DocumentsConfig()
# ── Masking ────────────────────────────────────────────────────────────────────
def _mask_key(key: str) -> str:
if not key or len(key) <= 8:
return "••••"
return key[:7] + "••••"
def _mask_config(data: dict) -> dict:
"""Return a copy of data with api_key fields masked."""
import copy
masked = copy.deepcopy(data)
ai = masked.get("ai", {})
for provider in ("anthropic", "ollama", "lmstudio"):
if provider in ai and "api_key" in ai[provider]:
ai[provider]["api_key"] = _mask_key(ai[provider]["api_key"])
return masked
# ── Load / Save ────────────────────────────────────────────────────────────────
def _config_path(service: str) -> Path:
return _CONFIG_DIR / f"{service}_config.json"
def load_service_config(service: str) -> dict:
path = _config_path(service)
if not path.exists():
# Return default config if file doesn't exist yet
if service == "doc_service":
return DocServiceConfig().model_dump()
return {}
with path.open() as f:
return json.load(f)
def save_service_config(service: str, data: dict) -> None:
path = _config_path(service)
path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(".tmp")
tmp.write_text(json.dumps(data, indent=2))
os.replace(tmp, path)
def load_doc_service_config() -> DocServiceConfig:
raw = load_service_config("doc_service")
return DocServiceConfig.model_validate(raw)
def save_doc_service_config(config: DocServiceConfig) -> None:
save_service_config("doc_service", config.model_dump())
def load_doc_service_config_masked() -> dict:
raw = load_service_config("doc_service")
return _mask_config(raw)
def _merge_api_key(new_key: str, existing_key: str) -> str:
"""If new_key is empty or a masked value, keep the existing key."""
if not new_key or "••••" in new_key:
return existing_key
return new_key
+11 -1
View File
@@ -2,7 +2,8 @@ from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from app.core.config import settings
from app.routers import admin, auth, profile, users
from app.routers import admin, auth, categories_proxy, documents_proxy, profile, users
from app.routers import settings as settings_router
app = FastAPI(title=settings.PROJECT_NAME, version="0.1.0")
@@ -18,6 +19,15 @@ app.include_router(auth.router, prefix="/api/auth", tags=["auth"])
app.include_router(users.router, prefix="/api/users", tags=["users"])
app.include_router(profile.router, prefix="/api/profile", tags=["profile"])
app.include_router(admin.router, prefix="/api/admin", tags=["admin"])
app.include_router(settings_router.router, prefix="/api/settings", tags=["settings"])
# categories_proxy MUST be registered before documents_proxy —
# otherwise /api/documents/{path:path} swallows /api/documents/categories/*
app.include_router(
categories_proxy.router,
prefix="/api/documents/categories",
tags=["categories"],
)
app.include_router(documents_proxy.router, prefix="/api/documents", tags=["documents"])
@app.get("/api/health")
+80
View File
@@ -0,0 +1,80 @@
"""
Proxy /api/documents/categories/* → doc-service:8001/categories/*.
Must be registered BEFORE the documents catch-all proxy in main.py,
otherwise /api/documents/{path:path} swallows category requests.
"""
import os
import httpx
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi.responses import StreamingResponse
from app.deps import get_current_user
from app.models.user import User
DOC_SERVICE_URL = os.environ.get("DOC_SERVICE_URL", "http://doc-service:8001")
_client = httpx.AsyncClient(base_url=DOC_SERVICE_URL, timeout=30.0)
router = APIRouter()
_HOP_BY_HOP = frozenset(
[
"connection",
"keep-alive",
"proxy-authenticate",
"proxy-authorization",
"te",
"trailers",
"transfer-encoding",
"upgrade",
"host",
]
)
def _forward_headers(request: Request, user_id: str) -> dict:
headers = {
k: v
for k, v in request.headers.items()
if k.lower() not in _HOP_BY_HOP
}
headers["x-user-id"] = user_id
return headers
@router.api_route("", methods=["GET", "POST"])
@router.api_route("/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE"])
async def proxy_categories(
request: Request,
current_user: User = Depends(get_current_user),
path: str = "",
) -> StreamingResponse:
url = f"/categories/{path}" if path else "/categories"
headers = _forward_headers(request, str(current_user.id))
body = await request.body()
try:
response = await _client.request(
method=request.method,
url=url,
headers=headers,
content=body,
params=dict(request.query_params),
)
except httpx.RequestError as exc:
raise HTTPException(status_code=502, detail=f"doc-service unreachable: {exc}")
resp_headers = {
k: v
for k, v in response.headers.items()
if k.lower() not in _HOP_BY_HOP
}
return StreamingResponse(
content=iter([response.content]),
status_code=response.status_code,
headers=resp_headers,
media_type=response.headers.get("content-type"),
)
+84
View File
@@ -0,0 +1,84 @@
"""
Proxy all /api/documents/* requests to doc-service:8001/documents/*.
Uses a module-level AsyncClient for connection pooling.
Strips hop-by-hop headers that must not be forwarded.
File downloads (/file endpoint) are streamed.
"""
import os
import httpx
from fastapi import APIRouter, Depends, HTTPException, Request
from fastapi.responses import StreamingResponse
from app.deps import get_current_user
from app.models.user import User
DOC_SERVICE_URL = os.environ.get("DOC_SERVICE_URL", "http://doc-service:8001")
# Module-level client — reused across requests for connection pooling
_client = httpx.AsyncClient(base_url=DOC_SERVICE_URL, timeout=120.0)
router = APIRouter()
_HOP_BY_HOP = frozenset(
[
"connection",
"keep-alive",
"proxy-authenticate",
"proxy-authorization",
"te",
"trailers",
"transfer-encoding",
"upgrade",
"host",
]
)
def _forward_headers(request: Request, user_id: str) -> dict:
headers = {
k: v
for k, v in request.headers.items()
if k.lower() not in _HOP_BY_HOP
}
headers["x-user-id"] = user_id
return headers
@router.api_route("/{path:path}", methods=["GET", "POST", "PUT", "PATCH", "DELETE"])
async def proxy_documents(
path: str,
request: Request,
current_user: User = Depends(get_current_user),
) -> StreamingResponse:
url = f"/documents/{path}" if path else "/documents"
headers = _forward_headers(request, str(current_user.id))
# For multipart uploads, stream the body directly
body = await request.body()
try:
response = await _client.request(
method=request.method,
url=url,
headers=headers,
content=body,
params=dict(request.query_params),
)
except httpx.RequestError as exc:
raise HTTPException(status_code=502, detail=f"doc-service unreachable: {exc}")
# Strip hop-by-hop from response headers
resp_headers = {
k: v
for k, v in response.headers.items()
if k.lower() not in _HOP_BY_HOP
}
return StreamingResponse(
content=iter([response.content]),
status_code=response.status_code,
headers=resp_headers,
media_type=response.headers.get("content-type"),
)
+155
View File
@@ -0,0 +1,155 @@
"""
Admin-only settings API for per-service runtime configuration.
All endpoints require the caller to be an admin (Depends(get_current_admin)).
Config files live on the shared app_config volume (/config/).
"""
import asyncio
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from app.core.app_config import (
DocServiceConfig,
_merge_api_key,
load_doc_service_config,
load_doc_service_config_masked,
save_doc_service_config,
)
from app.deps import get_current_admin
from app.models.user import User
router = APIRouter()
# ── Pydantic request bodies ────────────────────────────────────────────────────
class AIProviderUpdate(BaseModel):
provider: str
anthropic_api_key: str = ""
anthropic_model: str = ""
ollama_base_url: str = ""
ollama_model: str = ""
ollama_api_key: str = ""
lmstudio_base_url: str = ""
lmstudio_model: str = ""
lmstudio_api_key: str = ""
class LimitsUpdate(BaseModel):
max_pdf_mb: int
# ── Documents settings ─────────────────────────────────────────────────────────
@router.get("/documents")
async def get_documents_settings(
_: User = Depends(get_current_admin),
) -> dict:
return load_doc_service_config_masked()
@router.patch("/documents/ai")
async def update_documents_ai(
body: AIProviderUpdate,
_: User = Depends(get_current_admin),
) -> dict:
valid_providers = ("anthropic", "ollama", "lmstudio")
if body.provider not in valid_providers:
raise HTTPException(status_code=422, detail=f"provider must be one of {valid_providers}")
config = load_doc_service_config()
config.ai.provider = body.provider
# Anthropic
if body.anthropic_api_key:
config.ai.anthropic.api_key = _merge_api_key(
body.anthropic_api_key, config.ai.anthropic.api_key
)
if body.anthropic_model:
config.ai.anthropic.model = body.anthropic_model
# Ollama
if body.ollama_base_url:
config.ai.ollama.base_url = body.ollama_base_url
if body.ollama_model:
config.ai.ollama.model = body.ollama_model
if body.ollama_api_key:
config.ai.ollama.api_key = _merge_api_key(body.ollama_api_key, config.ai.ollama.api_key)
# LM Studio
if body.lmstudio_base_url:
config.ai.lmstudio.base_url = body.lmstudio_base_url
if body.lmstudio_model:
config.ai.lmstudio.model = body.lmstudio_model
if body.lmstudio_api_key:
config.ai.lmstudio.api_key = _merge_api_key(
body.lmstudio_api_key, config.ai.lmstudio.api_key
)
await asyncio.to_thread(save_doc_service_config, config)
return load_doc_service_config_masked()
@router.post("/documents/ai/test")
async def test_documents_ai(
_: User = Depends(get_current_admin),
) -> dict:
"""Test the configured AI connection with a minimal prompt."""
from app.core.app_config import load_service_config
raw = await asyncio.to_thread(load_service_config, "doc_service")
ai_cfg = raw.get("ai", {})
provider_name = ai_cfg.get("provider", "anthropic")
try:
if provider_name == "anthropic":
import anthropic
client = anthropic.AsyncAnthropic(api_key=ai_cfg["anthropic"]["api_key"])
msg = await client.messages.create(
model=ai_cfg["anthropic"].get("model", "claude-haiku-4-5-20251001"),
max_tokens=16,
messages=[{"role": "user", "content": "Reply with: ok"}],
)
return {"ok": True, "provider": provider_name, "response": msg.content[0].text}
elif provider_name in ("ollama", "lmstudio"):
import openai
pcfg = ai_cfg[provider_name]
client = openai.AsyncOpenAI(
base_url=pcfg["base_url"],
api_key=pcfg.get("api_key") or "none",
)
resp = await client.chat.completions.create(
model=pcfg["model"],
messages=[{"role": "user", "content": "Reply with: ok"}],
max_tokens=16,
temperature=0,
)
return {
"ok": True,
"provider": provider_name,
"response": resp.choices[0].message.content,
}
else:
raise HTTPException(status_code=422, detail=f"Unknown provider: {provider_name}")
except Exception as exc:
return {"ok": False, "provider": provider_name, "error": str(exc)}
@router.patch("/documents/limits")
async def update_documents_limits(
body: LimitsUpdate,
_: User = Depends(get_current_admin),
) -> dict:
if body.max_pdf_mb < 1 or body.max_pdf_mb > 200:
raise HTTPException(status_code=422, detail="max_pdf_mb must be between 1 and 200")
config = load_doc_service_config()
config.documents.max_pdf_bytes = body.max_pdf_mb * 1024 * 1024
await asyncio.to_thread(save_doc_service_config, config)
return load_doc_service_config_masked()
+3 -1
View File
@@ -17,13 +17,15 @@ dependencies = [
"python-jose[cryptography]>=3.3",
"bcrypt>=4.0",
"python-multipart>=0.0.9",
"httpx>=0.27",
"anthropic>=0.28",
"openai>=1.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8",
"pytest-asyncio>=0.23",
"httpx>=0.27",
"ruff>=0.4",
]
+58
View File
@@ -0,0 +1,58 @@
# 2026-04-14 — PDF Document Service
**Timestamp:** 2026-04-14T00:00:00+00:00
## Summary
Added `features/doc-service` — a FastAPI microservice that accepts PDF uploads, extracts text with pdfplumber, and uses a pluggable AI provider (Anthropic, Ollama, or LM Studio) to classify and extract structured data. Integrated it into the main backend via httpx proxy routers. Added an admin settings UI at `/apps/documents/settings/admin`. Updated the frontend route tree, Nav, and AppsPage.
## Files Added
- `features/doc-service/Dockerfile` — UID 1001, pre-creates `/data/documents` and `/config`
- `features/doc-service/pyproject.toml` — service dependencies
- `features/doc-service/alembic.ini` — separate `alembic_version_doc_service` table
- `features/doc-service/.env.example`
- `features/doc-service/scripts/start.sh` — migrations + uvicorn
- `features/doc-service/scripts/start_dev.sh` — migrations + uvicorn --reload
- `features/doc-service/alembic/env.py` — async migrations, VERSION_TABLE isolation
- `features/doc-service/alembic/versions/0001_create_doc_tables.py` — documents, document_categories, document_category_assignments
- `features/doc-service/app/main.py` — no CORS (internal service)
- `features/doc-service/app/core/config.py` — DATABASE_URL, DATA_DIR, CONFIG_PATH settings
- `features/doc-service/app/database.py` — async engine, AsyncSessionLocal, Base
- `features/doc-service/app/deps.py` — get_user_id from X-User-Id header
- `features/doc-service/app/models/document.py` — Document ORM model
- `features/doc-service/app/models/category.py` — DocumentCategory ORM model
- `features/doc-service/app/models/category_assignment.py` — CategoryAssignment composite PK
- `features/doc-service/app/models/__init__.py`
- `features/doc-service/app/schemas/document.py` — DocumentOut, DocumentStatusOut, DocumentTypeUpdate, CategoryOut
- `features/doc-service/app/schemas/category.py` — CategoryCreate, CategoryOut, CategoryUpdate
- `features/doc-service/app/routers/documents.py` — upload, list, get, status, patch type, delete, file download, category assignment
- `features/doc-service/app/routers/categories.py` — CRUD for DocumentCategory
- `features/doc-service/app/services/storage.py` — aiofiles write, path helpers, delete
- `features/doc-service/app/services/config_reader.py` — load_doc_config() with 30s TTL cache
- `features/doc-service/app/services/ai/__init__.py` — get_provider() factory
- `features/doc-service/app/services/ai/base.py` — AIProvider ABC, shared prompts
- `features/doc-service/app/services/ai/anthropic_provider.py` — AnthropicProvider
- `features/doc-service/app/services/ai/openai_compat.py` — OpenAICompatProvider (Ollama + LM Studio)
- `backend/app/core/app_config.py` — DocServiceConfig Pydantic model, load/save with atomic write, api_key masking
- `backend/app/routers/settings.py` — GET/PATCH /api/settings/documents/*, admin only
- `backend/app/routers/documents_proxy.py` — httpx proxy to doc-service /documents/*
- `backend/app/routers/categories_proxy.py` — httpx proxy to doc-service /categories/*
- `frontend/src/pages/DocumentsPage.tsx` — upload, list, status polling, categories, file download
- `frontend/src/pages/DocumentAdminSettingsPage.tsx` — AI provider config, connection test, upload limits
## Files Modified
- `backend/app/main.py` — registered settings_router, categories_proxy (before!), documents_proxy
- `backend/pyproject.toml` — moved httpx to main deps, added anthropic>=0.28, openai>=1.0
- `frontend/src/App.tsx` — added /apps/documents and /apps/documents/settings/admin routes, removed /settings
- `frontend/src/components/Nav.tsx` — removed Settings link, added Profile link, logo links to /
- `frontend/src/pages/AppsPage.tsx` — replaced stub with app launcher card grid
- `frontend/src/api/client.ts` — added documents, categories, and settings API functions
- `docker-compose.yml` — added doc-service service, doc_data + app_config volumes, removed internal:true from backend-net, added app_config volume to backend
- `docker-compose.dev.yml` — added doc-service dev override with --reload
- `TODO.md` — added PDF Documents app section
## Files Deleted
- `frontend/src/pages/SettingsPage.tsx` — stub replaced by per-app settings pages
+5
View File
@@ -22,3 +22,8 @@ services:
volumes:
- ./frontend:/app
- /app/node_modules
doc-service:
command: sh scripts/start_dev.sh
volumes:
- ./features/doc-service:/app
+28 -2
View File
@@ -30,6 +30,30 @@ services:
env_file: ./backend/.env
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-password}@db:5432/${POSTGRES_DB:-destroying_sap}
DOC_SERVICE_URL: http://doc-service:8001
volumes:
- app_config:/config
depends_on:
db:
condition: service_healthy
networks:
- backend-net
# ── Doc service (PDF extraction) ────────────────────────────────────────────
doc-service:
build:
context: ./features/doc-service
dockerfile: Dockerfile
network: host
user: "1001:1001"
restart: unless-stopped
environment:
DATABASE_URL: postgresql+asyncpg://${POSTGRES_USER:-postgres}:${POSTGRES_PASSWORD:-password}@db:5432/${POSTGRES_DB:-destroying_sap}
DATA_DIR: /data/documents
CONFIG_PATH: /config/doc_service_config.json
volumes:
- doc_data:/data/documents
- app_config:/config
depends_on:
db:
condition: service_healthy
@@ -54,10 +78,12 @@ services:
volumes:
postgres_data:
doc_data: # PDF files persisted across restarts
app_config: # Per-service runtime config JSON files
networks:
# Internal-only: db ↔ backend ↔ frontend reverse proxy. No host routing.
# backend-net: db ↔ backend ↔ doc-service. No host ports bound.
# internal:true removed — doc-service needs outbound access for cloud AI providers.
backend-net:
internal: true
# External-facing: only the frontend binds a host port through this network.
frontend-net:
+3
View File
@@ -0,0 +1,3 @@
DATABASE_URL=postgresql+asyncpg://postgres:password@db:5432/destroying_sap
DATA_DIR=/data/documents
CONFIG_PATH=/config/doc_service_config.json
+34
View File
@@ -0,0 +1,34 @@
# ── Stage 1: dependency installation ─────────────────────────────────────────
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --upgrade pip
COPY pyproject.toml .
RUN pip install --prefix=/install .
# ── Stage 2: runtime ──────────────────────────────────────────────────────────
FROM python:3.12-slim
# Create non-root user (UID/GID 1001)
RUN groupadd --gid 1001 appuser && \
useradd --uid 1001 --gid 1001 --no-create-home --shell /bin/sh appuser
# Pre-create data and config dirs with correct ownership.
# Named volumes mounted over these paths will inherit ownership on first creation.
RUN mkdir -p /data/documents /config && chown -R appuser:appuser /data /config
WORKDIR /app
COPY --from=builder /install /usr/local
COPY --chown=appuser:appuser app ./app
COPY --chown=appuser:appuser alembic ./alembic
COPY --chown=appuser:appuser alembic.ini .
COPY --chown=appuser:appuser scripts ./scripts
USER appuser
EXPOSE 8001
CMD ["sh", "scripts/start.sh"]
+45
View File
@@ -0,0 +1,45 @@
[alembic]
script_location = alembic
prepend_sys_path = .
version_path_separator = os
sqlalchemy.url = postgresql+asyncpg://postgres:password@localhost:5432/destroying_sap
# Use a separate version table so this service's migrations don't collide
# with the main backend's alembic_version table in the shared postgres instance.
version_table = alembic_version_doc_service
[post_write_hooks]
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARN
handlers = console
qualname =
[logger_sqlalchemy]
level = WARN
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S
+55
View File
@@ -0,0 +1,55 @@
import asyncio
from logging.config import fileConfig
from alembic import context
from sqlalchemy.ext.asyncio import create_async_engine
from app.core.config import settings
from app.database import Base
import app.models # noqa: F401 — registers Document, DocumentCategory, CategoryAssignment
config = context.config
config.set_main_option("sqlalchemy.url", settings.DATABASE_URL)
if config.config_file_name:
fileConfig(config.config_file_name)
target_metadata = Base.metadata
# Separate version table — must not collide with the main backend's alembic_version table.
VERSION_TABLE = "alembic_version_doc_service"
def run_migrations_offline():
context.configure(
url=settings.DATABASE_URL,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
version_table=VERSION_TABLE,
)
with context.begin_transaction():
context.run_migrations()
def do_run_migrations(connection):
context.configure(
connection=connection,
target_metadata=target_metadata,
version_table=VERSION_TABLE,
)
with context.begin_transaction():
context.run_migrations()
async def run_migrations_online():
engine = create_async_engine(settings.DATABASE_URL)
async with engine.connect() as conn:
await conn.run_sync(do_run_migrations)
await engine.dispose()
if context.is_offline_mode():
run_migrations_offline()
else:
asyncio.run(run_migrations_online())
@@ -0,0 +1,25 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
revision: str = ${repr(up_revision)}
down_revision: Union[str, None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
${upgrades if upgrades else "pass"}
def downgrade() -> None:
${downgrades if downgrades else "pass"}
@@ -0,0 +1,79 @@
"""create document tables
Revision ID: 0001
Revises:
Create Date: 2026-04-14
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
revision: str = "0001"
down_revision: Union[str, None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"documents",
sa.Column("id", sa.String(), primary_key=True),
sa.Column("user_id", sa.String(), nullable=False),
sa.Column("filename", sa.String(), nullable=False),
sa.Column("file_path", sa.String(), nullable=False),
sa.Column("file_size", sa.Integer(), nullable=False),
sa.Column("status", sa.String(), nullable=False),
sa.Column("document_type", sa.String(), nullable=True),
sa.Column("raw_text", sa.Text(), nullable=True),
sa.Column("extracted_data", sa.Text(), nullable=True),
sa.Column("tags", sa.Text(), nullable=True),
sa.Column("error_message", sa.String(500), nullable=True),
sa.Column(
"created_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
sa.Column("processed_at", sa.DateTime(timezone=True), nullable=True),
)
op.create_index("ix_documents_user_id", "documents", ["user_id"])
op.create_table(
"document_categories",
sa.Column("id", sa.String(), primary_key=True),
sa.Column("user_id", sa.String(), nullable=False),
sa.Column("name", sa.String(128), nullable=False),
sa.Column(
"created_at",
sa.DateTime(timezone=True),
server_default=sa.text("now()"),
nullable=False,
),
)
op.create_index("ix_document_categories_user_id", "document_categories", ["user_id"])
op.create_table(
"document_category_assignments",
sa.Column(
"document_id",
sa.String(),
sa.ForeignKey("documents.id", ondelete="CASCADE"),
primary_key=True,
),
sa.Column(
"category_id",
sa.String(),
sa.ForeignKey("document_categories.id", ondelete="CASCADE"),
primary_key=True,
),
)
def downgrade() -> None:
op.drop_table("document_category_assignments")
op.drop_index("ix_document_categories_user_id", "document_categories")
op.drop_table("document_categories")
op.drop_index("ix_documents_user_id", "documents")
op.drop_table("documents")
+14
View File
@@ -0,0 +1,14 @@
from pydantic_settings import BaseSettings
class Settings(BaseSettings):
PROJECT_NAME: str = "doc-service"
DATABASE_URL: str = "postgresql+asyncpg://postgres:password@db:5432/destroying_sap"
DATA_DIR: str = "/data/documents"
CONFIG_PATH: str = "/config/doc_service_config.json"
class Config:
env_file = ".env"
settings = Settings()
+16
View File
@@ -0,0 +1,16 @@
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase
from app.core.config import settings
engine = create_async_engine(settings.DATABASE_URL, echo=False)
AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)
class Base(DeclarativeBase):
pass
async def get_db() -> AsyncSession:
async with AsyncSessionLocal() as session:
yield session
+12
View File
@@ -0,0 +1,12 @@
from fastapi import Header, HTTPException
async def get_user_id(x_user_id: str = Header(...)) -> str:
"""
Extract the user identity injected by the main backend proxy.
The main backend validates the JWT and forwards the user ID via this header.
Doc-service trusts it because it is only reachable from backend on backend-net.
"""
if not x_user_id:
raise HTTPException(status_code=400, detail="Missing X-User-Id header")
return x_user_id
+17
View File
@@ -0,0 +1,17 @@
from fastapi import FastAPI
from app.core.config import settings
from app.routers import categories, documents
app = FastAPI(title=settings.PROJECT_NAME)
# No CORS — this service is only reachable from the main backend on backend-net.
# All browser traffic goes through the main backend proxy.
app.include_router(documents.router, prefix="/documents", tags=["documents"])
app.include_router(categories.router, prefix="/categories", tags=["categories"])
@app.get("/health")
def health():
return {"status": "ok"}
@@ -0,0 +1,5 @@
from app.models.document import Document
from app.models.category import DocumentCategory
from app.models.category_assignment import CategoryAssignment
__all__ = ["Document", "DocumentCategory", "CategoryAssignment"]
@@ -0,0 +1,22 @@
import uuid
from datetime import datetime
from sqlalchemy import DateTime, String, func
from sqlalchemy.orm import Mapped, mapped_column, relationship
from app.database import Base
class DocumentCategory(Base):
__tablename__ = "document_categories"
id: Mapped[str] = mapped_column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
user_id: Mapped[str] = mapped_column(String, nullable=False, index=True)
name: Mapped[str] = mapped_column(String(128), nullable=False)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), nullable=False
)
assignments: Mapped[list["CategoryAssignment"]] = relationship(
"CategoryAssignment", back_populates="category", cascade="all, delete-orphan"
)
@@ -0,0 +1,20 @@
from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import Mapped, mapped_column, relationship
from app.database import Base
class CategoryAssignment(Base):
__tablename__ = "document_category_assignments"
document_id: Mapped[str] = mapped_column(
String, ForeignKey("documents.id", ondelete="CASCADE"), primary_key=True
)
category_id: Mapped[str] = mapped_column(
String, ForeignKey("document_categories.id", ondelete="CASCADE"), primary_key=True
)
document: Mapped["Document"] = relationship("Document", back_populates="category_assignments")
category: Mapped["DocumentCategory"] = relationship(
"DocumentCategory", back_populates="assignments"
)
@@ -0,0 +1,31 @@
import uuid
from datetime import datetime
from sqlalchemy import DateTime, Integer, String, Text, func
from sqlalchemy.orm import Mapped, mapped_column, relationship
from app.database import Base
class Document(Base):
__tablename__ = "documents"
id: Mapped[str] = mapped_column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
user_id: Mapped[str] = mapped_column(String, nullable=False, index=True)
filename: Mapped[str] = mapped_column(String, nullable=False)
file_path: Mapped[str] = mapped_column(String, nullable=False)
file_size: Mapped[int] = mapped_column(Integer, nullable=False)
status: Mapped[str] = mapped_column(String, nullable=False, default="pending")
document_type: Mapped[str | None] = mapped_column(String, nullable=True)
raw_text: Mapped[str | None] = mapped_column(Text, nullable=True)
extracted_data: Mapped[str | None] = mapped_column(Text, nullable=True) # JSON string
tags: Mapped[str | None] = mapped_column(Text, nullable=True) # JSON array string
error_message: Mapped[str | None] = mapped_column(String(500), nullable=True)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), nullable=False
)
processed_at: Mapped[datetime | None] = mapped_column(DateTime(timezone=True), nullable=True)
category_assignments: Mapped[list["CategoryAssignment"]] = relationship(
"CategoryAssignment", back_populates="document", cascade="all, delete-orphan"
)
@@ -0,0 +1,80 @@
from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from app.database import get_db
from app.deps import get_user_id
from app.models.category import DocumentCategory
from app.schemas.category import CategoryCreate, CategoryOut, CategoryUpdate
router = APIRouter()
@router.get("", response_model=list[CategoryOut])
async def list_categories(
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> list[DocumentCategory]:
result = await db.execute(
select(DocumentCategory)
.where(DocumentCategory.user_id == user_id)
.order_by(DocumentCategory.name)
)
return result.scalars().all()
@router.post("", response_model=CategoryOut, status_code=201)
async def create_category(
body: CategoryCreate,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> DocumentCategory:
name = body.name.strip()
if not name:
raise HTTPException(status_code=422, detail="Category name cannot be empty")
cat = DocumentCategory(user_id=user_id, name=name[:128])
db.add(cat)
await db.commit()
await db.refresh(cat)
return cat
@router.patch("/{cat_id}", response_model=CategoryOut)
async def rename_category(
cat_id: str,
body: CategoryUpdate,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> DocumentCategory:
cat = await _get_user_cat(cat_id, user_id, db)
name = body.name.strip()
if not name:
raise HTTPException(status_code=422, detail="Category name cannot be empty")
cat.name = name[:128]
await db.commit()
await db.refresh(cat)
return cat
@router.delete("/{cat_id}", status_code=204)
async def delete_category(
cat_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> None:
cat = await _get_user_cat(cat_id, user_id, db)
await db.delete(cat)
await db.commit()
async def _get_user_cat(cat_id: str, user_id: str, db: AsyncSession) -> DocumentCategory:
result = await db.execute(
select(DocumentCategory).where(
DocumentCategory.id == cat_id,
DocumentCategory.user_id == user_id,
)
)
cat = result.scalar_one_or_none()
if cat is None:
raise HTTPException(status_code=404, detail="Category not found")
return cat
@@ -0,0 +1,304 @@
import asyncio
import json
import uuid
from datetime import datetime, timezone
import aiofiles
import pdfplumber
from fastapi import APIRouter, BackgroundTasks, Depends, HTTPException, UploadFile
from fastapi.responses import StreamingResponse
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import selectinload
from app.database import AsyncSessionLocal, get_db
from app.deps import get_user_id
from app.models.category import DocumentCategory
from app.models.category_assignment import CategoryAssignment
from app.models.document import Document
from app.schemas.document import DocumentOut, DocumentStatusOut, DocumentTypeUpdate
from app.services.ai import get_provider
from app.services.config_reader import load_doc_config
from app.services.storage import delete_file, get_upload_path, save_upload
router = APIRouter()
_DEFAULT_MAX_BYTES = 20 * 1024 * 1024
# ── Helpers ───────────────────────────────────────────────────────────────────
async def _get_user_doc(doc_id: str, user_id: str, db: AsyncSession) -> Document:
result = await db.execute(
select(Document)
.where(Document.id == doc_id, Document.user_id == user_id)
.options(
selectinload(Document.category_assignments)
.selectinload(CategoryAssignment.category)
)
)
doc = result.scalar_one_or_none()
if doc is None:
raise HTTPException(status_code=404, detail="Document not found")
return doc
def _doc_with_categories(doc: Document) -> DocumentOut:
from app.schemas.document import CategoryOut
cats = [CategoryOut(id=a.category.id, name=a.category.name) for a in doc.category_assignments]
return DocumentOut(
id=doc.id,
user_id=doc.user_id,
filename=doc.filename,
file_size=doc.file_size,
status=doc.status,
document_type=doc.document_type,
extracted_data=doc.extracted_data,
tags=doc.tags,
error_message=doc.error_message,
created_at=doc.created_at,
processed_at=doc.processed_at,
categories=cats,
)
def _extract_pdf_text(file_path: str) -> str:
"""Synchronous — must be called via asyncio.to_thread."""
text_parts = []
with pdfplumber.open(file_path) as pdf:
for page in pdf.pages:
page_text = page.extract_text()
if page_text:
text_parts.append(page_text)
return "\n".join(text_parts)
# ── Background processing ─────────────────────────────────────────────────────
async def process_document(doc_id: str) -> None:
"""
Runs after the upload response is sent.
Opens its own DB session — never use the request's Depends session here.
Loads AI config fresh from the config file so settings changes apply without restart.
"""
async with AsyncSessionLocal() as db:
doc = await db.get(Document, doc_id)
if doc is None:
return
doc.status = "processing"
await db.commit()
try:
text = await asyncio.to_thread(_extract_pdf_text, doc.file_path)
config = await load_doc_config()
provider = get_provider(config["ai"])
result = await provider.classify_document(text)
doc.raw_text = text[:500_000] # cap stored text at 500k chars
doc.extracted_data = json.dumps(result)
doc.document_type = result.get("document_type", "unknown")
doc.tags = json.dumps(result.get("tags", []))
doc.status = "done"
doc.processed_at = datetime.now(timezone.utc)
except Exception as exc:
doc.status = "failed"
doc.error_message = str(exc)[:500]
await db.commit()
# ── Routes ────────────────────────────────────────────────────────────────────
@router.post("/upload", response_model=DocumentOut, status_code=202)
async def upload_document(
file: UploadFile,
background_tasks: BackgroundTasks,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> DocumentOut:
if file.content_type not in ("application/pdf", "application/octet-stream"):
if not (file.filename or "").lower().endswith(".pdf"):
raise HTTPException(status_code=415, detail="Only PDF files are accepted")
config = await load_doc_config()
max_bytes = config.get("documents", {}).get("max_pdf_bytes", _DEFAULT_MAX_BYTES)
file_data = await file.read()
if len(file_data) > max_bytes:
raise HTTPException(
status_code=413,
detail=f"File exceeds maximum size of {max_bytes // (1024*1024)} MB",
)
doc_id = str(uuid.uuid4())
dest = await save_upload(file_data, user_id, doc_id)
doc = Document(
id=doc_id,
user_id=user_id,
filename=file.filename or "upload.pdf",
file_path=str(dest),
file_size=len(file_data),
status="pending",
)
db.add(doc)
await db.commit()
await db.refresh(doc)
background_tasks.add_task(process_document, doc_id)
return _doc_with_categories(doc)
@router.get("", response_model=list[DocumentOut])
async def list_documents(
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> list[DocumentOut]:
result = await db.execute(
select(Document)
.where(Document.user_id == user_id)
.options(
selectinload(Document.category_assignments)
.selectinload(CategoryAssignment.category)
)
.order_by(Document.created_at.desc())
)
return [_doc_with_categories(d) for d in result.scalars().all()]
@router.get("/{doc_id}", response_model=DocumentOut)
async def get_document(
doc_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> DocumentOut:
doc = await _get_user_doc(doc_id, user_id, db)
return _doc_with_categories(doc)
@router.get("/{doc_id}/status", response_model=DocumentStatusOut)
async def get_document_status(
doc_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> Document:
result = await db.execute(
select(Document).where(Document.id == doc_id, Document.user_id == user_id)
)
doc = result.scalar_one_or_none()
if doc is None:
raise HTTPException(status_code=404, detail="Document not found")
return doc
@router.patch("/{doc_id}/type", response_model=DocumentOut)
async def update_document_type(
doc_id: str,
body: DocumentTypeUpdate,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> DocumentOut:
doc = await _get_user_doc(doc_id, user_id, db)
doc.document_type = body.document_type
await db.commit()
await db.refresh(doc)
return _doc_with_categories(doc)
@router.delete("/{doc_id}", status_code=204)
async def delete_document(
doc_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> None:
result = await db.execute(
select(Document).where(Document.id == doc_id, Document.user_id == user_id)
)
doc = result.scalar_one_or_none()
if doc is None:
raise HTTPException(status_code=404, detail="Document not found")
delete_file(doc.file_path)
await db.delete(doc)
await db.commit()
@router.get("/{doc_id}/file")
async def download_file(
doc_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> StreamingResponse:
result = await db.execute(
select(Document).where(Document.id == doc_id, Document.user_id == user_id)
)
doc = result.scalar_one_or_none()
if doc is None:
raise HTTPException(status_code=404, detail="Document not found")
async def file_generator():
async with aiofiles.open(doc.file_path, "rb") as f:
while chunk := await f.read(64 * 1024):
yield chunk
return StreamingResponse(
file_generator(),
media_type="application/pdf",
headers={"Content-Disposition": f'inline; filename="{doc.filename}"'},
)
# ── Category assignment ───────────────────────────────────────────────────────
@router.post("/{doc_id}/categories/{cat_id}", status_code=204)
async def assign_category(
doc_id: str,
cat_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> None:
# Verify both belong to this user
doc_result = await db.execute(
select(Document).where(Document.id == doc_id, Document.user_id == user_id)
)
if doc_result.scalar_one_or_none() is None:
raise HTTPException(status_code=404, detail="Document not found")
cat_result = await db.execute(
select(DocumentCategory).where(
DocumentCategory.id == cat_id, DocumentCategory.user_id == user_id
)
)
if cat_result.scalar_one_or_none() is None:
raise HTTPException(status_code=404, detail="Category not found")
# Upsert — ignore if already assigned
existing = await db.execute(
select(CategoryAssignment).where(
CategoryAssignment.document_id == doc_id,
CategoryAssignment.category_id == cat_id,
)
)
if existing.scalar_one_or_none() is None:
db.add(CategoryAssignment(document_id=doc_id, category_id=cat_id))
await db.commit()
@router.delete("/{doc_id}/categories/{cat_id}", status_code=204)
async def remove_category(
doc_id: str,
cat_id: str,
user_id: str = Depends(get_user_id),
db: AsyncSession = Depends(get_db),
) -> None:
result = await db.execute(
select(CategoryAssignment).where(
CategoryAssignment.document_id == doc_id,
CategoryAssignment.category_id == cat_id,
)
)
assignment = result.scalar_one_or_none()
if assignment:
await db.delete(assignment)
await db.commit()
@@ -0,0 +1,20 @@
from datetime import datetime
from pydantic import BaseModel
class CategoryOut(BaseModel):
id: str
user_id: str
name: str
created_at: datetime
model_config = {"from_attributes": True}
class CategoryCreate(BaseModel):
name: str
class CategoryUpdate(BaseModel):
name: str
@@ -0,0 +1,39 @@
from datetime import datetime
from pydantic import BaseModel
class CategoryOut(BaseModel):
id: str
name: str
model_config = {"from_attributes": True}
class DocumentOut(BaseModel):
id: str
user_id: str
filename: str
file_size: int
status: str
document_type: str | None
extracted_data: str | None # JSON string — frontend calls JSON.parse()
tags: str | None # JSON array string
error_message: str | None
created_at: datetime
processed_at: datetime | None
categories: list[CategoryOut] = []
model_config = {"from_attributes": True}
class DocumentStatusOut(BaseModel):
id: str
status: str
error_message: str | None
processed_at: datetime | None
model_config = {"from_attributes": True}
class DocumentTypeUpdate(BaseModel):
document_type: str
@@ -0,0 +1,23 @@
from app.services.ai.base import AIProvider
def get_provider(ai_config: dict) -> AIProvider:
"""
Factory: return an AIProvider instance based on the 'provider' key in the AI config section.
ai_config is the 'ai' section of doc_service_config.json, loaded fresh per processing job.
"""
provider_name = ai_config.get("provider", "anthropic")
provider_cfg = ai_config.get(provider_name, {})
match provider_name:
case "anthropic":
from app.services.ai.anthropic_provider import AnthropicProvider
return AnthropicProvider(provider_cfg)
case "ollama" | "lmstudio":
from app.services.ai.openai_compat import OpenAICompatProvider
return OpenAICompatProvider(provider_cfg)
case _:
raise ValueError(f"Unknown AI provider: {provider_name!r}")
__all__ = ["AIProvider", "get_provider"]
@@ -0,0 +1,31 @@
import json
from anthropic import AsyncAnthropic
from app.services.ai.base import AIProvider, SYSTEM_PROMPT, USER_PROMPT_TEMPLATE
class AnthropicProvider(AIProvider):
def __init__(self, config: dict) -> None:
self._client = AsyncAnthropic(api_key=config["api_key"])
self._model = config.get("model", "claude-haiku-4-5-20251001")
async def classify_document(self, text: str) -> dict:
message = await self._client.messages.create(
model=self._model,
max_tokens=2048,
system=SYSTEM_PROMPT,
messages=[{
"role": "user",
"content": USER_PROMPT_TEMPLATE.format(text=text[:100_000]),
}],
)
raw = message.content[0].text.strip()
return _parse_json(raw)
def _parse_json(raw: str) -> dict:
# Strip accidental markdown fences despite explicit instruction not to include them
if raw.startswith("```"):
raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
return json.loads(raw)
@@ -0,0 +1,31 @@
from abc import ABC, abstractmethod
SYSTEM_PROMPT = (
"You are a financial document analysis assistant. "
"Given the text extracted from a PDF document, return ONLY a JSON object "
"with no markdown, no code fences, and no explanation."
)
USER_PROMPT_TEMPLATE = """Analyze the following document text and return a JSON object with exactly these keys:
document_type (one of: invoice, bill, receipt, order, expense, revenue, unknown),
total_amount (string or null),
currency (string or null),
vendor_name (string or null),
customer_name (string or null),
billing_address (string or null),
customer_address (string or null),
invoice_number (string or null),
invoice_date (string or null),
due_date (string or null),
tags (array of strings),
line_items (array of objects, each with keys: description, amount).
Document text:
{text}"""
class AIProvider(ABC):
@abstractmethod
async def classify_document(self, text: str) -> dict:
"""Return structured extraction dict from document text."""
...
@@ -0,0 +1,36 @@
"""
OpenAI-compatible provider for Ollama and LM Studio.
Both expose an OpenAI-compatible /v1/chat/completions endpoint.
"""
import json
from openai import AsyncOpenAI
from app.services.ai.base import AIProvider, SYSTEM_PROMPT, USER_PROMPT_TEMPLATE
class OpenAICompatProvider(AIProvider):
def __init__(self, config: dict) -> None:
self._client = AsyncOpenAI(
base_url=config["base_url"],
api_key=config.get("api_key", "not-required"),
)
self._model = config["model"]
async def classify_document(self, text: str) -> dict:
response = await self._client.chat.completions.create(
model=self._model,
temperature=0,
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": USER_PROMPT_TEMPLATE.format(text=text[:100_000])},
],
)
raw = response.choices[0].message.content.strip()
return _parse_json(raw)
def _parse_json(raw: str) -> dict:
if raw.startswith("```"):
raw = raw.split("\n", 1)[1].rsplit("```", 1)[0]
return json.loads(raw)
@@ -0,0 +1,44 @@
"""
Reads doc_service_config.json from the shared config volume.
Caches the result for 30 seconds to avoid hitting the filesystem on every request.
Uses asyncio.to_thread so the synchronous file read doesn't block the event loop.
"""
import asyncio
import json
import time
from pathlib import Path
from app.core.config import settings
_DEFAULT_CONFIG: dict = {
"ai": {
"provider": "anthropic",
"anthropic": {"api_key": "", "model": "claude-haiku-4-5-20251001"},
"ollama": {"base_url": "http://localhost:11434/v1", "model": "llama3.2", "api_key": "ollama"},
"lmstudio": {"base_url": "http://localhost:1234/v1", "model": "local-model", "api_key": ""},
},
"documents": {"max_pdf_bytes": 20 * 1024 * 1024},
}
_cache: dict | None = None
_cache_at: float = 0.0
_CACHE_TTL = 30.0
def _read_config_sync() -> dict:
path = Path(settings.CONFIG_PATH)
if not path.exists():
return _DEFAULT_CONFIG.copy()
with open(path) as f:
return json.load(f)
async def load_doc_config() -> dict:
global _cache, _cache_at
now = time.monotonic()
if _cache is not None and (now - _cache_at) < _CACHE_TTL:
return _cache
data = await asyncio.to_thread(_read_config_sync)
_cache = data
_cache_at = now
return data
@@ -0,0 +1,27 @@
import asyncio
from pathlib import Path
import aiofiles
from app.core.config import settings
def get_upload_path(user_id: str, doc_id: str) -> Path:
"""Return /data/documents/{user_id}/{doc_id}.pdf, creating the directory if needed."""
user_dir = Path(settings.DATA_DIR) / user_id
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir / f"{doc_id}.pdf"
async def save_upload(file_data: bytes, user_id: str, doc_id: str) -> Path:
dest = get_upload_path(user_id, doc_id)
async with aiofiles.open(dest, "wb") as f:
await f.write(file_data)
return dest
def delete_file(file_path: str) -> None:
try:
Path(file_path).unlink(missing_ok=True)
except OSError:
pass # log but do not raise — deletion failure must not 500
+35
View File
@@ -0,0 +1,35 @@
[build-system]
requires = ["setuptools>=45"]
build-backend = "setuptools.build_meta"
[project]
name = "doc-service"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"fastapi>=0.111",
"uvicorn[standard]>=0.29",
"sqlalchemy[asyncio]>=2.0",
"asyncpg>=0.29",
"alembic>=1.13",
"pydantic-settings>=2.2",
"anthropic>=0.28",
"openai>=1.0",
"pdfplumber>=0.11",
"aiofiles>=23.0",
"python-multipart>=0.0.9",
]
[project.optional-dependencies]
dev = [
"pytest>=8",
"pytest-asyncio>=0.23",
"httpx>=0.27",
"ruff>=0.4",
]
[tool.pytest.ini_options]
asyncio_mode = "auto"
[tool.ruff]
line-length = 100
+8
View File
@@ -0,0 +1,8 @@
#!/bin/sh
set -e
echo "[doc-service] running migrations..."
alembic upgrade head
echo "[doc-service] starting uvicorn..."
exec uvicorn app.main:app --host 0.0.0.0 --port 8001
@@ -0,0 +1,8 @@
#!/bin/sh
set -e
echo "[doc-service] running migrations..."
alembic upgrade head
echo "[doc-service] starting uvicorn (dev)..."
exec uvicorn app.main:app --host 0.0.0.0 --port 8001 --reload
+7 -2
View File
@@ -6,8 +6,9 @@ import LoginPage from "./pages/LoginPage";
import DashboardPage from "./pages/DashboardPage";
import ProfilePage from "./pages/ProfilePage";
import AppsPage from "./pages/AppsPage";
import SettingsPage from "./pages/SettingsPage";
import AdminPage from "./pages/AdminPage";
import DocumentsPage from "./pages/DocumentsPage";
import DocumentAdminSettingsPage from "./pages/DocumentAdminSettingsPage";
function PrivateRoute({ children }: { children: React.ReactNode }) {
const { token } = useAuth();
@@ -33,7 +34,11 @@ export default function App() {
<Route path="/" element={<PrivateRoute><DashboardPage /></PrivateRoute>} />
<Route path="/apps" element={<PrivateRoute><AppsPage /></PrivateRoute>} />
<Route path="/settings" element={<PrivateRoute><SettingsPage /></PrivateRoute>} />
<Route path="/apps/documents" element={<PrivateRoute><DocumentsPage /></PrivateRoute>} />
<Route
path="/apps/documents/settings/admin"
element={<AdminRoute><DocumentAdminSettingsPage /></AdminRoute>}
/>
<Route path="/profile" element={<PrivateRoute><ProfilePage /></PrivateRoute>} />
<Route path="/admin" element={<AdminRoute><AdminPage /></AdminRoute>} />
+110
View File
@@ -73,3 +73,113 @@ export const getProfile = () =>
export const updateProfile = (data: ProfileUpdate) =>
api.put<ProfileData>("/profile/me", data).then((r) => r.data);
// --- Documents ---
export type DocumentStatus = "pending" | "processing" | "done" | "failed";
export interface CategoryOut {
id: string;
name: string;
}
export interface DocumentOut {
id: string;
user_id: string;
filename: string;
file_size: number;
status: DocumentStatus;
document_type: string | null;
extracted_data: string | null;
tags: string | null;
error_message: string | null;
created_at: string;
processed_at: string | null;
categories: CategoryOut[];
}
export interface DocumentStatusOut {
id: string;
status: DocumentStatus;
document_type: string | null;
error_message: string | null;
processed_at: string | null;
}
export const listDocuments = () =>
api.get<DocumentOut[]>("/documents").then((r) => r.data);
export const getDocument = (id: string) =>
api.get<DocumentOut>(`/documents/${id}`).then((r) => r.data);
export const getDocumentStatus = (id: string) =>
api.get<DocumentStatusOut>(`/documents/${id}/status`).then((r) => r.data);
export const uploadDocument = (file: File) => {
const form = new FormData();
form.append("file", file);
return api.post<DocumentOut>("/documents/upload", form).then((r) => r.data);
};
export const updateDocumentType = (id: string, document_type: string) =>
api.patch<DocumentOut>(`/documents/${id}/type`, { document_type }).then((r) => r.data);
export const deleteDocument = (id: string) =>
api.delete(`/documents/${id}`);
export const downloadDocument = async (id: string, filename: string) => {
const response = await api.get(`/documents/${id}/file`, { responseType: "blob" });
const url = URL.createObjectURL(response.data);
const a = document.createElement("a");
a.href = url;
a.download = filename;
a.click();
URL.revokeObjectURL(url);
};
export const assignCategory = (docId: string, catId: string) =>
api.post(`/documents/${docId}/categories/${catId}`);
export const removeCategory = (docId: string, catId: string) =>
api.delete(`/documents/${docId}/categories/${catId}`);
// --- Categories ---
export const listCategories = () =>
api.get<CategoryOut[]>("/documents/categories").then((r) => r.data);
export const createCategory = (name: string) =>
api.post<CategoryOut>("/documents/categories", { name }).then((r) => r.data);
export const renameCategory = (id: string, name: string) =>
api.patch<CategoryOut>(`/documents/categories/${id}`, { name }).then((r) => r.data);
export const deleteCategory = (id: string) =>
api.delete(`/documents/categories/${id}`);
// --- Settings (admin only) ---
export interface AIProviderUpdate {
provider: string;
anthropic_api_key?: string;
anthropic_model?: string;
ollama_base_url?: string;
ollama_model?: string;
ollama_api_key?: string;
lmstudio_base_url?: string;
lmstudio_model?: string;
lmstudio_api_key?: string;
}
export const getDocumentSettings = () =>
api.get<Record<string, unknown>>("/settings/documents").then((r) => r.data);
export const updateDocumentAISettings = (data: AIProviderUpdate) =>
api.patch<Record<string, unknown>>("/settings/documents/ai", data).then((r) => r.data);
export const testDocumentAIConnection = () =>
api.post<{ ok: boolean; provider: string; response?: string; error?: string }>(
"/settings/documents/ai/test"
).then((r) => r.data);
export const updateDocumentLimits = (max_pdf_mb: number) =>
api.patch<Record<string, unknown>>("/settings/documents/limits", { max_pdf_mb }).then(
(r) => r.data
);
+3 -8
View File
@@ -15,16 +15,11 @@ export default function Nav() {
padding: "12px 24px",
borderBottom: "1px solid #ccc",
}}>
<Link to="/">Home</Link>
<Link to="/" style={{ fontWeight: "bold" }}>Home</Link>
<Link to="/apps">Apps</Link>
<Link to="/settings">Settings</Link>
{user?.is_admin && <Link to="/admin">Admin</Link>}
<button
onClick={logout}
style={{ marginLeft: "auto", cursor: "pointer" }}
>
Logout
</button>
<Link to="/profile" style={{ marginLeft: "auto" }}>Profile</Link>
<button onClick={logout} style={{ cursor: "pointer" }}>Logout</button>
</nav>
);
}
+85 -1
View File
@@ -1,11 +1,95 @@
import { Link } from "react-router-dom";
import { useQuery } from "@tanstack/react-query";
import Nav from "../components/Nav";
import { getMe } from "../api/client";
interface AppCard {
slug: string;
name: string;
description: string;
status: "available" | "coming_soon";
path: string;
settingsPath?: string;
}
const APPS: AppCard[] = [
{
slug: "documents",
name: "Documents",
description: "Upload PDF files, extract data, and organise them with categories.",
status: "available",
path: "/apps/documents",
settingsPath: "/apps/documents/settings/admin",
},
];
export default function AppsPage() {
const { data: user } = useQuery({ queryKey: ["me"], queryFn: getMe });
return (
<>
<Nav />
<div style={{ padding: 32 }}>
<div style={{ padding: 32, maxWidth: 900, margin: "0 auto" }}>
<h1>Apps</h1>
<div style={{ display: "flex", gap: 24, flexWrap: "wrap", marginTop: 24 }}>
{APPS.map((app) => (
<div
key={app.slug}
style={{
border: "1px solid #ddd",
borderRadius: 8,
padding: 24,
width: 280,
display: "flex",
flexDirection: "column",
gap: 12,
}}
>
<div style={{ display: "flex", justifyContent: "space-between", alignItems: "center" }}>
<h2 style={{ margin: 0, fontSize: 18 }}>{app.name}</h2>
{app.status === "available" ? (
<span style={{ fontSize: 12, color: "#2a9d8f", fontWeight: 600 }}>Available</span>
) : (
<span style={{ fontSize: 12, color: "#aaa" }}>Coming soon</span>
)}
</div>
<p style={{ margin: 0, color: "#555", fontSize: 14 }}>{app.description}</p>
<div style={{ display: "flex", gap: 8, marginTop: "auto" }}>
{app.status === "available" && (
<Link
to={app.path}
style={{
padding: "6px 14px",
background: "#222",
color: "#fff",
borderRadius: 4,
textDecoration: "none",
fontSize: 14,
}}
>
Open
</Link>
)}
{user?.is_admin && app.settingsPath && app.status === "available" && (
<Link
to={app.settingsPath}
style={{
padding: "6px 14px",
border: "1px solid #ccc",
borderRadius: 4,
textDecoration: "none",
fontSize: 14,
color: "#333",
}}
title="Settings"
>
Settings
</Link>
)}
</div>
</div>
))}
</div>
</div>
</>
);
@@ -0,0 +1,298 @@
import { useEffect, useState } from "react";
import { useQuery, useMutation } from "@tanstack/react-query";
import Nav from "../components/Nav";
import {
getDocumentSettings,
updateDocumentAISettings,
testDocumentAIConnection,
updateDocumentLimits,
} from "../api/client";
type Provider = "anthropic" | "ollama" | "lmstudio";
function Section({ title, children }: { title: string; children: React.ReactNode }) {
return (
<section style={{ marginBottom: 36 }}>
<h2 style={{ fontSize: 18, marginBottom: 16 }}>{title}</h2>
{children}
</section>
);
}
function Field({
label,
children,
}: {
label: string;
children: React.ReactNode;
}) {
return (
<div style={{ marginBottom: 12 }}>
<label style={{ display: "block", fontSize: 13, marginBottom: 4, color: "#555" }}>
{label}
</label>
{children}
</div>
);
}
const inputStyle: React.CSSProperties = {
width: "100%",
padding: "7px 10px",
fontSize: 14,
border: "1px solid #ccc",
borderRadius: 4,
boxSizing: "border-box",
};
export default function DocumentAdminSettingsPage() {
const { data: rawSettings, isLoading } = useQuery({
queryKey: ["docSettings"],
queryFn: getDocumentSettings,
});
const [provider, setProvider] = useState<Provider>("anthropic");
const [anthropicKey, setAnthropicKey] = useState("");
const [anthropicModel, setAnthropicModel] = useState("");
const [ollamaUrl, setOllamaUrl] = useState("");
const [ollamaModel, setOllamaModel] = useState("");
const [ollamaKey, setOllamaKey] = useState("");
const [lmstudioUrl, setLmstudioUrl] = useState("");
const [lmstudioModel, setLmstudioModel] = useState("");
const [lmstudioKey, setLmstudioKey] = useState("");
const [maxPdfMb, setMaxPdfMb] = useState(20);
const [testResult, setTestResult] = useState<{
ok: boolean;
response?: string;
error?: string;
} | null>(null);
// Populate form from loaded settings
useEffect(() => {
if (!rawSettings) return;
const s = rawSettings as Record<string, unknown>;
const ai = s.ai as Record<string, unknown> | undefined;
const docs = s.documents as Record<string, unknown> | undefined;
if (ai?.provider) setProvider(ai.provider as Provider);
const ant = ai?.anthropic as Record<string, string> | undefined;
if (ant?.api_key) setAnthropicKey(ant.api_key);
if (ant?.model) setAnthropicModel(ant.model);
const oll = ai?.ollama as Record<string, string> | undefined;
if (oll?.base_url) setOllamaUrl(oll.base_url);
if (oll?.model) setOllamaModel(oll.model);
if (oll?.api_key) setOllamaKey(oll.api_key);
const lms = ai?.lmstudio as Record<string, string> | undefined;
if (lms?.base_url) setLmstudioUrl(lms.base_url);
if (lms?.model) setLmstudioModel(lms.model);
if (lms?.api_key) setLmstudioKey(lms.api_key);
if (typeof docs?.max_pdf_bytes === "number") {
setMaxPdfMb(Math.round((docs.max_pdf_bytes as number) / (1024 * 1024)));
}
}, [rawSettings]);
const aiMut = useMutation({
mutationFn: updateDocumentAISettings,
});
const testMut = useMutation({
mutationFn: testDocumentAIConnection,
onSuccess: (data) => setTestResult(data),
});
const limitsMut = useMutation({
mutationFn: (mb: number) => updateDocumentLimits(mb),
});
const saveAI = () => {
aiMut.mutate({
provider,
anthropic_api_key: anthropicKey,
anthropic_model: anthropicModel,
ollama_base_url: ollamaUrl,
ollama_model: ollamaModel,
ollama_api_key: ollamaKey,
lmstudio_base_url: lmstudioUrl,
lmstudio_model: lmstudioModel,
lmstudio_api_key: lmstudioKey,
});
};
if (isLoading) {
return (
<>
<Nav />
<div style={{ padding: 32 }}>Loading</div>
</>
);
}
return (
<>
<Nav />
<div style={{ padding: 32, maxWidth: 600, margin: "0 auto" }}>
<h1 style={{ fontSize: 24, marginBottom: 32 }}>Documents Settings</h1>
<Section title="AI Provider">
<Field label="Provider">
<select
value={provider}
onChange={(e) => setProvider(e.target.value as Provider)}
style={inputStyle}
>
<option value="anthropic">Anthropic (cloud)</option>
<option value="ollama">Ollama (local)</option>
<option value="lmstudio">LM Studio (local)</option>
</select>
</Field>
{provider === "anthropic" && (
<>
<Field label="API Key">
<input
type="password"
value={anthropicKey}
onChange={(e) => setAnthropicKey(e.target.value)}
placeholder="sk-ant-… (leave blank to keep current)"
style={inputStyle}
/>
</Field>
<Field label="Model">
<input
value={anthropicModel}
onChange={(e) => setAnthropicModel(e.target.value)}
placeholder="claude-haiku-4-5-20251001"
style={inputStyle}
/>
</Field>
</>
)}
{provider === "ollama" && (
<>
<Field label="Base URL">
<input
value={ollamaUrl}
onChange={(e) => setOllamaUrl(e.target.value)}
placeholder="http://192.168.1.x:11434/v1"
style={inputStyle}
/>
</Field>
<Field label="Model">
<input
value={ollamaModel}
onChange={(e) => setOllamaModel(e.target.value)}
placeholder="llama3.2"
style={inputStyle}
/>
</Field>
<Field label="API Key (usually 'ollama')">
<input
value={ollamaKey}
onChange={(e) => setOllamaKey(e.target.value)}
placeholder="ollama"
style={inputStyle}
/>
</Field>
</>
)}
{provider === "lmstudio" && (
<>
<Field label="Base URL">
<input
value={lmstudioUrl}
onChange={(e) => setLmstudioUrl(e.target.value)}
placeholder="http://192.168.1.x:1234/v1"
style={inputStyle}
/>
</Field>
<Field label="Model">
<input
value={lmstudioModel}
onChange={(e) => setLmstudioModel(e.target.value)}
placeholder="local-model"
style={inputStyle}
/>
</Field>
<Field label="API Key (can be empty)">
<input
value={lmstudioKey}
onChange={(e) => setLmstudioKey(e.target.value)}
placeholder=""
style={inputStyle}
/>
</Field>
</>
)}
<div style={{ display: "flex", gap: 10, marginTop: 16 }}>
<button
onClick={saveAI}
disabled={aiMut.isPending}
style={{ padding: "8px 16px", cursor: "pointer", background: "#222", color: "#fff", borderRadius: 4, border: "none" }}
>
{aiMut.isPending ? "Saving…" : "Save"}
</button>
<button
onClick={() => testMut.mutate()}
disabled={testMut.isPending}
style={{ padding: "8px 16px", cursor: "pointer", borderRadius: 4, border: "1px solid #ccc" }}
>
{testMut.isPending ? "Testing…" : "Test Connection"}
</button>
</div>
{aiMut.isSuccess && (
<p style={{ marginTop: 8, fontSize: 13, color: "#2a9d8f" }}>Settings saved.</p>
)}
{aiMut.isError && (
<p style={{ marginTop: 8, fontSize: 13, color: "#c00" }}>Failed to save settings.</p>
)}
{testResult && (
<div
style={{
marginTop: 10,
padding: 10,
borderRadius: 4,
background: testResult.ok ? "#e8f5e9" : "#fdecea",
fontSize: 13,
}}
>
{testResult.ok ? (
<>Connected. Response: <em>{testResult.response}</em></>
) : (
<>Connection failed: {testResult.error}</>
)}
</div>
)}
</Section>
<Section title="Upload Limits">
<Field label="Max file size (MB)">
<input
type="number"
min={1}
max={200}
value={maxPdfMb}
onChange={(e) => setMaxPdfMb(Number(e.target.value))}
style={{ ...inputStyle, width: 120 }}
/>
</Field>
<button
onClick={() => limitsMut.mutate(maxPdfMb)}
disabled={limitsMut.isPending}
style={{ padding: "8px 16px", cursor: "pointer", background: "#222", color: "#fff", borderRadius: 4, border: "none", marginTop: 8 }}
>
{limitsMut.isPending ? "Saving…" : "Save"}
</button>
{limitsMut.isSuccess && (
<p style={{ marginTop: 8, fontSize: 13, color: "#2a9d8f" }}>Limits saved.</p>
)}
</Section>
</div>
</>
);
}
+370
View File
@@ -0,0 +1,370 @@
import { useRef, useState, useEffect } from "react";
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query";
import Nav from "../components/Nav";
import {
listDocuments,
uploadDocument,
deleteDocument,
downloadDocument,
getDocumentStatus,
listCategories,
createCategory,
assignCategory,
removeCategory,
type DocumentOut,
type CategoryOut,
} from "../api/client";
function StatusBadge({ status }: { status: DocumentOut["status"] }) {
const colors: Record<DocumentOut["status"], string> = {
pending: "#f4a261",
processing: "#2196f3",
done: "#2a9d8f",
failed: "#e63946",
};
return (
<span style={{
fontSize: 12,
fontWeight: 600,
padding: "2px 8px",
borderRadius: 4,
background: colors[status],
color: "#fff",
}}>
{status}
</span>
);
}
function DocumentRow({
doc,
categories,
onDelete,
}: {
doc: DocumentOut;
categories: CategoryOut[];
onDelete: (id: string) => void;
}) {
const [expanded, setExpanded] = useState(false);
const qc = useQueryClient();
// Poll status while pending/processing
const { data: liveStatus } = useQuery({
queryKey: ["docStatus", doc.id],
queryFn: () => getDocumentStatus(doc.id),
// v5: refetchInterval receives the Query object; data lives in query.state.data
refetchInterval: (query) => {
const s = query.state.data?.status;
return s === "pending" || s === "processing" ? 3000 : false;
},
enabled: doc.status === "pending" || doc.status === "processing",
});
useEffect(() => {
if (liveStatus?.status === "done" || liveStatus?.status === "failed") {
qc.invalidateQueries({ queryKey: ["documents"] });
}
}, [liveStatus?.status, qc]);
const assignMut = useMutation({
mutationFn: ({ catId }: { catId: string }) => assignCategory(doc.id, catId),
onSuccess: () => qc.invalidateQueries({ queryKey: ["documents"] }),
});
const removeCatMut = useMutation({
mutationFn: ({ catId }: { catId: string }) => removeCategory(doc.id, catId),
onSuccess: () => qc.invalidateQueries({ queryKey: ["documents"] }),
});
const assignedIds = new Set(doc.categories.map((c) => c.id));
const unassigned = categories.filter((c) => !assignedIds.has(c.id));
let extractedData: Record<string, unknown> | null = null;
if (doc.extracted_data) {
try {
extractedData = JSON.parse(doc.extracted_data);
} catch {
// ignore
}
}
const tags: string[] = [];
if (doc.tags) {
try {
const parsed = JSON.parse(doc.tags);
if (Array.isArray(parsed)) tags.push(...parsed);
} catch {
// ignore
}
}
return (
<div style={{ border: "1px solid #ddd", borderRadius: 6, marginBottom: 12 }}>
<div
style={{
display: "flex",
alignItems: "center",
gap: 12,
padding: "12px 16px",
cursor: "pointer",
}}
onClick={() => setExpanded((e) => !e)}
>
<span style={{ flex: 1, fontWeight: 500 }}>{doc.filename}</span>
<StatusBadge status={doc.status} />
{doc.document_type && (
<span style={{ fontSize: 12, color: "#555" }}>{doc.document_type}</span>
)}
<span style={{ fontSize: 12, color: "#999" }}>
{(doc.file_size / 1024).toFixed(0)} KB
</span>
<button
onClick={(e) => {
e.stopPropagation();
downloadDocument(doc.id, doc.filename);
}}
style={{ fontSize: 12, cursor: "pointer" }}
>
Download
</button>
<button
onClick={(e) => {
e.stopPropagation();
if (confirm(`Delete "${doc.filename}"?`)) onDelete(doc.id);
}}
style={{ fontSize: 12, color: "#c00", cursor: "pointer" }}
>
Delete
</button>
</div>
{expanded && (
<div style={{ padding: "0 16px 16px", borderTop: "1px solid #eee" }}>
{tags.length > 0 && (
<div style={{ marginTop: 10 }}>
<strong>Tags:</strong>{" "}
{tags.map((t) => (
<span
key={t}
style={{
fontSize: 12,
background: "#eee",
borderRadius: 3,
padding: "2px 6px",
marginRight: 4,
}}
>
{t}
</span>
))}
</div>
)}
{extractedData && (
<div style={{ marginTop: 10 }}>
<strong>Extracted data:</strong>
<table style={{ marginTop: 6, fontSize: 13, borderCollapse: "collapse" }}>
<tbody>
{Object.entries(extractedData)
.filter(([k]) => k !== "tags")
.map(([k, v]) => (
<tr key={k}>
<td style={{ paddingRight: 16, color: "#666", verticalAlign: "top" }}>{k}</td>
<td>
{Array.isArray(v)
? v.length === 0
? "—"
: JSON.stringify(v, null, 2)
: v !== null && v !== undefined && v !== ""
? String(v)
: "—"}
</td>
</tr>
))}
</tbody>
</table>
</div>
)}
{doc.error_message && (
<div style={{ marginTop: 10, color: "#c00", fontSize: 13 }}>
Error: {doc.error_message}
</div>
)}
<div style={{ marginTop: 12 }}>
<strong style={{ fontSize: 13 }}>Categories:</strong>{" "}
{doc.categories.map((c) => (
<span
key={c.id}
style={{
fontSize: 12,
background: "#dce8ff",
borderRadius: 3,
padding: "2px 6px",
marginRight: 4,
}}
>
{c.name}{" "}
<button
onClick={() => removeCatMut.mutate({ catId: c.id })}
style={{ fontSize: 10, cursor: "pointer", color: "#555", background: "none", border: "none" }}
>
x
</button>
</span>
))}
{unassigned.length > 0 && (
<select
defaultValue=""
onChange={(e) => {
if (e.target.value) assignMut.mutate({ catId: e.target.value });
e.target.value = "";
}}
style={{ fontSize: 12, marginLeft: 4 }}
>
<option value="">+ add category</option>
{unassigned.map((c) => (
<option key={c.id} value={c.id}>{c.name}</option>
))}
</select>
)}
</div>
</div>
)}
</div>
);
}
export default function DocumentsPage() {
const qc = useQueryClient();
const fileRef = useRef<HTMLInputElement>(null);
const [newCatName, setNewCatName] = useState("");
const [uploadError, setUploadError] = useState<string | null>(null);
const { data: documents = [], isLoading } = useQuery({
queryKey: ["documents"],
queryFn: listDocuments,
});
const { data: categories = [] } = useQuery({
queryKey: ["categories"],
queryFn: listCategories,
});
const uploadMut = useMutation({
mutationFn: uploadDocument,
onSuccess: () => {
setUploadError(null);
qc.invalidateQueries({ queryKey: ["documents"] });
},
onError: (err: unknown) => {
const msg =
(err as { response?: { data?: { detail?: string } } })?.response?.data?.detail ??
"Upload failed";
setUploadError(msg);
},
});
const deleteMut = useMutation({
mutationFn: deleteDocument,
onSuccess: () => qc.invalidateQueries({ queryKey: ["documents"] }),
});
const createCatMut = useMutation({
mutationFn: createCategory,
onSuccess: () => {
setNewCatName("");
qc.invalidateQueries({ queryKey: ["categories"] });
},
});
const handleFileChange = (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (file) uploadMut.mutate(file);
e.target.value = "";
};
return (
<>
<Nav />
<div style={{ padding: 32, maxWidth: 900, margin: "0 auto" }}>
<h1>Documents</h1>
{/* Upload */}
<div style={{ marginBottom: 24 }}>
<input
ref={fileRef}
type="file"
accept="application/pdf"
style={{ display: "none" }}
onChange={handleFileChange}
/>
<button
onClick={() => fileRef.current?.click()}
disabled={uploadMut.isPending}
style={{ padding: "8px 16px", cursor: "pointer" }}
>
{uploadMut.isPending ? "Uploading…" : "Upload PDF"}
</button>
{uploadError && (
<span style={{ marginLeft: 12, color: "#c00", fontSize: 13 }}>{uploadError}</span>
)}
</div>
{/* Category management */}
<details style={{ marginBottom: 24 }}>
<summary style={{ cursor: "pointer", fontWeight: 500 }}>Manage categories</summary>
<div style={{ marginTop: 10, display: "flex", gap: 8, flexWrap: "wrap" }}>
{categories.map((c) => (
<span
key={c.id}
style={{
fontSize: 13,
background: "#eee",
borderRadius: 4,
padding: "4px 10px",
}}
>
{c.name}
</span>
))}
</div>
<form
style={{ marginTop: 10, display: "flex", gap: 8 }}
onSubmit={(e) => {
e.preventDefault();
if (newCatName.trim()) createCatMut.mutate(newCatName.trim());
}}
>
<input
value={newCatName}
onChange={(e) => setNewCatName(e.target.value)}
placeholder="New category name"
style={{ padding: "6px 10px", fontSize: 13 }}
/>
<button type="submit" disabled={createCatMut.isPending} style={{ cursor: "pointer" }}>
Add
</button>
</form>
</details>
{/* Document list */}
{isLoading ? (
<p>Loading</p>
) : documents.length === 0 ? (
<p style={{ color: "#666" }}>No documents yet. Upload a PDF to get started.</p>
) : (
documents.map((doc) => (
<DocumentRow
key={doc.id}
doc={doc}
categories={categories}
onDelete={(id) => deleteMut.mutate(id)}
/>
))
)}
</div>
</>
);
}
-12
View File
@@ -1,12 +0,0 @@
import Nav from "../components/Nav";
export default function SettingsPage() {
return (
<>
<Nav />
<div style={{ padding: 32 }}>
<h1>Settings</h1>
</div>
</>
);
}
+1 -1
View File
@@ -166,7 +166,7 @@ def run_bandit(py_files: list[Path]) -> tuple[bool, str]:
if not py_files:
return True, ""
result = subprocess.run(
["python", "-m", "bandit", "-q", "-ll", "--", *[str(f) for f in py_files]],
[sys.executable, "-m", "bandit", "-q", "-ll", "--", *[str(f) for f in py_files]],
capture_output=True, text=True
)
passed = result.returncode == 0