curo1305 1d01cc3b0e Add per-service system prompts with AI Settings tab view

Each feature service owns its system prompt in its config JSON on the
shared volume. The AI Settings page now has General and System Prompts
tabs — admins can view and edit any service's prompts at runtime with
changes taking effect within 30 s (config cache TTL).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-17 15:11:40 +02:00

AI Service — Status

What it is

A shared AI intermediary container. All feature containers (doc-service and future services) POST prompts to it. It routes each request to the configured provider (LM Studio, Ollama, or Anthropic) and returns a normalised response. It keeps no persistent state: no database and no conversation history. History and context are the caller's responsibility.

Port: 8010 (internal only, not exposed to host).

Current functionality

Endpoints

Method	Path	Description
`POST`	`/chat`	Synchronous chat: submits at NORMAL priority, blocks until done
`GET`	`/health`	`{"status": "ok"}`
`GET`	`/health/provider`	Active provider name, model, configured flag
`POST`	`/queue/jobs`	Async enqueue — returns `job_id` immediately
`GET`	`/queue/jobs/{id}`	Poll job: status, position, result, error
`DELETE`	`/queue/jobs/{id}`	Cancel a pending job
`GET`	`/queue/status`	Worker state: running, paused, queue_size, current_job_id
`POST`	`/queue/pause`	Finish current job, stop picking new ones
`POST`	`/queue/resume`	Unpause
`POST`	`/queue/start`	Start (or restart) the worker task
`POST`	`/queue/stop`	Stop worker (pending jobs stay queued)
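
The async flow (enqueue with POST /queue/jobs, then poll GET /queue/jobs/{id}) could be driven from a caller roughly as follows. This is a minimal sketch, not the service's client: the status strings "done", "failed" and "cancelled" are assumptions (the table only says the poll response carries status, position, result and error), and fetch_job is a hypothetical callable that performs the GET and returns the decoded JSON.

```python
import time
from typing import Callable

# Assumed terminal status names; the real service may use different strings.
TERMINAL_STATUSES = frozenset({"done", "failed", "cancelled"})


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll a job until it reaches a terminal status or the timeout expires.

    fetch_job(job_id) is expected to wrap GET /queue/jobs/{id} and return
    the parsed JSON body as a dict.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)
```

In a real caller, fetch_job would issue the HTTP request against the service on port 8010. Injecting it (along with sleep and clock) keeps the polling logic independent of any particular HTTP library.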

Priority queue

Three levels: high (1) > normal (3) > low (5); the lowest number is served first
FIFO within same priority level (monotonic sequence counter)
Single async worker — one LLM call at a time
Pause / resume / start / stop without restarting the container
POST /chat is a synchronous wrapper: enqueues at NORMAL, awaits the future
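
The ordering rules above can be sketched with asyncio.PriorityQueue and a monotonic counter as tie-breaker. This is an illustrative sketch under assumptions, not the service's code: the names Priority, JobQueue, submit and run_worker are hypothetical, and the payload is a plain string rather than a chat request.

```python
import asyncio
import itertools
from enum import IntEnum


class Priority(IntEnum):
    HIGH = 1
    NORMAL = 3
    LOW = 5


class JobQueue:
    """Priority queue with FIFO ordering inside each priority level."""

    def __init__(self) -> None:
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self._seq = itertools.count()  # monotonic tie-breaker

    def submit(self, priority: Priority, payload: str) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        # Tuples compare element by element: priority first, then sequence.
        # Sequence numbers are unique, so payload and future are never compared.
        self._queue.put_nowait((int(priority), next(self._seq), payload, future))
        return future

    async def run_worker(self, handler) -> None:
        """Single worker: handles one job at a time, smallest tuple first."""
        while True:
            _, _, payload, future = await self._queue.get()
            try:
                result = await handler(payload)
                if not future.done():
                    future.set_result(result)
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            finally:
                self._queue.task_done()


async def demo() -> list:
    order = []

    async def handler(payload: str) -> str:
        order.append(payload)
        return payload.upper()

    queue = JobQueue()
    futures = [
        queue.submit(Priority.LOW, "low-1"),
        queue.submit(Priority.NORMAL, "normal-1"),
        queue.submit(Priority.HIGH, "high-1"),
        queue.submit(Priority.NORMAL, "normal-2"),
    ]
    worker = asyncio.create_task(queue.run_worker(handler))
    await asyncio.gather(*futures)
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return order
```

Awaiting the future returned by submit is the same pattern a synchronous wrapper such as POST /chat would use: enqueue at NORMAL, then block on the result.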

Providers

Provider	Protocol	SDK
LM Studio	OpenAI-compatible HTTP	openai
Ollama	OpenAI-compatible HTTP	openai
Anthropic	Anthropic API (HTTPS)	anthropic

The active provider is selected by the "provider" key in /config/ai_service_config.json (on the shared Docker volume). Environment variables override the file for development.
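
A minimal sketch of that precedence (environment variable first, config file second) might look like this. The function names are hypothetical, and only the "provider" key and the AI_PROVIDER variable are taken from this document; the rest of the config schema is not assumed.

```python
import json
import os
from pathlib import Path
from typing import Mapping

CONFIG_PATH = Path("/config/ai_service_config.json")
KNOWN_PROVIDERS = frozenset({"lmstudio", "ollama", "anthropic"})


def load_file_config(path: Path = CONFIG_PATH) -> dict:
    """Read the shared-volume config; a missing file yields an empty dict."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def resolve_provider(file_config: Mapping, env: Mapping = os.environ) -> str:
    """Return the active provider name, letting AI_PROVIDER override the file."""
    provider = env.get("AI_PROVIDER") or file_config.get("provider")
    if provider not in KNOWN_PROVIDERS:
        # The service reports this situation as HTTP 503; the sketch just raises.
        raise ValueError(f"unknown or missing provider: {provider!r}")
    return provider
```

Typical use would be resolve_provider(load_file_config()), with the env argument left at its default.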

Configuration (env var overrides)

AI_PROVIDER          lmstudio | ollama | anthropic
LMSTUDIO_BASE_URL    http://host.docker.internal:1234/v1
LMSTUDIO_API_KEY     sk-lm-…
LMSTUDIO_MODEL       gemma-4-e4b-it          ← current
OLLAMA_BASE_URL / OLLAMA_MODEL / OLLAMA_API_KEY
ANTHROPIC_API_KEY / ANTHROPIC_MODEL

Credentials live in features/ai-service/.env (gitignored).

Error codes

Code	Meaning
422	Bad request (empty messages, unknown priority)
502	Provider connection / API error
503	Provider not configured / unknown provider
504	Provider timeout
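
One way to keep this mapping in a single place is a small lookup from exception type to status code. The exception classes below are hypothetical stand-ins, not the service's actual types, and the fallback of 500 for anything unrecognised is an assumption that the table above does not cover.

```python
import asyncio


class BadRequest(Exception):
    """Empty messages or unknown priority (hypothetical name)."""


class ProviderNotConfigured(Exception):
    """Provider missing from config or not recognised (hypothetical name)."""


class ProviderTimeout(Exception):
    """Provider did not answer in time (hypothetical name)."""


class ProviderAPIError(Exception):
    """Provider connection or API failure (hypothetical name)."""


# Checked in order; the first matching entry wins.
STATUS_BY_ERROR = (
    (BadRequest, 422),
    (ProviderNotConfigured, 503),
    ((ProviderTimeout, TimeoutError, asyncio.TimeoutError), 504),
    ((ProviderAPIError, ConnectionError), 502),
)


def status_for(exc: BaseException) -> int:
    """Translate an exception into the HTTP status code listed above."""
    for error_types, status in STATUS_BY_ERROR:
        if isinstance(exc, error_types):
            return status
    return 500  # assumption: unlisted errors surface as a generic server error
```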

Architecture

Callers (doc-service, future services)
    │
    └─▶ POST /chat (sync)         ─┐
    └─▶ POST /queue/jobs (async)  ─┤
                                   ▼
                        asyncio.PriorityQueue
                        (HIGH=1, NORMAL=3, LOW=5)
                                   │
                        QueueWorker (single task)
                                   │
                        execute_chat(request)
                                   │
                        Provider SDK (openai / anthropic)
                                   │
                        LM Studio / Ollama / Anthropic API

Known limitations / not implemented

TLS to LM Studio: communication is plain HTTP (http://host.docker.internal:1234). Deferred until LM Studio HTTPS configuration is confirmed. When ready, set LMSTUDIO_BASE_URL=https://... and optionally add ssl_verify and ca_bundle config keys to the OpenAI-compatible provider.
True preemption: a HIGH job that arrives while a LOW job is processing becomes next in the queue but does not interrupt the running inference.
Queue persistence: the queue lives in memory and is not persisted to disk, so pending jobs are lost on container restart.
Authentication on queue endpoints: the /queue/* management endpoints have no auth guard. The internal network is currently the only protection, so they should be secured before any public or multi-tenant deployment.
Streaming responses: /chat returns the full response only after generation completes. Streaming via Server-Sent Events is not implemented.
Metrics / observability: there are no Prometheus metrics and no structured per-job request logging.

System prompts

Each feature service (doc-service, future services) owns its own system prompt, stored in that service's config JSON on the shared volume. The backend settings API (GET/PATCH /api/settings/system-prompts) aggregates and edits them. The AI Service Settings UI exposes a System Prompts tab for editing all registered service prompts at runtime.
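
Runtime edits become visible to a service once its cached config expires. A time-based cache of that kind could be sketched as below, assuming the 30 s config cache TTL mentioned for this change. TTLCache and its loader argument are hypothetical names, and the "system_prompt" key in the usage example is illustrative only.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Caches the result of loader() and reloads it after ttl_seconds."""

    def __init__(
        self,
        loader: Callable[[], dict],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[dict] = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Reload on first use, or once the cached copy is at least ttl old.
        if self._value is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value
```

A service would pass a loader that reads its config JSON from the shared volume and call get() whenever it builds a request, so a PATCH through the settings API is picked up after at most one TTL interval.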

Future work

TLS support for LM Studio / Ollama (ssl_verify, ca_bundle config)
Auth guard on queue management endpoints (admin token or internal-only route)
Streaming responses via SSE (POST /chat/stream)
Queue persistence (SQLite or Redis-backed) so jobs survive restarts
Job result TTL / cleanup (currently jobs accumulate in _jobs dict indefinitely)
Per-caller priority override (e.g. doc-service background jobs = LOW, user-triggered = NORMAL)
Metrics endpoint (/metrics) for queue depth, job latency, provider error rate
