Business-Management/features/ai-service/STATUS.md
curo1305 1d01cc3b0e Add per-service system prompts with AI Settings tab view
Each feature service owns its system prompt in its config JSON on the
shared volume. The AI Settings page now has General and System Prompts
tabs — admins can view and edit any service's prompts at runtime with
changes taking effect within 30 s (config cache TTL).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:11:40 +02:00

AI Service — Status

What it is

Shared AI intermediary container. All feature containers (doc-service, future services) POST prompts here. It routes each request to the configured provider (LM Studio / Ollama / Anthropic) and returns a normalised response. It is stateless — no database, no conversation history. History and context are the caller's responsibility.

Port: 8010 (internal only, not exposed to host).


Current functionality

Endpoints

Method  Path              Description
POST    /chat             Synchronous chat: submits at NORMAL priority, blocks until done
GET     /health           {"status": "ok"}
GET     /health/provider  Active provider name, model, configured flag
POST    /queue/jobs       Async enqueue — returns job_id immediately
GET     /queue/jobs/{id}  Poll job: status, position, result, error
DELETE  /queue/jobs/{id}  Cancel a pending job
GET     /queue/status     Worker state: running, paused, queue_size, current_job_id
POST    /queue/pause      Finish current job, stop picking new ones
POST    /queue/resume     Unpause
POST    /queue/start      Start (or restart) the worker task
POST    /queue/stop       Stop worker (pending jobs stay queued)
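
A caller-side sketch of the async flow (enqueue, then poll) could look like the following. The paths, port 8010, and the job_id and status fields come from the endpoint list; the hostname ai-service, the request payload shape, and the terminal status names are assumptions made for illustration only.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://ai-service:8010"  # hypothetical internal hostname
TERMINAL_STATUSES = {"done", "failed", "cancelled"}  # assumed status names


def http_enqueue(payload: dict) -> str:
    """POST /queue/jobs and return the job_id from the response."""
    request = urllib.request.Request(
        f"{BASE_URL}/queue/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]


def http_get_job(job_id: str) -> dict:
    """GET /queue/jobs/{id} and return the decoded job document."""
    with urllib.request.urlopen(f"{BASE_URL}/queue/jobs/{job_id}") as response:
        return json.load(response)


def wait_for_job(
    get_job: Callable[[str], dict],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        sleep(interval)
```

Passing the fetch function as a parameter keeps the polling loop testable without a running container; in a real caller it would be `wait_for_job(http_get_job, http_enqueue(payload))`.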

Priority queue

  • Three levels: high (1) > normal (3) > low (5)
  • FIFO within same priority level (monotonic sequence counter)
  • Single async worker — one LLM call at a time
  • Pause / resume / start / stop without restarting the container
  • POST /chat is a synchronous wrapper: enqueues at NORMAL, awaits the future
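
The ordering rules above can be sketched with asyncio.PriorityQueue and a monotonic counter as the tie-breaker. This is a minimal illustration, not the service's actual code: the Priority enum and the helper names are hypothetical, and only the numeric levels (1, 3, 5) come from this document.

```python
import asyncio
import itertools
from enum import IntEnum


class Priority(IntEnum):
    HIGH = 1
    NORMAL = 3
    LOW = 5


# Monotonic tie-breaker: entries with equal priority keep submission order.
_seq = itertools.count()


def make_entry(priority: Priority, payload: str) -> tuple:
    # Tuples compare element by element, so the lower priority value wins
    # first and the lower (earlier) sequence number breaks ties.
    return (int(priority), next(_seq), payload)


async def drain_in_order(jobs: list) -> list:
    """Enqueue (priority, payload) pairs and return payloads in dequeue order."""
    queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
    for priority, payload in jobs:
        queue.put_nowait(make_entry(priority, payload))
    order = []
    while not queue.empty():
        _, _, payload = queue.get_nowait()
        order.append(payload)
    return order
```

For example, submitting low-1, normal-1, high-1, normal-2 in that order drains as high-1, normal-1, normal-2, low-1.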

Providers

Provider   Protocol                SDK
LM Studio  OpenAI-compatible HTTP  openai
Ollama     OpenAI-compatible HTTP  openai
Anthropic  Anthropic API (HTTPS)   anthropic

The active provider is selected by the "provider" key in /config/ai_service_config.json (shared Docker volume); environment variables override it for development.

Configuration (env var overrides)

AI_PROVIDER          lmstudio | ollama | anthropic
LMSTUDIO_BASE_URL    http://host.docker.internal:1234/v1
LMSTUDIO_API_KEY     sk-lm-…
LMSTUDIO_MODEL       gemma-4-e4b-it          ← current
OLLAMA_BASE_URL / OLLAMA_MODEL / OLLAMA_API_KEY
ANTHROPIC_API_KEY / ANTHROPIC_MODEL

Credentials live in features/ai-service/.env (gitignored).
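
One way the provider lookup might work is sketched below. It assumes that AI_PROVIDER, when set, takes precedence over the "provider" key in the JSON file, which matches the "env var overrides" wording but is not confirmed in detail here. The function name resolve_provider and the use of ValueError are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_PATH = Path("/config/ai_service_config.json")
KNOWN_PROVIDERS = ("lmstudio", "ollama", "anthropic")


def resolve_provider(
    config_path: Path = CONFIG_PATH,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the active provider name (assumed: env var wins over the file)."""
    env = os.environ if env is None else env
    file_config: dict = {}
    if config_path.exists():
        file_config = json.loads(config_path.read_text(encoding="utf-8"))

    provider = env.get("AI_PROVIDER") or file_config.get("provider")
    if provider not in KNOWN_PROVIDERS:
        # The service reports this situation to callers as HTTP 503.
        raise ValueError(f"provider not configured or unknown: {provider!r}")
    return provider
```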

Error codes

Code  Meaning
422   Bad request (empty messages, unknown priority)
502   Provider connection / API error
503   Provider not configured / unknown provider
504   Provider timeout
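
A minimal sketch of how failures could be translated into these status codes follows. The exception class names are hypothetical and the fallback to 500 for unclassified errors is an assumption; only the four codes and their meanings come from the list above.

```python
class InvalidRequestError(Exception):
    """Empty messages or unknown priority (hypothetical class)."""


class ProviderConnectionError(Exception):
    """Provider connection or API failure (hypothetical class)."""


class ProviderNotConfiguredError(Exception):
    """Provider missing from config or not recognised (hypothetical class)."""


class ProviderTimeoutError(Exception):
    """Provider did not answer in time (hypothetical class)."""


STATUS_BY_ERROR = {
    InvalidRequestError: 422,
    ProviderConnectionError: 502,
    ProviderNotConfiguredError: 503,
    ProviderTimeoutError: 504,
}


def status_for(exc: Exception) -> int:
    """Map an exception to the HTTP status the service would return."""
    for error_type, status in STATUS_BY_ERROR.items():
        if isinstance(exc, error_type):
            return status
    return 500  # assumption: anything unclassified is an internal error
```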

Architecture

Callers (doc-service, future services)
    │
    └─▶ POST /chat (sync)         ─┐
    └─▶ POST /queue/jobs (async)  ─┤
                                   ▼
                        asyncio.PriorityQueue
                        (HIGH=1, NORMAL=3, LOW=5)
                                   │
                        QueueWorker (single task)
                                   │
                        execute_chat(request)
                                   │
                        Provider SDK (openai / anthropic)
                                   │
                        LM Studio / Ollama / Anthropic API
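
The flow in the diagram can be approximated by the sketch below: a single worker task pulls from the priority queue, calls execute_chat, and resolves a future that the synchronous path awaits. The class layout, method names, and the fake execute_chat are assumptions for illustration; pause, resume, stop, and cancellation are omitted to keep it short.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Optional

HIGH, NORMAL, LOW = 1, 3, 5


class QueueWorker:
    """Single consumer: runs one job at a time and resolves its future."""

    def __init__(self, execute_chat: Callable[[dict], Awaitable[dict]]) -> None:
        self._execute_chat = execute_chat
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self._seq = itertools.count()  # FIFO tie-breaker within a priority
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Start the worker task if it is not already running."""
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._run())

    def submit(self, request: dict, priority: int = NORMAL) -> asyncio.Future:
        """Async path: enqueue and hand back a future for the result."""
        future = asyncio.get_running_loop().create_future()
        self._queue.put_nowait((priority, next(self._seq), request, future))
        return future

    async def chat(self, request: dict) -> dict:
        """Sync path: enqueue at NORMAL and wait for the result."""
        return await self.submit(request, NORMAL)

    async def _run(self) -> None:
        while True:
            _, _, request, future = await self._queue.get()
            try:
                result = await self._execute_chat(request)
            except Exception as exc:
                future.set_exception(exc)
            else:
                future.set_result(result)


async def demo() -> list:
    processed = []

    async def fake_execute_chat(request: dict) -> dict:
        # Stand-in for the real provider call.
        processed.append(request["prompt"])
        return {"content": "echo: " + request["prompt"]}

    worker = QueueWorker(fake_execute_chat)
    low = worker.submit({"prompt": "background"}, LOW)
    high = worker.submit({"prompt": "urgent"}, HIGH)
    worker.start()
    await worker.chat({"prompt": "sync"})
    await asyncio.gather(low, high)
    return processed
```

Under these assumptions, `asyncio.run(demo())` returns `['urgent', 'sync', 'background']`: all three jobs are queued before the worker first runs, so they are processed strictly by priority.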

Known limitations / not implemented

  • TLS to LM Studio — communication is plain HTTP (http://host.docker.internal:1234). Deferred until LM Studio HTTPS configuration is confirmed. When ready: set LMSTUDIO_BASE_URL=https://... and optionally add ssl_verify + ca_bundle config keys to the OpenAI-compat provider.
  • True preemption — a HIGH job arriving while a LOW job is processing will be next in queue but will not interrupt the running inference.
  • Queue persistence — the in-memory queue is lost on container restart. Pending jobs are not persisted to disk.
  • Authentication on queue endpoints — /queue/* management endpoints have no auth guard. They should be protected before any public or multi-tenant deployment (the internal network is the only current protection).
  • Streaming responses — /chat returns the full response only after generation completes. Streaming (Server-Sent Events) is not implemented.
  • Metrics / observability — no Prometheus metrics, no structured request logging per job.

System prompts

Each feature service (doc-service, future services) owns its own system prompt, stored in that service's config JSON on the shared volume. The backend settings API (GET/PATCH /api/settings/system-prompts) aggregates and edits them. The AI Service Settings UI exposes a System Prompts tab for editing all registered service prompts at runtime.
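
A service reading its own prompt from the shared volume could use a small TTL cache like the sketch below, so that edits become visible without a restart. The 30 s TTL is taken from the commit note at the top of this page; the class name CachedConfig and the "system_prompt" key are assumptions.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

CONFIG_TTL_SECONDS = 30.0


class CachedConfig:
    """Re-read a JSON config file at most once per TTL window."""

    def __init__(
        self,
        path: Path,
        ttl: float = CONFIG_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._path = path
        self._ttl = ttl
        self._clock = clock
        self._data: Optional[dict] = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._data is None or now - self._loaded_at >= self._ttl:
            self._data = json.loads(self._path.read_text(encoding="utf-8"))
            self._loaded_at = now
        return self._data

    def system_prompt(self) -> str:
        # "system_prompt" is an assumed key name, not confirmed by this document.
        return self.get().get("system_prompt", "")
```

With this design an edited prompt is picked up on the first read after the TTL window has elapsed, which is consistent with the stated 30 s delay.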


Future work

  • TLS support for LM Studio / Ollama (ssl_verify, ca_bundle config)
  • Auth guard on queue management endpoints (admin token or internal-only route)
  • Streaming responses via SSE (POST /chat/stream)
  • Queue persistence (SQLite or Redis-backed) so jobs survive restarts
  • Job result TTL / cleanup (currently jobs accumulate in _jobs dict indefinitely)
  • Per-caller priority override (e.g. doc-service background jobs = LOW, user-triggered = NORMAL)
  • Metrics endpoint (/metrics) for queue depth, job latency, provider error rate
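
For the job result TTL item, one possible approach is a periodic purge of finished entries from the _jobs dict. The sketch below is only an illustration of that idea: the Job dataclass, its fields, and the function name are hypothetical, and nothing like it is implemented yet.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    job_id: str
    status: str
    finished_at: Optional[float] = None  # monotonic timestamp, None while unfinished


def purge_finished_jobs(
    jobs: Dict[str, Job],
    ttl_seconds: float,
    now: Optional[float] = None,
) -> int:
    """Delete jobs that finished at least ttl_seconds ago; return how many."""
    now = time.monotonic() if now is None else now
    expired = [
        job_id
        for job_id, job in jobs.items()
        if job.finished_at is not None and now - job.finished_at >= ttl_seconds
    ]
    for job_id in expired:
        del jobs[job_id]
    return len(expired)
```

Jobs that are still pending or running have no finished_at value and are never removed by this function.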