Business-Management/features/ai-service/STATUS.md
curo1305 1d01cc3b0e Add per-service system prompts with AI Settings tab view
Each feature service owns its system prompt in its config JSON on the
shared volume. The AI Settings page now has General and System Prompts
tabs — admins can view and edit any service's prompts at runtime with
changes taking effect within 30 s (config cache TTL).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 15:11:40 +02:00


# AI Service — Status
## What it is
Shared AI intermediary container. All feature containers (doc-service, future services) POST prompts here. It routes requests to the configured model (LM Studio / Ollama / Anthropic) and returns a normalised response. It is **stateless** — no database, no conversation history. History and context are the caller's responsibility.
Port: `8010` (internal only, not exposed to host).
---
## Current functionality
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| `POST` | `/chat` | Synchronous chat: submits at NORMAL priority, blocks until done |
| `GET` | `/health` | `{"status": "ok"}` |
| `GET` | `/health/provider` | Active provider name, model, configured flag |
| `POST` | `/queue/jobs` | Async enqueue — returns `job_id` immediately |
| `GET` | `/queue/jobs/{id}` | Poll job: status, position, result, error |
| `DELETE` | `/queue/jobs/{id}` | Cancel a pending job |
| `GET` | `/queue/status` | Worker state: running, paused, queue_size, current_job_id |
| `POST` | `/queue/pause` | Finish current job, stop picking new ones |
| `POST` | `/queue/resume` | Unpause |
| `POST` | `/queue/start` | Start (or restart) the worker task |
| `POST` | `/queue/stop` | Stop worker (pending jobs stay queued) |
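A caller using the async path enqueues with `POST /queue/jobs`, then polls `GET /queue/jobs/{id}`. Below is a minimal client-side sketch using only the standard library. The terminal status strings (`done`, `failed`, `cancelled`) and the helper names `wait_for_job` / `http_get_job` are assumptions, because this document does not list the actual status values. The fetcher is injected so that the polling logic can be tested without a running service.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumed terminal status strings; the real values returned by
# GET /queue/jobs/{id} are not listed in this document.
TERMINAL_STATUSES = frozenset({"done", "failed", "cancelled"})

Job = Dict[str, Any]


def http_get_job(base_url: str) -> Callable[[str], Job]:
    """Return a fetcher for GET {base_url}/queue/jobs/{id} (stdlib only)."""
    def get_job(job_id: str) -> Job:
        with urllib.request.urlopen(f"{base_url}/queue/jobs/{job_id}", timeout=10) as resp:
            return json.load(resp)
    return get_job


def wait_for_job(
    get_job: Callable[[str], Job],
    job_id: str,
    poll_interval: float = 1.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Job:
    """Poll until the job reaches a terminal status, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        sleep(poll_interval)
```

From inside the Docker network, usage might look like `wait_for_job(http_get_job("http://ai-service:8010"), job_id)`. The hostname `ai-service` is an assumed compose service name.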
### Priority queue
- Three levels: `high` (1) > `normal` (3) > `low` (5); the lowest number is dequeued first
- FIFO within same priority level (monotonic sequence counter)
- Single async worker — one LLM call at a time
- Pause / resume / start / stop without restarting the container
- `POST /chat` is a synchronous wrapper: enqueues at NORMAL, awaits the future
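The queue rules above can be sketched as follows. This is a minimal illustration, not the service's actual `QueueWorker`. The class name `PriorityJobQueue` and its methods are hypothetical. Each entry is a `(priority, seq, request, future)` tuple. Because `(priority, seq)` is unique, `asyncio.PriorityQueue` orders by level first and then by arrival. Pausing clears an `asyncio.Event` that the single worker checks before it takes the next job. The worker also checks the event again after `get()` returns, so a job that arrives while the service is paused goes back into the queue instead of being run.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Dict, Optional

PRIORITIES = {"high": 1, "normal": 3, "low": 5}  # lower value is dequeued first


class PriorityJobQueue:
    """Single-worker priority queue (hypothetical class; names are illustrative)."""

    def __init__(self, execute: Callable[[Dict[str, Any]], Awaitable[Any]]):
        self._execute = execute            # e.g. execute_chat(request)
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self._seq = itertools.count()      # monotonic tie-breaker: FIFO within a level
        self._running = asyncio.Event()
        self._running.set()                # set = not paused
        self._worker: Optional[asyncio.Task] = None

    def submit(self, request: Dict[str, Any], priority: str = "normal") -> asyncio.Future:
        """Async enqueue: returns a future immediately (cf. POST /queue/jobs)."""
        if priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {priority!r}")  # maps to 422
        fut = asyncio.get_running_loop().create_future()
        # (priority, seq) is unique, so tuple comparison never reaches request/fut.
        self._queue.put_nowait((PRIORITIES[priority], next(self._seq), request, fut))
        return fut

    async def chat(self, request: Dict[str, Any]) -> Any:
        """Synchronous wrapper (cf. POST /chat): enqueue at NORMAL, await the result."""
        return await self.submit(request, "normal")

    def pause(self) -> None:
        self._running.clear()  # the job in progress finishes; no new job is picked up

    def resume(self) -> None:
        self._running.set()

    def start(self) -> None:
        if self._worker is None or self._worker.done():
            self._worker = asyncio.create_task(self._run())

    async def stop(self) -> None:
        """Cancel the worker; jobs still in the queue stay queued."""
        if self._worker is not None:
            self._worker.cancel()
            try:
                await self._worker
            except asyncio.CancelledError:
                pass
            self._worker = None

    async def _run(self) -> None:
        while True:
            await self._running.wait()
            item = await self._queue.get()
            if not self._running.is_set():
                # Paused while waiting for a job: put it back (same key, same order).
                self._queue.put_nowait(item)
                self._queue.task_done()
                continue
            _prio, _seq, request, fut = item
            try:
                if fut.cancelled():        # cancelled while pending
                    continue
                try:
                    result = await self._execute(request)
                except asyncio.CancelledError:
                    fut.cancel()           # worker stopped mid-job
                    raise
                except Exception as exc:
                    if not fut.done():
                        fut.set_exception(exc)
                else:
                    if not fut.done():
                        fut.set_result(result)
            finally:
                self._queue.task_done()
```

The sketch also shows the preemption limitation listed below. A HIGH job only jumps ahead of the jobs that are still waiting in the queue. It never interrupts the `await self._execute(...)` call that is already running.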
### Providers
| Provider | Protocol | SDK |
|----------|----------|-----|
| LM Studio | OpenAI-compatible HTTP | openai |
| Ollama | OpenAI-compatible HTTP | openai |
| Anthropic | Anthropic API (HTTPS) | anthropic |
Active provider is selected by `"provider"` key in `/config/ai_service_config.json` (shared Docker volume), with env var overrides for dev.
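The two SDK families take the system prompt in different places. This is one reason the service normalises requests. OpenAI-compatible servers (LM Studio, Ollama) accept it as a message with role `"system"`. The Anthropic Messages API takes it as a separate top-level `system` parameter and requires `max_tokens`. Below is a minimal sketch, assuming callers send OpenAI-style `messages`. It builds the keyword arguments for `client.chat.completions.create(...)` or `client.messages.create(...)`. The helper name `build_provider_call` and the default `max_tokens` are hypothetical.

```python
from typing import Any, Dict, List

OPENAI_COMPAT = {"lmstudio", "ollama"}


def build_provider_call(
    provider: str, model: str, messages: List[Dict[str, str]], max_tokens: int = 1024
) -> Dict[str, Any]:
    """Return kwargs for the active SDK's chat call (hypothetical helper)."""
    if provider in OPENAI_COMPAT:
        # openai SDK: client.chat.completions.create(**kwargs); system stays in messages
        return {"model": model, "messages": messages, "max_tokens": max_tokens}
    if provider == "anthropic":
        # anthropic SDK: client.messages.create(**kwargs); system is a top-level param
        system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
        kwargs: Dict[str, Any] = {
            "model": model,
            "max_tokens": max_tokens,
            "messages": [m for m in messages if m["role"] != "system"],
        }
        if system:
            kwargs["system"] = system
        return kwargs
    raise ValueError(f"unknown provider: {provider!r}")  # surfaced as 503
```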
### Configuration (env var overrides)
```
AI_PROVIDER         lmstudio | ollama | anthropic
LMSTUDIO_BASE_URL   http://host.docker.internal:1234/v1
LMSTUDIO_API_KEY    sk-lm-…
LMSTUDIO_MODEL      gemma-4-e4b-it   ← current
OLLAMA_BASE_URL / OLLAMA_MODEL / OLLAMA_API_KEY
ANTHROPIC_API_KEY / ANTHROPIC_MODEL
```
Credentials live in `features/ai-service/.env` (gitignored).
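Below is a sketch of the precedence described above, where the shared-volume JSON is the base and env vars override it. The per-provider JSON layout (`{"lmstudio": {"model": ...}}`), the helper name `load_provider_settings`, and the rule behind the `configured` flag are all assumptions. Only the file path and the env var names come from this document.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional

CONFIG_PATH = Path("/config/ai_service_config.json")
# Env override suffixes per provider, taken from the variable list above.
ENV_KEYS = {
    "lmstudio": ("BASE_URL", "API_KEY", "MODEL"),
    "ollama": ("BASE_URL", "API_KEY", "MODEL"),
    "anthropic": ("API_KEY", "MODEL"),
}


def load_provider_settings(
    path: Path = CONFIG_PATH, env: Optional[Mapping[str, str]] = None
) -> Dict[str, Any]:
    """Merge the shared-volume JSON with env var overrides (env wins)."""
    env = os.environ if env is None else env
    try:
        cfg = json.loads(path.read_text())
    except FileNotFoundError:
        cfg = {}
    provider = env.get("AI_PROVIDER") or cfg.get("provider")
    # Assumed layout: one JSON object per provider, e.g. {"lmstudio": {"model": ...}}.
    settings: Dict[str, Any] = dict(cfg.get(provider) or {}) if provider else {}
    for suffix in ENV_KEYS.get(provider, ()):
        value = env.get(f"{provider.upper()}_{suffix}")
        if value:
            settings[suffix.lower()] = value
    settings["provider"] = provider
    # Assumed rule for the "configured" flag: known provider and a model set.
    settings["configured"] = provider in ENV_KEYS and bool(settings.get("model"))
    return settings
```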
### Error codes
| Code | Meaning |
|------|---------|
| 422 | Invalid request body (e.g. empty messages, unknown priority) |
| 502 | Provider connection / API error |
| 503 | Provider not configured / unknown provider |
| 504 | Provider timeout |
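The table can be read as an exception-to-status mapping. Below is a sketch of that mapping, using two hypothetical exception classes (`ProviderNotConfigured`, `ProviderError`) in place of whatever the service actually raises. In a real implementation, the SDK-specific errors would have to be translated into these types first. The 500 fallback for unexpected errors is an assumption, since the table does not cover that case.

```python
import asyncio


class ProviderNotConfigured(Exception):
    """Hypothetical: no provider/model configured, or unknown provider name."""


class ProviderError(Exception):
    """Hypothetical: wraps an SDK connection or API error."""


def http_status_for(exc: BaseException) -> int:
    """Map an exception raised while handling a request to the documented code."""
    if isinstance(exc, ValueError):              # empty messages, unknown priority
        return 422
    if isinstance(exc, ProviderNotConfigured):
        return 503
    # Timeouts are checked before connection errors because some SDKs
    # derive their timeout exception from their connection exception.
    if isinstance(exc, (TimeoutError, asyncio.TimeoutError)):
        return 504
    if isinstance(exc, (ProviderError, ConnectionError)):
        return 502
    return 500                                   # assumption: anything else is a server bug
```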
---
## Architecture
```
Callers (doc-service, future services)
  ├─▶ POST /chat        (sync)  ─┐
  └─▶ POST /queue/jobs  (async) ─┤
                                 ▼
                     asyncio.PriorityQueue
                   (HIGH=1, NORMAL=3, LOW=5)
                                 │
                                 ▼
                   QueueWorker (single task)
                                 │
                                 ▼
                      execute_chat(request)
                                 │
                                 ▼
                Provider SDK (openai / anthropic)
                                 │
                                 ▼
               LM Studio / Ollama / Anthropic API
```
---
## Known limitations / not implemented
- **TLS to LM Studio** — communication is plain HTTP (`http://host.docker.internal:1234`). Deferred until LM Studio HTTPS configuration is confirmed. When ready: set `LMSTUDIO_BASE_URL=https://...` and optionally add `ssl_verify` + `ca_bundle` config keys to the OpenAI-compat provider.
- **True preemption** — a HIGH job arriving while a LOW job is processing will be next in queue but will not interrupt the running inference.
- **Queue persistence** — the in-memory queue is lost on container restart. Pending jobs are not persisted to disk.
- **Authentication on queue endpoints** — `/queue/*` management endpoints have no auth guard. Should be protected before any public/multi-tenant deployment (internal network is the only current protection).
- **Streaming responses** — `/chat` returns the full response only after generation completes. Streaming via Server-Sent Events is not implemented.
- **Metrics / observability** — no Prometheus metrics, no structured request logging per job.
---
## System prompts
Each feature service (doc-service, future services) owns its own system prompt, stored in that service's config JSON on the shared volume. The backend settings API (`GET/PATCH /api/settings/system-prompts`) aggregates and edits them. The AI Service Settings UI exposes a **System Prompts** tab for editing all registered service prompts at runtime.
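Below is a sketch of how the aggregating side could read and edit per-service prompts, with the 30 s config cache TTL mentioned in the commit that introduced this feature. Several details are assumptions: the file naming (`/config/<service>_config.json`), the `system_prompt` key, and the class name `SystemPromptStore`. Writes go through a temp file and `Path.replace` so that readers never see a half-written JSON file. A process that writes a prompt sees the change immediately. Other services only see it when their own cache next expires, which matches the "within 30 s" behaviour.

```python
import json
import time
from pathlib import Path
from typing import Callable, Dict, Optional


class SystemPromptStore:
    """Hypothetical aggregator over per-service config files on the shared volume."""

    def __init__(self, config_dir: Path, ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._dir = config_dir
        self._ttl = ttl
        self._clock = clock
        self._cache: Optional[Dict[str, str]] = None
        self._loaded_at = 0.0

    def get_all(self) -> Dict[str, str]:
        """Return {service: prompt}, re-reading the files once the TTL has expired."""
        if self._cache is None or self._clock() - self._loaded_at >= self._ttl:
            prompts: Dict[str, str] = {}
            for path in sorted(self._dir.glob("*_config.json")):
                data = json.loads(path.read_text())
                if "system_prompt" in data:      # e.g. ai_service_config.json has none
                    prompts[path.name[: -len("_config.json")]] = data["system_prompt"]
            self._cache, self._loaded_at = prompts, self._clock()
        return dict(self._cache)

    def set(self, service: str, prompt: str) -> None:
        """Update one service's prompt in place (FileNotFoundError if unknown)."""
        path = self._dir / f"{service}_config.json"
        data = json.loads(path.read_text())
        data["system_prompt"] = prompt
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(data, indent=2))
        tmp.replace(path)                        # atomic rename on the same filesystem
        self._cache = None                       # this process sees the edit at once
```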
---
## Future work
- [ ] TLS support for LM Studio / Ollama (`ssl_verify`, `ca_bundle` config)
- [ ] Auth guard on queue management endpoints (admin token or internal-only route)
- [ ] Streaming responses via SSE (`POST /chat/stream`)
- [ ] Queue persistence (SQLite or Redis-backed) so jobs survive restarts
- [ ] Job result TTL / cleanup (currently jobs accumulate in the `_jobs` dict indefinitely)
- [ ] Per-caller priority override (e.g. doc-service background jobs = LOW, user-triggered = NORMAL)
- [ ] Metrics endpoint (`/metrics`) for queue depth, job latency, provider error rate
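For the job result TTL item, below is one possible approach. It assumes that each entry in `_jobs` records its status and the time it finished. Every name here (`JobRecord`, `purge_expired`, `janitor`) and the status strings are hypothetical. Jobs that are still pending or running are never purged, so polling a live job keeps working.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Dict, Optional

TERMINAL = {"done", "failed", "cancelled"}  # hypothetical status names


@dataclass
class JobRecord:
    status: str
    finished_at: Optional[float] = None  # time.monotonic() when the job ended


def purge_expired(jobs: Dict[str, JobRecord], ttl: float,
                  now: Optional[float] = None) -> int:
    """Delete finished jobs older than ttl seconds; return how many were removed."""
    now = time.monotonic() if now is None else now
    expired = [
        job_id for job_id, job in jobs.items()
        if job.status in TERMINAL and job.finished_at is not None
        and now - job.finished_at >= ttl
    ]
    for job_id in expired:
        del jobs[job_id]
    return len(expired)


async def janitor(jobs: Dict[str, JobRecord], ttl: float, interval: float) -> None:
    """Background task: purge expired results every `interval` seconds."""
    while True:
        await asyncio.sleep(interval)
        purge_expired(jobs, ttl)
```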