# Phase 6: Performance & Production Hardening - Context

Gathered: 2026-05-30
Status: Ready for planning

## Phase Boundary

The application is hardened and observable for production deployment. This phase delivers: structured JSON logging with correlation IDs and a Loki+Grafana aggregation stack in Docker Compose; a Locust load test suite with defined SLA targets (p95 < 200ms, p99 < 500ms) against the auth + document CRUD endpoints; container hardening via multi-stage Dockerfile with non-root appuser, read-only root filesystem with tmpfs mounts, and ALL Linux capabilities dropped; rate limit header-bypass prevention via a custom trusted-proxy IP extraction function; per-account rate limits on authenticated endpoints; and a RUNBOOK.md documenting all env vars, startup/shutdown, backup strategy, and on-call escalation.

No new user-facing features. All changes are operational and security hardening.

## Implementation Decisions

### Observability: Structured Logging

  • D-01: Use structlog for structured JSON logging. Configure a processors pipeline that injects correlation IDs, user_id, request latency, and HTTP method/path into every log line. A FastAPI middleware generates a UUID correlation ID per request and binds it into the structlog context.
  • D-02: All services emit JSON to stdout. Loki + Grafana are added as services in docker-compose.yml (Loki as log storage, Grafana as query UI). Promtail or the Docker log driver ships logs from the backend container to Loki.
  • D-03: No distributed tracing (OpenTelemetry skipped). Correlation IDs in structured logs are sufficient for request tracing at this scale.
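
The correlation-ID flow in D-01 can be sketched as a pure ASGI middleware. To stay dependency-free, this sketch uses the standard library's `contextvars` and `logging` as a stand-in for structlog; in the real app, `structlog.contextvars.bind_contextvars` would presumably replace the `ContextVar`. The names `correlation_id_var`, `JsonFormatter`, and `CorrelationIdMiddleware` are illustrative, not taken from the codebase, and `user_id` binding is left out.

```python
import contextvars
import json
import logging
import time
import uuid

# Assumption: stand-in for the structlog context; holds the current request's ID.
correlation_id_var: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object that carries the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "event": record.getMessage(),
            "correlation_id": correlation_id_var.get(),
        }
        # Extra fields (method, path, latency_ms) are attached by the middleware.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


class CorrelationIdMiddleware:
    """Hypothetical ASGI middleware: one UUID per HTTP request, echoed as a header."""

    def __init__(self, app, logger: logging.Logger):
        self.app = app
        self.logger = logger

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        correlation_id = str(uuid.uuid4())
        token = correlation_id_var.set(correlation_id)
        started = time.perf_counter()

        async def send_with_header(message):
            # Add X-Correlation-ID to the response headers before they go out.
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-correlation-id", correlation_id.encode()))
                message = {**message, "headers": headers}
            await send(message)

        try:
            await self.app(scope, receive, send_with_header)
        finally:
            latency_ms = round((time.perf_counter() - started) * 1000, 2)
            self.logger.info(
                "request_finished",
                extra={"fields": {
                    "method": scope["method"],
                    "path": scope["path"],
                    "latency_ms": latency_ms,
                }},
            )
            # Restore the previous value so IDs never leak between requests.
            correlation_id_var.reset(token)
```

Because the class follows the plain ASGI interface, it should be registrable with `app.add_middleware(CorrelationIdMiddleware, logger=...)` in the same way as the existing SlowAPIMiddleware.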

### Load Testing

  • D-04: Use Locust for load testing. Test scenarios are written in Python at backend/load_tests/locustfile.py. Locust can run headless (locust --headless) or with its web UI (locust --host=http://localhost:8000).
  • D-05: Load test scope: login → list documents → get a document → upload a document. This simulates a realistic user session. Cloud backend endpoints are excluded (external provider latency would invalidate local SLA targets).
  • D-06: SLA targets:
    • p50 < 100ms, p95 < 200ms, p99 < 500ms on all covered endpoints
    • Test parameters: 50 concurrent users, 5-minute soak (matches ROADMAP.md success criteria SC-01)
    • The load test passes when there are zero endpoint failures AND all p95/p99 targets are met
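
The D-06 pass/fail gate can be expressed as a small check over per-endpoint latency samples. This is a minimal sketch that assumes raw latencies (in milliseconds) and failure counts have already been exported from a Locust run; the function names and input shape are hypothetical, and Locust's own percentile statistics could be used instead.

```python
from statistics import quantiles

# Targets from D-06, in milliseconds.
SLA_TARGETS_MS = {"p50": 100.0, "p95": 200.0, "p99": 500.0}


def percentiles(latencies_ms: list) -> dict:
    """Return p50/p95/p99 for a sample (needs at least two data points)."""
    # n=100 yields 99 cut points; index k-1 is the k-th percentile.
    cuts = quantiles(latencies_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def load_test_passes(latencies_by_endpoint: dict, failures_by_endpoint: dict) -> bool:
    """Apply the D-06 gate: zero failures and p95/p99 under target on every endpoint."""
    for endpoint, latencies in latencies_by_endpoint.items():
        if failures_by_endpoint.get(endpoint, 0) > 0:
            return False
        observed = percentiles(latencies)
        if observed["p95"] >= SLA_TARGETS_MS["p95"]:
            return False
        if observed["p99"] >= SLA_TARGETS_MS["p99"]:
            return False
    return True
```

The p50 target is computed but not enforced here, mirroring the pass criterion, which names only failures and the p95/p99 targets.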

### Container Hardening

  • D-07: Multi-stage Dockerfile: the builder stage installs Python dependencies and build-time system packages as root; the runtime stage copies only the installed packages and app code, creates appuser (uid 1000), and sets USER appuser. System deps (tesseract-ocr, libgl1, libglib2.0-0) are installed in the runtime stage since they are runtime requirements.
  • D-08: Read-only root filesystem: Add read_only: true to the FastAPI and Celery worker services in docker-compose.yml. Add tmpfs: ["/tmp"] for temporary file operations (PyMuPDF temp files, Celery task temp downloads). The /app/data path is mounted from a named volume and remains writable for application data.
  • D-09: Dropped capabilities: cap_drop: [ALL] on both backend services. No cap_add — port 8000 is unprivileged and requires no capabilities.
  • D-10: docker scout CVE scan: Run docker scout cves on the built image as part of the security gate. Zero critical CVEs required before phase is marked complete.
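
The Compose-level requirements in D-08 and D-09 lend themselves to an automated check. The following is a rough sketch only: it assumes the service definition has already been loaded into a plain dict (for example with a YAML parser), and `hardening_violations` is a hypothetical helper, not part of any existing tooling.

```python
def hardening_violations(service: dict) -> list:
    """List the D-08/D-09 requirements that a Compose service definition misses."""
    problems = []

    if service.get("read_only") is not True:
        problems.append("read_only must be true")

    # Compose accepts tmpfs as a single string or a list; entries may carry
    # options after a colon (e.g. "/tmp:size=64m"), so compare the path only.
    tmpfs = service.get("tmpfs", [])
    if isinstance(tmpfs, str):
        tmpfs = [tmpfs]
    if "/tmp" not in [str(entry).split(":")[0] for entry in tmpfs]:
        problems.append("tmpfs must include /tmp")

    if [str(cap).upper() for cap in service.get("cap_drop", [])] != ["ALL"]:
        problems.append("cap_drop must be [ALL]")

    if service.get("cap_add"):
        problems.append("cap_add must be empty")

    return problems


# Example service definition matching D-08 and D-09.
backend = {"read_only": True, "tmpfs": ["/tmp"], "cap_drop": ["ALL"]}
print(hardening_violations(backend))
```

Running the same check against both the backend and celery-worker service definitions would cover the two services named in D-08.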

### Rate Limiting: Header Bypass Prevention

  • D-11: Replace get_remote_address (the default slowapi key function) with a custom get_client_ip(request) function. Logic:
    1. If request.client.host is in a trusted proxy CIDR (127.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, ::1), read the leftmost IP from X-Forwarded-For.
    2. Otherwise, use request.client.host directly and ignore all forwarded headers.
    This prevents header spoofing from external clients while preserving correct behavior when a legitimate reverse proxy is in front.
  • D-12: Add per-account rate limits on authenticated endpoints in addition to the existing per-IP limits. Use a second Limiter instance keyed by current_user.id (injected via a dependency). Target limits: 100 req/min per authenticated user on document/cloud endpoints; existing auth endpoint limits (10/min IP, 5/hour for password reset) remain unchanged.
  • D-13: Existing per-IP limits on auth endpoints (@limiter.limit("10/minute"), @limiter.limit("5/hour")) are preserved and strengthened only by switching to the trusted-proxy key function.
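
The trusted-proxy logic in D-11 might look like the sketch below. The CIDR list comes from D-11; the helper names `resolve_client_ip` and `_is_trusted_proxy` are illustrative, and the fallback to the peer address on a malformed header is an assumption rather than a stated requirement.

```python
import ipaddress
from typing import Optional

# Trusted proxy ranges listed in D-11 (::1 written as a /128 network).
TRUSTED_PROXY_CIDRS = ("127.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "::1/128")
_TRUSTED_NETWORKS = [ipaddress.ip_network(cidr) for cidr in TRUSTED_PROXY_CIDRS]


def _is_trusted_proxy(host: str) -> bool:
    """True when the direct peer address falls inside a trusted proxy range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal, so never trusted
    return any(addr in net for net in _TRUSTED_NETWORKS)


def resolve_client_ip(peer_host: Optional[str], forwarded_for: Optional[str]) -> str:
    """Pick the rate-limit key IP from the peer address and X-Forwarded-For."""
    if peer_host is None:
        return "unknown"
    if forwarded_for and _is_trusted_proxy(peer_host):
        leftmost = forwarded_for.split(",")[0].strip()
        try:
            return str(ipaddress.ip_address(leftmost))
        except ValueError:
            pass  # assumption: a malformed header falls back to the peer address
    # Untrusted peer: forwarded headers are ignored entirely.
    return peer_host


def get_client_ip(request) -> str:
    """Key function for the limiter; `request` is a Starlette/FastAPI Request."""
    peer = request.client.host if request.client else None
    return resolve_client_ip(peer, request.headers.get("x-forwarded-for"))
```

Taking the leftmost entry follows D-11 as written. It is only safe when the trusted proxy overwrites X-Forwarded-For rather than appending to a client-supplied value, since an appended header keeps the attacker's entry in the leftmost position.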

### Runbook

  • D-14: RUNBOOK.md at repo root. Contents: all required env vars with descriptions and examples; Docker Compose startup/shutdown procedures; backup strategy for PostgreSQL (pg_dump cron) and MinIO (mc mirror); health check verification steps; on-call escalation path (who to contact, in what order, for which alert types); common failure modes and recovery steps.

### Claude's Discretion

  • Exact structlog processor chain configuration (which fields, which order) — follow structlog documentation best practices.
  • Loki Docker Compose service version and configuration (loki-config.yaml) — use the official Grafana Loki Docker Compose example as the base.
  • Promtail vs. Docker log driver for shipping logs to Loki — Claude picks based on simplicity.
  • Locust user class structure and task weight distribution.
  • Specific Grafana dashboard panel layout — basic request rate + latency + error rate panels are sufficient.

<canonical_refs>

## Canonical References

Downstream agents MUST read these before planning or implementing.

### Phase Goal and Success Criteria

  • .planning/ROADMAP.md §"Phase 6: Performance & Production Hardening" — Goal, success criteria (SC-01 through SC-05), and phase gates. Requirements are TBD in ROADMAP.md but captured fully in this CONTEXT.md.

### Security Mandates (Non-Negotiable)

  • CLAUDE.md §"Key Architectural Rules" — JWT memory-only, refresh httpOnly cookie, atomic quota UPDATE, admin endpoint restrictions.
  • CLAUDE.md §"Security Protocol" — Container hardening checklist (non-root, read-only fs, dropped caps, docker scout), bandit/pip audit/npm audit gates, no hardcoded secrets.
  • CLAUDE.md §"Security Requirements" — Rate limiting on all auth endpoints, constant-time comparison, CSRF protection.

### Existing Rate Limiting Code

  • backend/api/auth.py lines 37-44: current Limiter(key_func=get_remote_address) setup; replace get_remote_address with the custom trusted-proxy function.
  • backend/main.py lines 9-16, 108-110: SlowAPIMiddleware registration and limiter state attachment.

### Container Configuration

  • backend/Dockerfile — current single-stage build running as root; must be replaced with multi-stage + appuser pattern.
  • docker-compose.yml — add read_only, tmpfs, cap_drop to backend service; add Loki + Grafana services.

### Testing Infrastructure

  • backend/tests/conftest.py — existing async test fixtures and auth helpers; Locust scenarios should reuse the same auth patterns.
  • backend/pytest.ini — test runner config; load tests live separately in backend/load_tests/ and are NOT run by pytest -v.

</canonical_refs>

<code_context>

## Existing Code Insights

### Reusable Assets

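  • backend/api/auth.py:37-44: Limiter + get_remote_address setup; extend to a per-account limiter by adding a second Limiter whose key_func returns the authenticated user's ID. A bare Limiter(key_func=lambda req: str(current_user.id)) would not work as written, because current_user is not in scope inside the key function; the ID has to be read from the request.
  • backend/main.py:108-110: SlowAPIMiddleware already wired; adding a correlation ID middleware follows the same app.add_middleware() pattern.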
  • backend/tests/conftest.py — auth fixtures (auth_client, admin_client) that Locust user classes can adapt to Python-based login flows.
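
A per-account key function for the second Limiter could take roughly this shape. It is a sketch under one loud assumption: that the auth dependency stores the authenticated user's ID on `request.state.user_id` before the limiter evaluates the key. That attribute name and the function name `account_rate_limit_key` are hypothetical.

```python
def account_rate_limit_key(request) -> str:
    """Rate-limit key: the authenticated user's ID, else the peer IP."""
    # Assumption: an auth dependency has set request.state.user_id.
    user_id = getattr(request.state, "user_id", None)
    if user_id is not None:
        return f"user:{user_id}"
    # Fall back to a per-IP key so unauthenticated calls are still limited.
    peer = request.client.host if request.client else "unknown"
    return f"ip:{peer}"
```

This function would then be passed as `Limiter(key_func=account_rate_limit_key)` and applied with a decorator such as `@account_limiter.limit("100/minute")`, where `account_limiter` is again an illustrative name. The `user:` and `ip:` prefixes keep the two key spaces from colliding.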

### Established Patterns

  • asyncio.to_thread() — all sync SDK calls already wrapped (MinIO, cloud backends); log emission is sync-safe so structlog integrates cleanly.
  • get_regular_user / get_current_admin dependency chain — per-account rate limiter should extract user_id from the same current_user object already injected by these deps.
  • Pydantic Settings (backend/config.py) — new env vars (trusted proxy CIDRs, structlog level, Loki endpoint) added via Settings class following the existing pattern.

### Integration Points

  • backend/main.py — add correlation ID middleware, wire per-account limiter state.
  • docker-compose.yml — add Loki + Grafana services; add read_only: true, tmpfs, cap_drop to backend and celery-worker services.
  • backend/Dockerfile — replace with multi-stage build.
  • backend/api/auth.py — replace get_remote_address with custom get_client_ip.
  • backend/api/documents.py, backend/api/cloud.py — add per-account rate limit decorators.

</code_context>

## Specific Ideas
  • Loki stack: Use the official Grafana docker-compose example for Loki + Grafana (single-binary Loki mode is sufficient for local dev). Promtail or Docker log driver picks up container stdout.
  • Locust location: backend/load_tests/locustfile.py — separate directory from tests/ so pytest does not discover it. Run via locust --headless --users 50 --spawn-rate 10 --run-time 5m --host http://localhost:8000.
  • Correlation ID middleware: Generate str(uuid.uuid4()) per request, bind to structlog context via structlog.contextvars.bind_contextvars(correlation_id=...), include in response as X-Correlation-ID header.
  • RUNBOOK.md location: Repo root alongside CLAUDE.md and README.md.
## Deferred Ideas
  • HTTPS/TLS termination — adding nginx + Let's Encrypt or Caddy in front of the stack. Out of scope for Phase 6; the runbook documents how to add a reverse proxy.
  • Horizontal scaling — multiple uvicorn workers, Redis-backed rate limit counters, sticky sessions. Currently in-memory rate limits suffice for single-instance deployment. Phase 7+ concern.
  • CI/CD pipeline — GitHub Actions workflow for automated load tests and docker scout on every PR. Out of scope for Phase 6 (no CI setup exists yet).
  • Backup automation — automated pg_dump + MinIO mirror cron job as a Docker service. RUNBOOK.md documents the manual procedure; automation is a future operational phase.

Phase: 6-Performance & Production Hardening
Context gathered: 2026-05-30