# Phase 6: Performance & Production Hardening - Discussion Log

> **Audit trail only.** Do not use as input to planning, research, or execution agents.
> Decisions are captured in CONTEXT.md; this log preserves the alternatives considered.

**Date:** 2026-05-30
**Phase:** 6-performance-production-hardening
**Areas discussed:** Observability stack, Load testing & SLA targets, Container hardening depth, Rate limit header bypass prevention

---

## Observability Stack

### Structured Logging Library

| Option | Description | Selected |
|--------|-------------|----------|
| structlog | Purpose-built for structured logging; processors pipeline makes correlation IDs trivial; plays well with FastAPI middleware | ✓ |
| Standard logging + python-json-logger | Minimal change: configure stdlib root logger with a JSON formatter. Less powerful but zero new dependencies | |
| loguru | Simple API, good defaults, supports structured output via sink config | |

**User's choice:** structlog
**Notes:** No follow-up notes.

---

### Log Aggregation

| Option | Description | Selected |
|--------|-------------|----------|
| Loki + Grafana in docker-compose | Matches success criteria literally. Adds 2 services; queries logs via Grafana UI at localhost | ✓ |
| stdout JSON only, no aggregation service | Simpler: just emit JSON to stdout, rely on `docker compose logs` | |
| Promtail + Loki + Grafana full stack | Full Grafana stack with Promtail log shipper. More production-realistic but heavier | |

**User's choice:** Loki + Grafana in docker-compose
**Notes:** No follow-up notes.


---

### Distributed Tracing

| Option | Description | Selected |
|--------|-------------|----------|
| Skip for now: correlation IDs in logs are enough | Simpler; stays in scope for v1 | ✓ |
| OpenTelemetry with Tempo (add to Grafana stack) | More complete observability but heavier setup | |
| OpenTelemetry spans to stdout only (no backend) | Lightweight but not queryable | |

**User's choice:** Skip: correlation IDs in logs are enough
**Notes:** No follow-up notes.

---

## Load Testing & SLA Targets

### Load Testing Tool

| Option | Description | Selected |
|--------|-------------|----------|
| Locust | Python-native, fits the existing stack. Test scenarios reuse auth helpers. Lives in backend/load_tests/ | ✓ |
| k6 | JavaScript-based, excellent HTML reports. Separate language from the rest of the stack | |
| pytest-benchmark + httpx | Minimal setup, reuses existing test infrastructure. Not realistic for concurrent load | |

**User's choice:** Locust
**Notes:** No follow-up notes.

---

### Latency Targets

| Option | Description | Selected |
|--------|-------------|----------|
| Strict: p95 < 200ms, p99 < 500ms | Reasonable for a local Docker stack. Clear pass/fail criteria | ✓ |
| Relaxed: p95 < 500ms, p99 < 1s | More lenient; appropriate if cloud backend latency is included in scope | |
| You decide based on profiling | Run a baseline first, then set targets at 2x observed p95 | |

**User's choice:** Strict: p95 < 200ms, p99 < 500ms
**Notes:** No follow-up notes.

---

### Load Test Endpoint Scope

| Option | Description | Selected |
|--------|-------------|----------|
| Auth + document list + document get + upload | Covers the critical read/write path. Excludes cloud backends | ✓ |
| Auth only | Focus on rate limiting under load. Misses the storage I/O path | |
| All endpoints including cloud proxy | Comprehensive but cloud latency makes p95 targets meaningless | |

**User's choice:** Auth + document list/get/upload (no cloud backends)
**Notes:** No follow-up notes.

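
The strict latency targets selected above (p95 < 200ms, p99 < 500ms) amount to a pass/fail check over measured response times. The following is a minimal, standalone sketch of that check, independent of any Locust reporting. The names `percentile`, `meets_sla`, and the two constants are hypothetical, and the nearest-rank percentile method is an assumption, since the log does not say how percentiles are to be computed.

```python
import math

# Thresholds from the selected "Strict" option, in milliseconds.
P95_TARGET_MS = 200.0
P99_TARGET_MS = 500.0


def percentile(samples, pct):
    """Nearest-rank percentile (assumed method) of a non-empty sample list."""
    if not samples:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples)
    # 1-based rank of the smallest value covering pct percent of the samples.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_sla(samples_ms):
    """Return (passed, p95, p99) for a list of response times in milliseconds."""
    p95 = percentile(samples_ms, 95)
    p99 = percentile(samples_ms, 99)
    passed = p95 < P95_TARGET_MS and p99 < P99_TARGET_MS
    return passed, p95, p99


if __name__ == "__main__":
    # Hypothetical measurements: 98 fast requests and two slow outliers.
    timings = [100.0] * 98 + [600.0, 700.0]
    print(meets_sla(timings))
```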

---

## Container Hardening Depth

### Non-root User Setup

| Option | Description | Selected |
|--------|-------------|----------|
| Create appuser (uid 1000), chown /app, switch USER | Standard pattern. Works with read-only rootfs | |
| Multi-stage build: builder as root, runtime as appuser | Cleaner security boundary. pip install in builder, copy only packages to runtime. Reduces attack surface | ✓ |
| Distroless base image | Minimal image with no shell. Breaks pytesseract (needs system deps) | |

**User's choice:** Multi-stage build with appuser
**Notes:** No follow-up notes.

---

### Read-only Filesystem

| Option | Description | Selected |
|--------|-------------|----------|
| tmpfs for /tmp + named volume for /app/data in docker-compose | `read_only: true` + tmpfs for temp files + named volume for data. Correct pattern | ✓ |
| tmpfs for /tmp only, data paths via env var | Simpler but less strict | |
| Skip read-only filesystem for Celery worker | Read-only only on FastAPI service; worker stays writable | |

**User's choice:** tmpfs for /tmp + named volume for /app/data (full read-only rootfs on both services)
**Notes:** No follow-up notes.

---

### Linux Capability Dropping

| Option | Description | Selected |
|--------|-------------|----------|
| drop ALL capabilities, no cap_add | `cap_drop: [ALL]` with no cap_add. Port 8000 needs no capabilities | ✓ |
| drop ALL, add back CAP_NET_BIND_SERVICE | Only needed if binding to port 80/443; unnecessary for port 8000 | |
| drop only dangerous caps (SYS_ADMIN, SYS_PTRACE, NET_RAW) | Less strict than CLAUDE.md mandate | |

**User's choice:** drop ALL, no cap_add
**Notes:** No follow-up notes.

---

## Rate Limit Header Bypass Prevention

### IP Extraction Strategy

| Option | Description | Selected |
|--------|-------------|----------|
| Custom key_func: trust X-Forwarded-For only from known proxy IPs | Replace get_remote_address with trusted-proxy check. Prevents header spoofing from external clients | ✓ |
| Never trust forwarded headers: always use request.client.host | Simplest and most secure for Docker Compose. Breaks if a proxy is added later | |
| Redis-backed rate limiter with per-account AND per-IP limits | More resilient for horizontal scaling but adds Redis dependency | |

**User's choice:** Custom key_func with trusted-proxy CIDR check
**Notes:** No follow-up notes.

---

### Per-Account Rate Limiting

| Option | Description | Selected |
|--------|-------------|----------|
| Yes: add per-account limits on authenticated endpoints | Second limiter keyed by user_id on document/cloud endpoints (100 req/min per user) | ✓ |
| No: per-IP is sufficient for now | Document endpoints don't need additional per-user limits | |
| Per-account on auth endpoints only | Match Phase 2 intent exactly | |

**User's choice:** Yes: per-account limits on authenticated document/cloud endpoints
**Notes:** No follow-up notes.

---

## Claude's Discretion

- Exact structlog processor chain configuration
- Loki Docker Compose service version and loki-config.yaml; use official Grafana example as base
- Promtail vs. Docker log driver for shipping to Loki
- Locust user class structure and task weight distribution
- Grafana dashboard panel layout (basic request rate + latency + error rate panels)

## Deferred Ideas

- HTTPS/TLS termination (nginx + Let's Encrypt or Caddy): out of scope; RUNBOOK.md documents how to add
- Horizontal scaling + Redis-backed rate limit counters: Phase 7+ concern
- GitHub Actions CI/CD pipeline for automated load tests and docker scout on every PR
- Automated backup cron job as a Docker service: RUNBOOK.md documents manual procedure
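
For reference, the trusted-proxy CIDR check selected under IP Extraction Strategy could look roughly like the sketch below. It is framework-free and uses only the standard library; a `key_func` would presumably call it with `request.client.host` and the raw `X-Forwarded-For` header value. The names `TRUSTED_PROXY_CIDRS`, `is_trusted_proxy`, and `client_ip`, the example CIDR range, and the right-to-left walk over the header are all assumptions for illustration, not decisions recorded in this log.

```python
import ipaddress

# Hypothetical trusted range; real values would come from configuration.
TRUSTED_PROXY_CIDRS = [ipaddress.ip_network("172.16.0.0/12")]


def is_trusted_proxy(ip_text):
    """True if ip_text parses as an IP inside a trusted proxy range."""
    try:
        addr = ipaddress.ip_address(ip_text)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXY_CIDRS)


def client_ip(peer_ip, forwarded_for):
    """Choose the IP to rate-limit on.

    The X-Forwarded-For value is honoured only when the direct peer is a
    trusted proxy; otherwise the header is ignored because any external
    client can set it to an arbitrary value.
    """
    if not forwarded_for or not is_trusted_proxy(peer_ip):
        return peer_ip
    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    # Walk from the hop nearest to us: entries appended by trusted proxies
    # are skipped, and the first untrusted address is taken as the client.
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return peer_ip  # malformed header: fall back to the peer address
        if not is_trusted_proxy(hop):
            return hop
    return peer_ip


if __name__ == "__main__":
    # A direct external client cannot spoof its way to a different key.
    print(client_ip("203.0.113.9", "10.0.0.1"))
    # Behind a trusted proxy, the forwarded client address is used.
    print(client_ip("172.18.0.2", "198.51.100.7"))
```

Taking the rightmost untrusted hop rather than the leftmost entry means a client that prepends fake addresses to the header still gets keyed on the address the proxy actually observed.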