Phase 6: Performance & Production Hardening - Discussion Log
Audit trail only. Do not use as input to planning, research, or execution agents. Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
Date: 2026-05-30
Phase: 6-performance-production-hardening
Areas discussed: Observability stack, Load testing & SLA targets, Container hardening depth, Rate limit header bypass prevention
Observability Stack
Structured Logging Library
| Option | Description | Selected |
|---|---|---|
| structlog | Purpose-built for structured logging; processors pipeline makes correlation IDs trivial; plays well with FastAPI middleware | ✓ |
| Standard logging + python-json-logger | Minimal change — configure stdlib root logger with a JSON formatter. Less powerful but zero new dependencies | |
| loguru | Simple API, good defaults, supports structured output via sink config | |
User's choice: structlog Notes: No follow-up notes.
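To illustrate why a processor pipeline makes correlation IDs cheap to add, here is a minimal stdlib-only sketch of the idea. It is not structlog itself: it only borrows the `(logger, method_name, event_dict)` processor shape that structlog uses. The names `correlation_id`, `PIPELINE`, `render`, and `handle_request` are hypothetical, and the real processor chain is left to discretion as noted later in this log.

```python
import contextvars
import json
import uuid

# Hypothetical context variable holding the current request's correlation ID.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


def add_level(logger, method_name, event_dict):
    """Processor: record the log level taken from the method name."""
    event_dict["level"] = method_name
    return event_dict


def add_correlation_id(logger, method_name, event_dict):
    """Processor: attach the correlation ID if one is bound for this request."""
    cid = correlation_id.get()
    if cid is not None:
        event_dict["correlation_id"] = cid
    return event_dict


# Each processor receives the event dict returned by the previous one.
PIPELINE = [add_level, add_correlation_id]


def render(event, method_name, processors, **fields):
    """Run the event through every processor, then serialize it as JSON."""
    event_dict = {"event": event, **fields}
    for processor in processors:
        event_dict = processor(None, method_name, event_dict)
    return json.dumps(event_dict, sort_keys=True)


def handle_request(incoming_id=None):
    """Stand-in for middleware: bind an ID, log, then restore the context."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        return render("document.list", "info", PIPELINE, count=3)
    finally:
        correlation_id.reset(token)
```

Because the ID lives in a context variable, call sites never pass it explicitly; every line logged while a request is being handled picks it up from the pipeline.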
Log Aggregation
| Option | Description | Selected |
|---|---|---|
| Loki + Grafana in docker-compose | Matches success criteria literally. Adds 2 services; queries logs via Grafana UI at localhost | ✓ |
| stdout JSON only, no aggregation service | Simpler: just emit JSON to stdout and rely on docker compose logs | |
| Promtail + Loki + Grafana full stack | Full Grafana stack with Promtail log shipper. More production-realistic but heavier | |
User's choice: Loki + Grafana in docker-compose Notes: No follow-up notes.
Distributed Tracing
| Option | Description | Selected |
|---|---|---|
| Skip for now — correlation IDs in logs are enough | Simpler; stays in scope for v1 | ✓ |
| OpenTelemetry with Tempo (add to Grafana stack) | More complete observability but heavier setup | |
| OpenTelemetry spans to stdout only (no backend) | Lightweight but not queryable | |
User's choice: Skip — correlation IDs in logs are enough Notes: No follow-up notes.
Load Testing & SLA Targets
Load Testing Tool
| Option | Description | Selected |
|---|---|---|
| Locust | Python-native, fits the existing stack. Test scenarios reuse auth helpers. Lives in backend/load_tests/ | ✓ |
| k6 | JavaScript-based, excellent HTML reports. Separate language from the rest of the stack | |
| pytest-benchmark + httpx | Minimal setup, reuses existing test infrastructure. Not realistic for concurrent load | |
User's choice: Locust Notes: No follow-up notes.
Latency Targets
| Option | Description | Selected |
|---|---|---|
| Strict: p95 < 200ms, p99 < 500ms | Reasonable for a local Docker stack. Clear pass/fail criteria | ✓ |
| Relaxed: p95 < 500ms, p99 < 1s | More lenient — appropriate if cloud backend latency is included in scope | |
| You decide based on profiling | Run a baseline first, then set targets at 2x observed p95 | |
User's choice: Strict — p95 < 200ms, p99 < 500ms Notes: No follow-up notes.
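The pass/fail criterion above can be sketched as a small check over recorded latencies. This is a minimal illustration, assuming latencies are collected in milliseconds and that a nearest-rank percentile is acceptable; the load testing tool may compute percentiles differently. The names `percentile` and `meets_sla` are hypothetical.

```python
import math
from typing import Sequence

# Targets from the selected option, in milliseconds.
P95_TARGET_MS = 200.0
P99_TARGET_MS = 500.0


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_sla(latencies_ms: Sequence[float]) -> bool:
    """True only if both targets hold: p95 < 200 ms and p99 < 500 ms."""
    return (
        percentile(latencies_ms, 95) < P95_TARGET_MS
        and percentile(latencies_ms, 99) < P99_TARGET_MS
    )
```

Both comparisons are strict, matching the "<" in the targets, so a run whose p95 lands exactly on 200 ms fails.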
Load Test Endpoint Scope
| Option | Description | Selected |
|---|---|---|
| Auth + document list + document get + upload | Covers the critical read/write path. Excludes cloud backends | ✓ |
| Auth only | Focus on rate limiting under load. Misses the storage I/O path | |
| All endpoints including cloud proxy | Comprehensive but cloud latency makes p95 targets meaningless | |
User's choice: Auth + document list/get/upload (no cloud backends) Notes: No follow-up notes.
Container Hardening Depth
Non-root User Setup
| Option | Description | Selected |
|---|---|---|
| Create appuser (uid 1000), chown /app, switch USER | Standard pattern. Works with read-only rootfs | |
| Multi-stage build: builder as root, runtime as appuser | Cleaner security boundary. pip install in builder, copy only packages to runtime. Reduces attack surface | ✓ |
| Distroless base image | Minimal image with no shell. Breaks pytesseract (needs system deps) | |
User's choice: Multi-stage build with appuser Notes: No follow-up notes.
Read-only Filesystem
| Option | Description | Selected |
|---|---|---|
| tmpfs for /tmp + named volume for /app/data in docker-compose | read_only: true + tmpfs for temp files + named volume for data. Correct pattern | ✓ |
| tmpfs for /tmp only, data paths via env var | Simpler but less strict | |
| Skip read-only filesystem for Celery worker | Read-only only on FastAPI service; worker stays writable | |
User's choice: tmpfs for /tmp + named volume for /app/data (full read-only rootfs on both services) Notes: No follow-up notes.
Linux Capability Dropping
| Option | Description | Selected |
|---|---|---|
| drop ALL capabilities, no cap_add | cap_drop: [ALL] with no cap_add. Port 8000 needs no capabilities | ✓ |
| drop ALL, add back CAP_NET_BIND_SERVICE | Only needed if binding to port 80/443 — unnecessary for port 8000 | |
| drop only dangerous caps (SYS_ADMIN, SYS_PTRACE, NET_RAW) | Less strict than CLAUDE.md mandate | |
User's choice: drop ALL, no cap_add Notes: No follow-up notes.
Rate Limit Header Bypass Prevention
IP Extraction Strategy
| Option | Description | Selected |
|---|---|---|
| Custom key_func: trust X-Forwarded-For only from known proxy IPs | Replace get_remote_address with trusted-proxy check. Prevents header spoofing from external clients | ✓ |
| Never trust forwarded headers — always use request.client.host | Simplest and most secure for Docker Compose. Breaks if a proxy is added later | |
| Redis-backed rate limiter with per-account AND per-IP limits | More resilient for horizontal scaling but adds Redis dependency | |
User's choice: Custom key_func with trusted-proxy CIDR check Notes: No follow-up notes.
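The trusted-proxy check can be sketched with the standard `ipaddress` module. This is an illustration under stated assumptions, not the project's key_func: the CIDR in `TRUSTED_PROXIES` is an assumed example value, and `client_ip` is a hypothetical helper that a real key_func would call with `request.client.host` and the `X-Forwarded-For` header value.

```python
import ipaddress
from typing import Optional, Sequence

# Assumed example value: a private range that a reverse proxy might sit in.
TRUSTED_PROXIES = (ipaddress.ip_network("172.16.0.0/12"),)


def _in_networks(ip, networks) -> bool:
    """True if the parsed address falls inside any trusted network."""
    return any(ip in net for net in networks)


def client_ip(
    peer: str,
    forwarded_for: Optional[str],
    networks: Sequence = TRUSTED_PROXIES,
) -> str:
    """Return the address to rate-limit on.

    The header is honored only when the socket peer is a trusted proxy.
    Otherwise any client could spoof it, so the peer address is used.
    """
    try:
        peer_ip = ipaddress.ip_address(peer)
    except ValueError:
        return peer
    if not forwarded_for or not _in_networks(peer_ip, networks):
        return peer

    # Walk right to left: the rightmost entries were appended by our own
    # proxies, and the first untrusted hop is the real client. Entries
    # further left are client-supplied and cannot be trusted.
    for hop in reversed([h.strip() for h in forwarded_for.split(",")]):
        try:
            hop_ip = ipaddress.ip_address(hop)
        except ValueError:
            return peer  # malformed header: fall back to the socket peer
        if not _in_networks(hop_ip, networks):
            return str(hop_ip)
    return peer
```

Reading the header right to left matters: a client that sends its own `X-Forwarded-For: 1.2.3.4` through the proxy only adds an entry on the left, which this walk never reaches.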
Per-Account Rate Limiting
| Option | Description | Selected |
|---|---|---|
| Yes — add per-account limits on authenticated endpoints | Second limiter keyed by user_id on document/cloud endpoints (100 req/min per user) | ✓ |
| No — per-IP is sufficient for now | Document endpoints don't need additional per-user limits | |
| Per-account on auth endpoints only | Match Phase 2 intent exactly | |
User's choice: Yes — per-account limits on authenticated document/cloud endpoints Notes: No follow-up notes.
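A limiter keyed by user_id can be sketched as a fixed-window counter. This is a minimal in-memory illustration, assuming a single process and a fixed 60-second window; the class name `PerAccountLimiter` and its `clock` parameter are hypothetical, and the rate limiting library in use may implement windows differently.

```python
import time
from typing import Callable, Dict, Tuple


class PerAccountLimiter:
    """Fixed-window counter keyed by user_id (in-memory, single process)."""

    def __init__(
        self,
        limit: int = 100,              # 100 requests per window, per user
        window_seconds: float = 60.0,  # one-minute window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # user_id -> (window index, requests counted in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, user_id: str) -> bool:
        """Count one request for user_id; False once the limit is reached."""
        current = int(self.clock() // self.window_seconds)
        window, count = self._counters.get(user_id, (current, 0))
        if window != current:
            # A new window has started, so the count starts over.
            window, count = current, 0
        if count >= self.limit:
            self._counters[user_id] = (window, count)
            return False
        self._counters[user_id] = (window, count + 1)
        return True
```

Keying on user_id rather than IP means users behind a shared address do not consume each other's budget. The counters here live in process memory, so they reset on restart and are not shared between workers.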
Claude's Discretion
- Exact structlog processor chain configuration
- Loki Docker Compose service version and loki-config.yaml — use official Grafana example as base
- Promtail vs. Docker log driver for shipping to Loki
- Locust user class structure and task weight distribution
- Grafana dashboard panel layout (basic request rate + latency + error rate panels)
Deferred Ideas
- HTTPS/TLS termination (nginx + Let's Encrypt or Caddy) — out of scope; RUNBOOK.md documents how to add
- Horizontal scaling + Redis-backed rate limit counters — Phase 7+ concern
- GitHub Actions CI/CD pipeline for automated load tests and docker scout on every PR
- Automated backup cron job as a Docker service — RUNBOOK.md documents manual procedure