# Phase 6: Performance & Production Hardening - Discussion Log

> **Audit trail only.** Do not use as input to planning, research, or execution agents.
> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.

**Date:** 2026-05-30
**Phase:** 6-performance-production-hardening
**Areas discussed:** Observability stack, Load testing & SLA targets, Container hardening depth, Rate limit header bypass prevention

---

## Observability Stack

### Structured Logging Library

| Option | Description | Selected |
|--------|-------------|----------|
| structlog | Purpose-built for structured logging; its processor pipeline makes attaching correlation IDs straightforward; integrates well with FastAPI middleware | ✓ |
| Standard logging + python-json-logger | Minimal change — configure stdlib root logger with a JSON formatter. Less powerful but zero new dependencies | |
| loguru | Simple API, good defaults, supports structured output via sink config | |

**User's choice:** structlog
**Notes:** No follow-up notes.
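For the record, a minimal sketch of how a correlation ID could ride along in a processor pipeline. A structlog processor is a plain callable taking `(logger, method_name, event_dict)` and returning the event dict, so the function below can be exercised without structlog installed. The names `correlation_id_var`, `bind_correlation_id`, and `add_correlation_id` are hypothetical; structlog also ships `structlog.contextvars.merge_contextvars`, which may make a hand-rolled processor unnecessary.

```python
import contextvars
import uuid

# Hypothetical context variable holding the current request's correlation ID.
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


def bind_correlation_id(incoming=None):
    """Reuse a caller-supplied ID if present, otherwise mint one, then bind it."""
    cid = incoming or uuid.uuid4().hex
    correlation_id_var.set(cid)
    return cid


def add_correlation_id(logger, method_name, event_dict):
    """structlog-style processor: copy the bound ID into the event dict."""
    cid = correlation_id_var.get()
    if cid is not None:
        event_dict["correlation_id"] = cid
    return event_dict
```

In a FastAPI middleware, `bind_correlation_id` would presumably be called once per request (for example with an incoming request-ID header value, if one is used), and `add_correlation_id` would sit in the processor list ahead of the JSON renderer.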

---

### Log Aggregation

| Option | Description | Selected |
|--------|-------------|----------|
| Loki + Grafana in docker-compose | Matches success criteria literally. Adds 2 services; queries logs via Grafana UI at localhost | ✓ |
| stdout JSON only, no aggregation service | Simpler — just emit JSON to stdout, rely on `docker compose logs` | |
| Promtail + Loki + Grafana full stack | Full Grafana stack with Promtail log shipper. More production-realistic but heavier | |

**User's choice:** Loki + Grafana in docker-compose
**Notes:** No follow-up notes.

---

### Distributed Tracing

| Option | Description | Selected |
|--------|-------------|----------|
| Skip for now — correlation IDs in logs are enough | Simpler; stays in scope for v1 | ✓ |
| OpenTelemetry with Tempo (add to Grafana stack) | More complete observability but heavier setup | |
| OpenTelemetry spans to stdout only (no backend) | Lightweight but not queryable | |

**User's choice:** Skip — correlation IDs in logs are enough
**Notes:** No follow-up notes.

---

## Load Testing & SLA Targets

### Load Testing Tool

| Option | Description | Selected |
|--------|-------------|----------|
| Locust | Python-native, fits the existing stack. Test scenarios reuse auth helpers. Lives in backend/load_tests/ | ✓ |
| k6 | JavaScript-based, excellent HTML reports. Separate language from the rest of the stack | |
| pytest-benchmark + httpx | Minimal setup, reuses existing test infrastructure. Not realistic for concurrent load | |

**User's choice:** Locust
**Notes:** No follow-up notes.

---

### Latency Targets

| Option | Description | Selected |
|--------|-------------|----------|
| Strict: p95 < 200ms, p99 < 500ms | Reasonable for a local Docker stack. Clear pass/fail criteria | ✓ |
| Relaxed: p95 < 500ms, p99 < 1s | More lenient — appropriate if cloud backend latency is included in scope | |
| You decide based on profiling | Run a baseline first, then set targets at 2x observed p95 | |

**User's choice:** Strict — p95 < 200ms, p99 < 500ms
**Notes:** No follow-up notes.
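The pass/fail criterion can be made concrete with a small check over recorded response times. This is a sketch under assumptions: it uses the nearest-rank percentile definition, which may differ slightly from how Locust computes its reported percentiles, and the function names are hypothetical.

```python
import math

# Targets from the selected option, in milliseconds.
P95_TARGET_MS = 200.0
P99_TARGET_MS = 500.0


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_sla(samples_ms):
    """True only if both latency targets are met (strictly below the limit)."""
    return (
        percentile(samples_ms, 95) < P95_TARGET_MS
        and percentile(samples_ms, 99) < P99_TARGET_MS
    )
```

Both conditions are strict inequalities, matching the "<" in the targets: a run whose p95 is exactly 200 ms fails.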

---

### Load Test Endpoint Scope

| Option | Description | Selected |
|--------|-------------|----------|
| Auth + document list + document get + upload | Covers the critical read/write path. Excludes cloud backends | ✓ |
| Auth only | Focus on rate limiting under load. Misses the storage I/O path | |
| All endpoints including cloud proxy | Comprehensive but cloud latency makes p95 targets meaningless | |

**User's choice:** Auth + document list/get/upload (no cloud backends)
**Notes:** No follow-up notes.

---

## Container Hardening Depth

### Non-root User Setup

| Option | Description | Selected |
|--------|-------------|----------|
| Create appuser (uid 1000), chown /app, switch USER | Standard pattern. Works with read-only rootfs | |
| Multi-stage build: builder as root, runtime as appuser | Cleaner security boundary. pip install in builder, copy only packages to runtime. Reduces attack surface | ✓ |
| Distroless base image | Minimal image with no shell. Breaks pytesseract (needs system deps) | |

**User's choice:** Multi-stage build with appuser
**Notes:** No follow-up notes.

---

### Read-only Filesystem

| Option | Description | Selected |
|--------|-------------|----------|
| tmpfs for /tmp + named volume for /app/data in docker-compose | `read_only: true` plus tmpfs for temp files and a named volume for data, so only those two mounts are writable | ✓ |
| tmpfs for /tmp only, data paths via env var | Simpler but less strict | |
| Skip read-only filesystem for Celery worker | Read-only only on FastAPI service; worker stays writable | |

**User's choice:** tmpfs for /tmp + named volume for /app/data (full read-only rootfs on both services)
**Notes:** No follow-up notes.

---

### Linux Capability Dropping

| Option | Description | Selected |
|--------|-------------|----------|
| drop ALL capabilities, no cap_add | `cap_drop: [ALL]` with no cap_add. Binding to port 8000 needs no capabilities because it is outside the privileged range (below 1024) | ✓ |
| drop ALL, add back CAP_NET_BIND_SERVICE | Only needed if binding to port 80/443 — unnecessary for port 8000 | |
| drop only dangerous caps (SYS_ADMIN, SYS_PTRACE, NET_RAW) | Less strict than the CLAUDE.md mandate | |

**User's choice:** drop ALL, no cap_add
**Notes:** No follow-up notes.

---

## Rate Limit Header Bypass Prevention

### IP Extraction Strategy

| Option | Description | Selected |
|--------|-------------|----------|
| Custom key_func: trust X-Forwarded-For only from known proxy IPs | Replace get_remote_address with a trusted-proxy check. Prevents header spoofing from external clients | ✓ |
| Never trust forwarded headers — always use request.client.host | Simplest and most secure for Docker Compose. Breaks if a proxy is added later | |
| Redis-backed rate limiter with per-account AND per-IP limits | More resilient for horizontal scaling but adds Redis dependency | |

**User's choice:** Custom key_func with trusted-proxy CIDR check
**Notes:** No follow-up notes.
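A minimal sketch of the trusted-proxy check, written as a pure function so it can be tested without a running app. The CIDR list is an assumption for illustration (a typical Docker bridge range), not the project's actual proxy configuration, and the names `TRUSTED_PROXY_CIDRS`, `is_trusted_proxy`, and `client_ip` are hypothetical. A real key_func would pass in `request.client.host` and the `X-Forwarded-For` header value.

```python
import ipaddress

# Assumed proxy ranges, for illustration only.
TRUSTED_PROXY_CIDRS = [ipaddress.ip_network("172.16.0.0/12")]


def is_trusted_proxy(ip):
    """True if ip parses as an address inside one of the trusted ranges."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False
    return any(addr in net for net in TRUSTED_PROXY_CIDRS)


def client_ip(peer_ip, forwarded_for=None):
    """Return the address to rate-limit on.

    The header is honored only when the direct peer is a trusted proxy;
    otherwise any client could spoof it to dodge the limit.
    """
    if not forwarded_for or not is_trusted_proxy(peer_ip):
        return peer_ip
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    # Walk right to left: the rightmost entries were appended by the
    # proxies closest to us, so the first untrusted hop is the client.
    for hop in reversed(hops):
        if is_trusted_proxy(hop):
            continue
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            return peer_ip  # malformed header: fall back to the socket peer
        return hop
    return peer_ip
```

Walking the header from the right means that extra entries a client prepends on the left are never reached, which is the spoofing case this decision targets.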

---

### Per-Account Rate Limiting

| Option | Description | Selected |
|--------|-------------|----------|
| Yes — add per-account limits on authenticated endpoints | Second limiter keyed by user_id on document/cloud endpoints (100 req/min per user) | ✓ |
| No — per-IP is sufficient for now | Document endpoints don't need additional per-user limits | |
| Per-account on auth endpoints only | Match Phase 2 intent exactly | |

**User's choice:** Yes — per-account limits on authenticated document/cloud endpoints
**Notes:** No follow-up notes.
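To illustrate what "keyed by user_id" means in practice, here is a standard-library sketch of a sliding-window limiter with the 100 requests per minute default. It is an in-memory, single-process illustration only, not the project's limiter; the class and function names are hypothetical, and counters shared across processes would need the Redis-backed approach listed under deferred ideas.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` hits per key within any `window_seconds` span."""

    def __init__(self, limit=100, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def account_key(user_id):
    """Namespace the key so per-account counters never collide with per-IP ones."""
    return f"user:{user_id}"
```

Because the key is the account rather than the address, a user switching IPs shares one budget, while two users behind the same NAT each get their own.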

---

## Claude's Discretion

- Exact structlog processor chain configuration
- Loki Docker Compose service version and loki-config.yaml — use official Grafana example as base
- Promtail vs. Docker log driver for shipping to Loki
- Locust user class structure and task weight distribution
- Grafana dashboard panel layout (basic request rate + latency + error rate panels)

## Deferred Ideas

- HTTPS/TLS termination (nginx + Let's Encrypt or Caddy) — out of scope; RUNBOOK.md documents how to add
- Horizontal scaling + Redis-backed rate limit counters — Phase 7+ concern
- GitHub Actions CI/CD pipeline for automated load tests and docker scout on every PR
- Automated backup cron job as a Docker service — RUNBOOK.md documents manual procedure