diff --git a/.planning/phases/06-performance-production-hardening/06-CONTEXT.md b/.planning/phases/06-performance-production-hardening/06-CONTEXT.md
new file mode 100644
index 0000000..649c1b5
--- /dev/null
+++ b/.planning/phases/06-performance-production-hardening/06-CONTEXT.md
@@ -0,0 +1,129 @@
+# Phase 6: Performance & Production Hardening - Context
+
+**Gathered:** 2026-05-30
+**Status:** Ready for planning
+
+<domain>
+## Phase Boundary
+
+The application is hardened and observable for production deployment. This phase delivers: structured JSON logging with correlation IDs and a Loki+Grafana aggregation stack in Docker Compose; a Locust load test suite with defined SLA targets (p95 < 200ms, p99 < 500ms) against the auth + document CRUD endpoints; container hardening via multi-stage Dockerfile with non-root appuser, read-only root filesystem with tmpfs mounts, and ALL Linux capabilities dropped; rate limit header-bypass prevention via a custom trusted-proxy IP extraction function; per-account rate limits on authenticated endpoints; and a RUNBOOK.md documenting all env vars, startup/shutdown, backup strategy, and on-call escalation.
+
+No new user-facing features. All changes are operational and security hardening.
+
+</domain>
+
+<decisions>
+## Implementation Decisions
+
+### Observability — Structured Logging
+- **D-01:** Use `structlog` for structured JSON logging. Configure a processors pipeline that injects correlation IDs, user_id, request latency, and HTTP method/path into every log line. A FastAPI middleware generates a UUID correlation ID per request and binds it into the structlog context.
+- **D-02:** All services emit JSON to stdout. Loki + Grafana are added as services in `docker-compose.yml` (Loki as log storage, Grafana as query UI). Promtail or the Docker log driver ships logs from the backend container to Loki.
+- **D-03:** No distributed tracing (OpenTelemetry skipped). Correlation IDs in structured logs are sufficient for request tracing at this scale.
+
+### Load Testing
+- **D-04:** Use **Locust** for load testing. Test scenarios written in Python at `backend/load_tests/locustfile.py`. Locust can be run headless (`locust --headless`) or with its web UI (`locust --host=http://localhost:8000`).
+- **D-05:** Load test scope: login → list documents → get a document → upload a document. Simulates a realistic user session. Cloud backend endpoints excluded (external provider latency would invalidate local SLA targets).
+- **D-06:** SLA targets:
+  - p50 < 100ms, p95 < 200ms, p99 < 500ms on all covered endpoints
+  - Test parameters: 50 concurrent users, 5-minute soak (matches ROADMAP.md success criterion SC-01)
+  - Load test passes when there are zero endpoint failures AND all p95/p99 targets are met
+
+### Container Hardening
+- **D-07:** **Multi-stage Dockerfile**: the `builder` stage installs Python dependencies and any build-time system packages as root; the `runtime` stage copies only the installed packages and app code, creates `appuser` (uid 1000), and sets `USER appuser`. Runtime system deps (tesseract-ocr, libgl1, libglib2.0-0) are installed in the runtime stage, before the switch to `appuser`, since they are runtime requirements.
+- **D-08:** **Read-only root filesystem**: Add `read_only: true` to the FastAPI and Celery worker services in `docker-compose.yml`. Add `tmpfs: ["/tmp"]` for temporary file operations (PyMuPDF temp files, Celery task temp downloads). The `/app/data` path is a named volume that remains writable for application data.
+- **D-09:** **Dropped capabilities**: `cap_drop: [ALL]` on both backend services. No `cap_add` — port 8000 is unprivileged and requires no capabilities.
+- **D-10:** **`docker scout` CVE scan**: Run `docker scout cves` on the built image as part of the security gate. Zero critical CVEs required before phase is marked complete.
+
+### Rate Limiting — Header Bypass Prevention
+- **D-11:** Replace `get_remote_address` (the default slowapi key function) with a custom `get_client_ip(request)` function. Logic:
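+  1. If `request.client.host` is in a trusted proxy CIDR (127.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, ::1), read the leftmost IP from `X-Forwarded-For`. This assumes the proxy overwrites any client-supplied `X-Forwarded-For`; if it appends instead, the leftmost entry is client-controlled and the rightmost entry outside the trusted CIDRs should be used.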
+  2. Otherwise, use `request.client.host` directly — ignore all forwarded headers.
+  This prevents header spoofing from external clients while preserving correct behavior when a legitimate reverse proxy is in front.
+- **D-12:** Add **per-account rate limits** on authenticated endpoints in addition to the existing per-IP limits. Use a second `Limiter` instance keyed by `current_user.id` (injected via a dependency). Target limits: 100 req/min per authenticated user on document/cloud endpoints; existing auth endpoint limits (10/min IP, 5/hour for password reset) remain unchanged.
+- **D-13:** Existing per-IP limits on auth endpoints (`@limiter.limit("10/minute")`, `@limiter.limit("5/hour")`) are preserved and strengthened only by switching to the trusted-proxy key function.
+
+### Runbook
+- **D-14:** `RUNBOOK.md` at repo root. Contents: all required env vars with descriptions and examples; Docker Compose startup/shutdown procedures; backup strategy for PostgreSQL (pg_dump cron) and MinIO (mc mirror); health check verification steps; on-call escalation path (who to contact, in what order, for which alert types); common failure modes and recovery steps.
+
+### Claude's Discretion
+- Exact structlog processor chain configuration (which fields, which order) — follow structlog documentation best practices.
+- Loki Docker Compose service version and configuration (loki-config.yaml) — use the official Grafana Loki Docker Compose example as the base.
+- Promtail vs. Docker log driver for shipping logs to Loki — Claude picks based on simplicity.
+- Locust user class structure and task weight distribution.
+- Specific Grafana dashboard panel layout — basic request rate + latency + error rate panels are sufficient.
+
+</decisions>
+
+<canonical_refs>
+## Canonical References
+
+**Downstream agents MUST read these before planning or implementing.**
+
+### Phase Goal and Success Criteria
+- `.planning/ROADMAP.md` §"Phase 6: Performance & Production Hardening" — Goal, success criteria (SC-01 through SC-05), and phase gates. Requirements are TBD in ROADMAP.md but captured fully in this CONTEXT.md.
+
+### Security Mandates (Non-Negotiable)
+- `CLAUDE.md` §"Key Architectural Rules" — JWT memory-only, refresh httpOnly cookie, atomic quota UPDATE, admin endpoint restrictions.
+- `CLAUDE.md` §"Security Protocol": Container hardening checklist (non-root, read-only fs, dropped caps, `docker scout`), bandit/pip-audit/npm audit gates, no hardcoded secrets.
+- `CLAUDE.md` §"Security Requirements" — Rate limiting on all auth endpoints, constant-time comparison, CSRF protection.
+
+### Existing Rate Limiting Code
+- `backend/api/auth.py` lines 37–44 — current `Limiter(key_func=get_remote_address)` setup; replace `get_remote_address` with custom trusted-proxy function.
+- `backend/main.py` lines 9–16, 108–110 — SlowAPIMiddleware registration and limiter state attachment.
+
+### Container Configuration
+- `backend/Dockerfile` — current single-stage build running as root; must be replaced with multi-stage + appuser pattern.
+- `docker-compose.yml`: add `read_only`, `tmpfs`, `cap_drop` to the backend and Celery worker services; add Loki + Grafana services.
+
+### Testing Infrastructure
+- `backend/tests/conftest.py` — existing async test fixtures and auth helpers; Locust scenarios should reuse the same auth patterns.
+- `backend/pytest.ini` — test runner config; load tests live separately in `backend/load_tests/` and are NOT run by `pytest -v`.
+
+</canonical_refs>
+
+<code_context>
+## Existing Code Insights
+
+### Reusable Assets
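+- `backend/api/auth.py:37–44`: `Limiter` + `get_remote_address` setup; extend to a per-account limiter by adding a second `Limiter` whose `key_func` returns the authenticated user's ID as a string. A slowapi `key_func` receives only the `Request`, so `lambda req: str(current_user.id)` cannot work as written; the user ID has to be read from the request (for example from a value the auth layer stores on `request.state`).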
+- `backend/main.py:108–110` — SlowAPIMiddleware already wired; adding a correlation ID middleware follows the same `app.add_middleware()` pattern.
+- `backend/tests/conftest.py` — auth fixtures (`auth_client`, `admin_client`) that Locust user classes can adapt to Python-based login flows.
+
+### Established Patterns
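+- `asyncio.to_thread()`: all sync SDK calls are already wrapped (MinIO, cloud backends). `asyncio.to_thread()` propagates the current `contextvars` context into the worker thread, so correlation IDs bound through `structlog.contextvars` also appear on log lines emitted inside wrapped calls.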
+- `get_regular_user` / `get_current_admin` dependency chain — per-account rate limiter should extract `user_id` from the same `current_user` object already injected by these deps.
+- Pydantic Settings (`backend/config.py`) — new env vars (trusted proxy CIDRs, structlog level, Loki endpoint) added via `Settings` class following the existing pattern.
+
+### Integration Points
+- `backend/main.py` — add correlation ID middleware, wire per-account limiter state.
+- `docker-compose.yml` — add Loki + Grafana services; add `read_only: true`, `tmpfs`, `cap_drop` to backend and celery-worker services.
+- `backend/Dockerfile` — replace with multi-stage build.
+- `backend/api/auth.py` — replace `get_remote_address` with custom `get_client_ip`.
+- `backend/api/documents.py`, `backend/api/cloud.py` — add per-account rate limit decorators.
+
+</code_context>
+
+<specifics>
+## Specific Ideas
+
+- **Loki stack**: Use the official Grafana `docker-compose` example for Loki + Grafana (single-binary Loki mode is sufficient for local dev). Promtail or Docker log driver picks up container stdout.
+- **Locust location**: `backend/load_tests/locustfile.py`, a separate directory from `tests/` so `pytest` does not discover it. Run from the repo root via `locust -f backend/load_tests/locustfile.py --headless --users 50 --spawn-rate 10 --run-time 5m --host http://localhost:8000`.
+- **Correlation ID middleware**: Generate `str(uuid.uuid4())` per request, call `structlog.contextvars.clear_contextvars()` first so values from a previous request do not leak, bind the ID via `structlog.contextvars.bind_contextvars(correlation_id=...)`, and return it in the `X-Correlation-ID` response header.
+- **RUNBOOK.md location**: Repo root alongside CLAUDE.md and README.md.
+
+</specifics>
+
+<deferred>
+## Deferred Ideas
+
+- **HTTPS/TLS termination** — adding nginx + Let's Encrypt or Caddy in front of the stack. Out of scope for Phase 6; the runbook documents how to add a reverse proxy.
+- **Horizontal scaling** — multiple uvicorn workers, Redis-backed rate limit counters, sticky sessions. Currently in-memory rate limits suffice for single-instance deployment. Phase 7+ concern.
+- **CI/CD pipeline** — GitHub Actions workflow for automated load tests and `docker scout` on every PR. Out of scope for Phase 6 (no CI setup exists yet).
+- **Backup automation** — automated pg_dump + MinIO mirror cron job as a Docker service. RUNBOOK.md documents the manual procedure; automation is a future operational phase.
+
+</deferred>
+
+---
+
+*Phase: 6-Performance & Production Hardening*
+*Context gathered: 2026-05-30*
diff --git a/.planning/phases/06-performance-production-hardening/06-DISCUSSION-LOG.md b/.planning/phases/06-performance-production-hardening/06-DISCUSSION-LOG.md
new file mode 100644
index 0000000..1d258a2
--- /dev/null
+++ b/.planning/phases/06-performance-production-hardening/06-DISCUSSION-LOG.md
@@ -0,0 +1,176 @@
+# Phase 6: Performance & Production Hardening - Discussion Log
+
+> **Audit trail only.** Do not use as input to planning, research, or execution agents.
+> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
+
+**Date:** 2026-05-30
+**Phase:** 6-performance-production-hardening
+**Areas discussed:** Observability stack, Load testing & SLA targets, Container hardening depth, Rate limit header bypass prevention
+
+---
+
+## Observability Stack
+
+### Structured Logging Library
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| structlog | Purpose-built for structured logging; processors pipeline makes correlation IDs trivial; plays well with FastAPI middleware | ✓ |
+| Standard logging + python-json-logger | Minimal change — configure stdlib root logger with a JSON formatter. Less powerful but zero new dependencies | |
+| loguru | Simple API, good defaults, supports structured output via sink config | |
+
+**User's choice:** structlog
+**Notes:** No follow-up notes.
+
+---
+
+### Log Aggregation
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Loki + Grafana in docker-compose | Matches success criteria literally. Adds 2 services; queries logs via Grafana UI at localhost | ✓ |
+| stdout JSON only, no aggregation service | Simpler — just emit JSON to stdout, rely on `docker compose logs` | |
+| Promtail + Loki + Grafana full stack | Full Grafana stack with Promtail log shipper. More production-realistic but heavier | |
+
+**User's choice:** Loki + Grafana in docker-compose
+**Notes:** No follow-up notes.
+
+---
+
+### Distributed Tracing
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Skip for now — correlation IDs in logs are enough | Simpler; stays in scope for v1 | ✓ |
+| OpenTelemetry with Tempo (add to Grafana stack) | More complete observability but heavier setup | |
+| OpenTelemetry spans to stdout only (no backend) | Lightweight but not queryable | |
+
+**User's choice:** Skip — correlation IDs in logs are enough
+**Notes:** No follow-up notes.
+
+---
+
+## Load Testing & SLA Targets
+
+### Load Testing Tool
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Locust | Python-native, fits the existing stack. Test scenarios reuse auth helpers. Lives in backend/load_tests/ | ✓ |
+| k6 | JavaScript-based, excellent HTML reports. Separate language from the rest of the stack | |
+| pytest-benchmark + httpx | Minimal setup, reuses existing test infrastructure. Not realistic for concurrent load | |
+
+**User's choice:** Locust
+**Notes:** No follow-up notes.
+
+---
+
+### Latency Targets
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Strict: p95 < 200ms, p99 < 500ms | Reasonable for a local Docker stack. Clear pass/fail criteria | ✓ |
+| Relaxed: p95 < 500ms, p99 < 1s | More lenient — appropriate if cloud backend latency is included in scope | |
+| You decide based on profiling | Run a baseline first, then set targets at 2x observed p95 | |
+
+**User's choice:** Strict — p95 < 200ms, p99 < 500ms
+**Notes:** No follow-up notes.
+
+---
+
+### Load Test Endpoint Scope
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Auth + document list + document get + upload | Covers the critical read/write path. Excludes cloud backends | ✓ |
+| Auth only | Focus on rate limiting under load. Misses the storage I/O path | |
+| All endpoints including cloud proxy | Comprehensive but cloud latency makes p95 targets meaningless | |
+
+**User's choice:** Auth + document list/get/upload (no cloud backends)
+**Notes:** No follow-up notes.
+
+---
+
+## Container Hardening Depth
+
+### Non-root User Setup
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Create appuser (uid 1000), chown /app, switch USER | Standard pattern. Works with read-only rootfs | |
+| Multi-stage build: builder as root, runtime as appuser | Cleaner security boundary. pip install in builder, copy only packages to runtime. Reduces attack surface | ✓ |
+| Distroless base image | Minimal image with no shell. Breaks pytesseract (needs system deps) | |
+
+**User's choice:** Multi-stage build with appuser
+**Notes:** No follow-up notes.
+
+---
+
+### Read-only Filesystem
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| tmpfs for /tmp + named volume for /app/data in docker-compose | `read_only: true` + tmpfs for temp files + named volume for data. Correct pattern | ✓ |
+| tmpfs for /tmp only, data paths via env var | Simpler but less strict | |
+| Skip read-only filesystem for Celery worker | Read-only only on FastAPI service; worker stays writable | |
+
+**User's choice:** tmpfs for /tmp + named volume for /app/data (full read-only rootfs on both services)
+**Notes:** No follow-up notes.
+
+---
+
+### Linux Capability Dropping
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| drop ALL capabilities, no cap_add | `cap_drop: [ALL]` with no cap_add. Port 8000 needs no capabilities | ✓ |
+| drop ALL, add back CAP_NET_BIND_SERVICE | Only needed if binding to port 80/443 — unnecessary for port 8000 | |
+| drop only dangerous caps (SYS_ADMIN, SYS_PTRACE, NET_RAW) | Less strict than CLAUDE.md mandate | |
+
+**User's choice:** drop ALL, no cap_add
+**Notes:** No follow-up notes.
+
+---
+
+## Rate Limit Header Bypass Prevention
+
+### IP Extraction Strategy
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Custom key_func: trust X-Forwarded-For only from known proxy IPs | Replace get_remote_address with trusted-proxy check. Prevents header spoofing from external clients | ✓ |
+| Never trust forwarded headers — always use request.client.host | Simplest and most secure for Docker Compose. Breaks if a proxy is added later | |
+| Redis-backed rate limiter with per-account AND per-IP limits | More resilient for horizontal scaling but adds Redis dependency | |
+
+**User's choice:** Custom key_func with trusted-proxy CIDR check
+**Notes:** No follow-up notes.
+
+---
+
+### Per-Account Rate Limiting
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Yes — add per-account limits on authenticated endpoints | Second limiter keyed by user_id on document/cloud endpoints (100 req/min per user) | ✓ |
+| No — per-IP is sufficient for now | Document endpoints don't need additional per-user limits | |
+| Per-account on auth endpoints only | Match Phase 2 intent exactly | |
+
+**User's choice:** Yes — per-account limits on authenticated document/cloud endpoints
+**Notes:** No follow-up notes.
+
+---
+
+## Claude's Discretion
+
+- Exact structlog processor chain configuration
+- Loki Docker Compose service version and loki-config.yaml — use official Grafana example as base
+- Promtail vs. Docker log driver for shipping to Loki
+- Locust user class structure and task weight distribution
+- Grafana dashboard panel layout (basic request rate + latency + error rate panels)
+
+## Deferred Ideas
+
+- HTTPS/TLS termination (nginx + Let's Encrypt or Caddy) — out of scope; RUNBOOK.md documents how to add
+- Horizontal scaling + Redis-backed rate limit counters — Phase 7+ concern
+- GitHub Actions CI/CD pipeline for automated load tests and docker scout on every PR
+- Automated backup cron job as a Docker service — RUNBOOK.md documents manual procedure
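
The strict latency option selected in the discussion log (p95 < 200ms, p99 < 500ms, zero failures) amounts to a simple pass/fail rule. The sketch below shows one way such a gate could be evaluated over raw latency samples. It is an assumption-laden illustration: `SLA_MS`, `percentile`, and `sla_passes` are hypothetical names, the nearest-rank percentile method is chosen here for simplicity, and how Locust itself computes or exports percentiles is not covered.

```python
import math

# Latency targets in milliseconds, keyed by percentile (values from the selected option).
SLA_MS = {95: 200.0, 99: 500.0}


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def sla_passes(latencies_ms: list[float], failures: int) -> bool:
    """Pass only with zero failures and every percentile strictly under its target."""
    if failures > 0:
        return False
    return all(percentile(latencies_ms, pct) < limit for pct, limit in SLA_MS.items())


# 100 samples: 95 fast requests, 4 at 300 ms, 1 at 450 ms.
healthy = [50.0] * 95 + [300.0] * 4 + [450.0]
print(sla_passes(healthy, failures=0))
```

A run with any failed request is rejected before the percentiles are examined, which mirrors the "zero endpoint failures AND targets met" wording in D-06.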