89f8d5a654
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
26 KiB
26 KiB
Codebase Concerns
Analysis Date: 2026-06-02
Security Concerns
JWT Algorithm Downgrade: HS256 Instead of ES256
- Risk: CLAUDE.md specifies ES256 (asymmetric ECDSA P-256) as the required algorithm, but the implementation uses HS256 (symmetric HMAC-SHA256).
- Files:
backend/services/auth.pylines 99, 109, 132, 141 - Impact: A leaked
SECRET_KEYallows arbitrary token forgery. With HS256 any party that has the secret can forge access tokens, impersonate admin users, and bypass all auth checks. ES256 would require the private key for forgery while the public key could safely be distributed for verification. - Fix approach: Generate an ECDSA P-256 key pair, store the private key in an env var (
JWT_PRIVATE_KEY), store the public key asJWT_PUBLIC_KEY. Updatecreate_access_tokento usealgorithm="ES256"anddecode_access_token/decode_password_reset_tokento use the public key. Rotate all active refresh tokens after deploy. - Priority: HIGH
No JTI Claim and No JTI Revocation in Redis
- Risk: CLAUDE.md mandates JTI (JWT ID) in every access token stored in Redis for revocation, but the
create_access_tokenfunction emits nojticlaim and there is no check inget_current_user. - Files:
backend/services/auth.py(create_access_token),backend/deps/auth.py(get_current_user) - Impact: Deactivated users can continue using valid access tokens until TTL expiry (up to 15 minutes). Password changes and account deactivations do not immediately invalidate active sessions (only refresh tokens are revoked — not the live access token in the client's Pinia store).
- Fix approach: Add
jti=str(uuid.uuid4())to the access token payload. Inget_current_user, after successful decode, checkawait redis.get(f"jti_revoked:{jti}")and raise 401 if set. Add arevoke_access_token(jti, ttl)helper called from account deactivation and password change. - Priority: HIGH
No Token Fingerprint / Token Binding
- Risk: CLAUDE.md requires a
fgp(fingerprint) claim = HMAC ofUser-Agent + Accept-Language, validated on every request. This is absent. - Files:
backend/services/auth.py,backend/deps/auth.py - Impact: Stolen access tokens can be replayed from any device/browser. Token binding would limit the window of a stolen token attack.
- Fix approach: On login, compute
fgp = hmac.new(key, (user_agent + accept_lang).encode(), sha256).hexdigest()[:16]. Embed in JWT payload. Inget_current_user, recompute and compare withhmac.compare_digest. - Priority: MEDIUM
Password Change Does Not Revoke Active Sessions
- Risk:
POST /api/auth/change-passwordupdatespassword_hashand writes an audit log but never callsrevoke_all_refresh_tokens. CLAUDE.md mandates "Password change… immediately revoke all active sessions." - Files:
backend/api/auth.pylines 446–495 - Impact: An attacker who has a valid refresh cookie can continue rotating tokens even after the account owner changes their password.
- Fix approach: Add
await auth_service.revoke_all_refresh_tokens(session, current_user.id)after the password hash update, beforesession.commit(), and also invalidate all JTIs for that user in Redis (once JTI is implemented). - Priority: HIGH
TOTP Disable Does Not Revoke Active Sessions
- Risk:
DELETE /api/auth/totpclears the TOTP secret and disables TOTP but does not callrevoke_all_refresh_tokens. CLAUDE.md mandates revocation on "TOTP enroll/revoke." - Files:
backend/api/auth.pylines 587–616 - Impact: An attacker who triggered TOTP removal (via CSRF or compromised session) and has a refresh token continues to operate as an authenticated user with no second factor.
- Fix approach: Add
await auth_service.revoke_all_refresh_tokens(session, current_user.id)indisable_totpbeforesession.commit(). - Priority: HIGH
Health Endpoint Exposes Internal Error Details Without Auth
- Risk:
GET /healthreturns full Python exception class names and messages (e.g."error: OperationalError: (psycopg.OperationalError) …") with no authentication requirement. The comment at line 144 (T-01-05-03) acknowledges this but defers the fix to "Phase 2." - Files:
backend/main.pylines 136–167 - Impact: Exposes DB driver versions, hostnames, and connection string fragments to unauthenticated callers. Information useful for targeted attacks.
- Fix approach: Replace
f"error: {type(e).__name__}: {e}"with"error"in non-debug mode. Log the detail server-side only. Optionally require admin Bearer token for the detailed form. - Priority: MEDIUM
Default Secrets Shipped in Code
- Risk:
backend/config.pyhardcodessecret_key = "CHANGEME",cloud_creds_key = "CHANGEME-32-bytes-padded!!", andminio_secret_key = "changeme_minio_app"as Pydantic field defaults. - Files:
backend/config.pylines 31, 61, 21 - Impact: If deployed without overriding env vars, production tokens are signed with the known
CHANGEMEkey, all cloud credentials can be decrypted by anyone with the source code, and MinIO uses a known password. Critical misconfiguration vector. - Fix approach: Change defaults to
""and add a startup validator (@model_validator(mode="after")) that raisesValueErrorwhen these fields equal their placeholder values in production (DEBUG=false). Log a WARNING in dev if the default is detected. - Priority: HIGH
default_storage_backend Not Validated Against Allowlist
- Risk:
PATCH /api/users/me/default-storageacceptsbody.backendas a free string and writes it directly to the DB with no allowlist validation. - Files:
backend/api/cloud.pylines 927–946 - Impact: A user can set
default_storage_backendto any arbitrary string. A future code path using it as a routing key could allow bypassing the_CLOUD_PROVIDERSallowlist. - Fix approach: Validate
body.backend in {"minio", "google_drive", "onedrive", "nextcloud", "webdav"}before the DB write. Use aLiteraltype or@field_validatoronDefaultStorageRequest. - Priority: MEDIUM
X-Forwarded-For Trusted for IP Rate Limiting Without Proxy Enforcement
- Risk: The IP-level rate limiter (
slowapi) usesget_remote_addresswhich readsX-Forwarded-For. Without a trusted reverse proxy normalizing this header, an attacker can bypass the IP rate limit. - Files:
backend/api/auth.pyline 44;backend/deps/utils.py - Impact: Attackers can bypass the 10 req/min IP-level limit on login, register, and TOTP endpoints by spoofing the forwarded IP on each request.
- Fix approach: In Docker Compose, front the backend with nginx configured to set
X-Forwarded-Forfrom$remote_addr, stripping any client-supplied value. Document this as a mandatory production requirement. - Priority: HIGH
Email HTML Body Uses Unsanitized Server-Supplied Link
- Risk:
send_password_reset_emailbuilds an HTML body via f-string withreset_linkdirectly in an<a href='…'>attribute without HTML-escaping. - Files:
backend/services/email.pyline 47; line ~105 (security alert email) - Impact: If
reset_linkcontains a single-quote (possible under certain URL encoding), the HTML attribute breaks. Low-severity HTML injection risk that violates defense-in-depth. - Fix approach: Use
html.escape(reset_link, quote=True)when embedding the link in the HTML body. - Priority: LOW
Audit Log Written After Commit in delete_folder
- Risk: In
DELETE /api/folders/{folder_id},session.commit()is called at line 424 andwrite_audit_log()is called at line 426 — after the commit, in a separate implicit transaction. - Files:
backend/api/folders.pylines 424–435 - Impact: If the audit log write fails (DB error, constraint violation), the folder is already deleted with no audit record. Inconsistent with the WR-08 pattern used by
delete_document(auto_commit=False). - Fix approach: Move
write_audit_log()beforesession.commit(), following the pattern used inapi/documents.py::delete_document. - Priority: MEDIUM
OAuth Callback Error Redirect May Leak Internal Exception Details
- Risk: In
oauth_callback, theexcept Exception as excblock redirects tofrontend_url/settings?cloud_error={urllib.parse.quote(str(exc))}. Exception strings from google-auth-oauthlib or msal may include OAuth client secrets, state values, or internal URL fragments. - Files:
backend/api/cloud.pylines 541–546 - Impact: Exception details appear in the browser URL bar, referrer headers, browser history, and server access logs.
- Fix approach: Map exception types to user-safe generic messages (
"auth_failed","connection_error"). Log the real exception server-side at ERROR level. Only pass an opaque error code in the redirect. - Priority: MEDIUM
Performance Concerns
N+1 Query Pattern in list_metadata / list_documents
- Risk:
services/storage.py::list_metadataloads all documents then calls_load_topic_names(session, doc.id)in a Python loop — one DB round-trip per document. The same pattern repeats in thelist_documentshandler's non-legacy code path. - Files:
backend/services/storage.pylines 136–139;backend/api/documents.pylines 501–506 - Impact: For a user with 100 documents, a single list request issues 101 DB queries. At 1000 documents, 1001 queries. Response time degrades linearly as the library grows.
- Fix approach: Replace with a single JOIN query using PostgreSQL's
array_agg(t.name)grouped by document. Or use a subquery fetching all document-topic associations for the user in one query and merging in Python. - Priority: HIGH
Entire File Loaded into Memory for Download and Task Processing
- Risk:
GET /api/documents/{id}/contentcallsawait storage_backend.get_object(…)which returns fullbytes, loads them into a list, and returnsStreamingResponse(iter([file_bytes])). The Celery extraction task also buffers the full file. - Files:
backend/api/documents.pylines 792, 827–831;backend/tasks/document_tasks.pyline 74 - Impact: A 100 MB file consumes 100 MB of heap per concurrent request. With 10 simultaneous downloads, the worker needs 1 GB just for file buffers. The 100 MB quota mitigates this today but does not scale.
- Fix approach: For MinIO, return presigned GET URLs with short TTL instead of proxying through FastAPI. For cloud backends, pipe the provider HTTP response stream directly. For Celery extraction, stream text extraction from bytes in chunks.
- Priority: MEDIUM
No Upload Size Pre-Validation
- Risk:
POST /api/documents/upload(cloud path) reads the entire file viaawait file.read()before any quota or size check. FastAPI has no globalmax_upload_sizeconfigured. - Files:
backend/api/documents.pyline 207 - Impact: A malicious user can upload a multi-gigabyte file, exhausting FastAPI worker memory before the quota check fires.
- Fix approach: Check
Content-Lengthheader at endpoint entry; reject with 413 if above a configurableMAX_UPLOAD_BYTESlimit. Add a--limit-max-requestsor body-size middleware at the uvicorn/nginx level. - Priority: MEDIUM
revoke_all_refresh_tokens Issues One UPDATE per Token
- Risk:
services/auth.py::revoke_all_refresh_tokensloads all active refresh token rows into Python, then marks eachrevoked=Trueindividually via ORM, issuing one UPDATE statement per token. - Files:
backend/services/auth.pylines 218–237 - Impact: A user with many active sessions (e.g. 50 devices) causes 50 individual UPDATE statements on sign-out-all. Could be replaced with a single bulk UPDATE.
- Fix approach: Replace with
UPDATE refresh_tokens SET revoked = true WHERE user_id = :uid AND revoked = falseand count affected rows viaresult.rowcount. - Priority: LOW
FTS Falls Back Silently on Any Exception
- Risk: The FTS code path in
list_documentswraps the FTS query inexcept Exception:and falls back to an unfiltered query. - Files:
backend/api/documents.pylines 486–489 - Impact: Any PostgreSQL error causes silent fallback — the user sees all their documents when they searched for a term, with no indication of failure.
- Fix approach: Narrow the catch to
sqlalchemy.exc.OperationalError(for SQLite compat in tests only) and log all other exceptions at ERROR level before re-raising. - Priority: MEDIUM
Reliability Concerns
Email Queue Worker Missing From Docker Compose
- Risk: Celery routes email tasks to the
emailqueue butdocker-compose.ymldefines only one Celery worker consuming-Q documents. No worker processes theemailqueue. - Files:
backend/celery_app.pyline 36;docker-compose.ymlline 96 - Impact: Password reset emails, security alert emails (refresh token reuse detection), and backup code emails are silently enqueued but never delivered. Callers receive 202 but emails never arrive.
- Fix approach: Add a
celery-worker-emailservice indocker-compose.ymlconsuming-Q email, or update the existing worker command to-Q documents,email. - Priority: HIGH
documents.updated_at Not Auto-Updated on Row Changes
- Risk:
Document.updated_atis declared withserver_default=func.now()but noonupdatetrigger. Whenextracted_text,status, orfilenameis changed,updated_atstays as the creation timestamp. - Files:
backend/db/models.pylines 192–194 - Impact:
classified_atin_doc_to_dictis computed fromdoc.updated_atwhenstatus == "classified"— ifupdated_atis stale, the displayed timestamp is incorrect. Sort-by-date after reclassification is also wrong. - Fix approach: Add a PostgreSQL
BEFORE UPDATEtrigger that setsupdated_at = now(), or addonupdate=func.now()to the mapped column (requires SQLAlchemy ORM event to fire at update time). - Priority: MEDIUM
Celery Task Result Backend Accumulates Without Expiry
- Risk: Celery is configured to use Redis as result backend with no
result_expiressetting. Task results accumulate in Redis indefinitely. - Files:
backend/celery_app.pylines 23–24 - Impact: Redis memory grows unboundedly over time, potentially causing OOM which would also break rate limiting and TOTP replay prevention.
- Fix approach: Add
celery_app.conf.result_expires = 3600or disable the result backend entirely since no code reads task results. - Priority: MEDIUM
Breadcrumb Builder in get_folder Has No Depth Limit
- Risk: The breadcrumb builder in
GET /api/folders/{folder_id}walks up the parent chain iteratively with avisitedset but no maximum depth cap. - Files:
backend/api/folders.pylines 234–247 - Impact: With a deeply nested folder tree (e.g. 200 levels of nesting), the loop issues 200 sequential DB round-trips before terminating.
- Fix approach: Add
if len(crumbs) >= 20: breakto cap at a reasonable depth. - Priority: LOW
Code Quality Concerns
Duplicate Inline IP Extraction (Not Using get_client_ip)
- Risk: Several endpoints extract the client IP inline instead of using
deps/utils.py::get_client_ip(). - Files:
backend/api/documents.pylines 269–271, 376;backend/api/cloud.pylines 624, 753 - Impact: If the trusted-proxy logic changes, all inline copies must be updated individually.
- Fix approach: Replace all inline
request.headers.get("X-Forwarded-For") or request.client.hostwithget_client_ip(request). - Priority: LOW
Document.status Is an Unconstrained String Column
- Risk:
Document.statusisString, nullable=False, default="pending"with no DB-level CHECK constraint or Python enum. Values"pending","uploaded","classified","classification_failed"are used in code but not enforced. - Files:
backend/db/models.pyline 188 - Impact: A typo in a task or direct DB write silently sets an invalid status, causing silent bugs in status-checking code (e.g.
classified_attimestamp never shown). - Fix approach: Add a migration with
ALTER TABLE documents ADD CONSTRAINT ck_documents_status CHECK (status IN ('pending', 'uploaded', 'classified', 'classification_failed')). - Priority: LOW
Stale Wave-2 Comment in documents.py Module Docstring
- Risk:
backend/api/documents.pylines 19–20 contain"NOTE (Wave 2): No auth guards on any endpoint yet — Plan 03-03 adds get_current_user…"— this is false; all handlers useget_regular_user. - Files:
backend/api/documents.pylines 19–20 - Impact: Misleads reviewers into thinking auth is not applied, potentially causing incorrect security assessments.
- Fix approach: Remove or replace the stale NOTE comment.
- Priority: LOW
classify_document Endpoint Uses Mutable Default and Unvalidated Dict Body
- Risk:
POST /api/documents/{doc_id}/classifyhasbody: dict = {}— mutable default argument antipattern and no Pydantic validation. - Files:
backend/api/documents.pyline 695 - Impact: Static analysis confusion; unvalidated request body accepts arbitrary JSON keys.
- Fix approach: Define
class ClassifyRequest(BaseModel): topics: Optional[list[str]] = Noneand replacebody: dict = {}. - Priority: LOW
Missing Tests / Coverage Gaps
No Tests for JWT Algorithm, JTI, or Token Binding
- Risk: No tests verify the JWT algorithm, JTI presence/validation, or token binding.
- Files:
backend/tests/test_auth_deps.py,backend/tests/test_auth_api.py - Impact: Algorithm or claim changes would not be caught. The security invariants are untested.
- Priority: HIGH
Password Change Has No Session-Revocation Test
- Risk: No test verifies that changing a password invalidates existing refresh tokens.
- Files:
backend/tests/test_auth_api.py - Fix approach: Add test: register → login (obtain refresh cookie) → change password → assert old refresh cookie returns 401.
- Priority: HIGH
No Frontend E2E Tests
- Risk: The frontend has Vitest unit tests for 3 stores and a small set of components, but no Playwright or Cypress E2E tests for critical user flows.
- Files:
frontend/src/stores/__tests__/,frontend/src/views/__tests__/ - Impact: Breaking changes in API contract, router guards, or component interactions are not caught until manual testing. The upload flow, TOTP enrollment, and admin operations have no automated coverage.
- Fix approach: Add Playwright E2E tests for: login → upload → view → share → recipient download.
- Priority: MEDIUM
No Regression Test for delete_folder Audit Log Ordering
- Risk: The audit log after-commit ordering issue in
delete_folderhas no test to prevent regression after fixing. - Files:
backend/tests/test_folders.py - Priority: MEDIUM
Quota Concurrency Tests Run Against SQLite, Not PostgreSQL
- Risk: Quota enforcement tests run against SQLite in the default test config. CLAUDE.md specifies "integration tests against real PostgreSQL (not SQLite for quota/UUID tests)."
- Files:
backend/tests/test_quota.py;backend/tests/conftest.py - Impact: A race condition in the atomic quota UPDATE would only be detectable with concurrent clients on real PostgreSQL.
- Fix approach: Mark quota atomicity tests with
@pytest.mark.skipif(not live_services_available, ...)and add a concurrent-upload test usingasyncio.gather. - Priority: MEDIUM
Pinia Stores documents.js and topics.js Have No Unit Tests
- Risk: The stores for documents and topics — which implement pagination, filtering, and topic assignment logic — have no tests in
frontend/src/stores/__tests__/. - Files:
frontend/src/stores/documents.js,frontend/src/stores/topics.js - Priority: LOW
Dependency Risks
All Backend Dependencies Use Floor >= Version Pins
- Risk:
backend/requirements.txtuses>=for all packages including security-critical ones:PyJWT>=2.8.0,pwdlib[argon2]>=0.2.1,cryptography>=41.0.0,fastapi>=0.111. - Files:
backend/requirements.txt - Impact:
pip installresolves to the latest available version at build time. A breaking change or vulnerability in any dependency silently takes effect on the next Docker build. CLAUDE.md mandates exact version pinning for security-critical packages. - Fix approach: Run
pip freeze > requirements.lockto generate an exact pinned lockfile. Usepip-toolsoruv lockto manage upgrades. At minimum, pinPyJWT,pwdlib,cryptography, andfastapito exact versions. - Priority: HIGH
minio/minio:latest Tag in Docker Compose
- Risk:
docker-compose.ymlusesimage: minio/minio:latest— a floating tag that pulls a new release ondocker compose pull. - Files:
docker-compose.ymlline 19 - Impact: Breaking MinIO API changes or security regressions in a new release could break file storage without warning.
- Fix approach: Pin to a specific MinIO release tag (e.g.
minio/minio:RELEASE.2024-11-07T00-52-20Z). - Priority: MEDIUM
Infrastructure and Operational Concerns
No Reverse Proxy / TLS Termination in Production Setup
- Risk:
docker-compose.ymlexposes the FastAPI backend on port 8000 and frontend on port 5173 directly, with no nginx or Caddy container for TLS termination orX-Forwarded-Fornormalization. - Files:
docker-compose.yml - Impact: (1) The refresh cookie uses
secure=Truein code but travels over plain HTTP, making thesecureflag ineffective. (2) IP rate limiting is spoofable. (3) Credentials and session cookies travel in cleartext. - Fix approach: Add an nginx service to
docker-compose.ymlthat terminates TLS (Let's Encrypt or self-signed), proxies/api/to the backend, and setsproxy_set_header X-Forwarded-For $remote_addr. - Priority: HIGH
MinIO Uses Plain HTTP Between Containers
- Risk:
Minio(…, secure=False)— all object data travels over HTTP between FastAPI and MinIO containers. - Files:
backend/main.pyline 82;backend/storage/__init__.pyline 48 - Impact: An attacker with access to the Docker network can intercept document bytes in transit. Critical if containers share a host with untrusted workloads.
- Fix approach: Enable TLS on MinIO (
secure=True) or document the trust model explicitly. For shared-host deployments, configure mTLS between containers. - Priority: MEDIUM (acceptable on isolated Docker bridge; critical on shared host)
No Backup Strategy for PostgreSQL or MinIO Data
- Risk:
docker-compose.ymluses named volumes (postgres_data,minio_data) with no backup tooling, retention policy, or point-in-time recovery. - Files:
docker-compose.ymllines 138–140 - Impact: A disk failure, container wipe, or accidental
docker volume rmcauses permanent loss of all user documents, credentials, audit logs, and accounts. - Fix approach: Add a
backupservice runningpg_dumpon a schedule (e.g. viaofeliaor a cron sidecar), compressing and shipping to an off-site store. Configure MinIOmc mirrorto a second bucket or provider. Document RTO/RPO targets. - Priority: HIGH
Redis Has No Persistence Configuration
- Risk: Redis is started with only
--requirepass. No--saveor--appendonly yesflags are set, making all Redis data ephemeral. - Files:
docker-compose.ymlline 42 - Impact: A Redis restart clears all rate-limit counters (brief brute-force window on auth endpoints), TOTP replay prevention keys (30-second replay window reopens), and pending OAuth state tokens.
- Fix approach: Add
--save 60 1 --appendonly yesto the Redis command and mount a Redis data volume. Document that Redis restart is a brief security event requiring monitoring. - Priority: MEDIUM
Docker Compose Mounts Source Code as Live Volume
- Risk:
docker-compose.ymlmounts./backend:/appand./frontend/src:/app/srcas live volumes (appropriate for dev hot-reload but dangerous in production if the same file is used). - Files:
docker-compose.ymllines 53–54, 131–132 - Impact: In production, host filesystem modifications immediately affect the running container without a deploy cycle.
- Fix approach: Create a
docker-compose.prod.ymlthat omits the volume mounts and uses the DockerfileCOPY . .layer only. Document the two-file strategy clearly. - Priority: MEDIUM
Dockerfile Runs Application as Root
- Risk:
backend/DockerfileusesFROM python:3.12-slimwith noUSERdirective. FastAPI and Celery run as root inside the container. - Files:
backend/Dockerfile - Impact: A container escape vulnerability or SSRF leading to RCE gives the attacker root-equivalent access to the container filesystem.
- Fix approach: Add
RUN adduser --disabled-password --gecos "" appuser && chown -R appuser /appandUSER appuserbeforeEXPOSE 8000. - Priority: MEDIUM
No Structured Logging, Metrics, or Alerting
- Risk: All logging uses Python's stdlib logger with no structured format, no Prometheus/StatsD metrics endpoint, no error aggregation service, and no alerting on security events in the audit log.
- Files: All backend files
- Impact: Silent failures — email queue not processing, repeated TOTP replay attempts, brute-force login spikes — go undetected. Failed Celery tasks log to stderr with no aggregation. The security alert email on refresh token reuse is the only active notification mechanism.
- Fix approach: Add
structlogfor JSON-formatted structured logs. Add a/metricsendpoint withprometheus-fastapi-instrumentator. Configure alerting onauth.login_failedcount spikes in the audit log. - Priority: MEDIUM
Concerns audit: 2026-06-02