curo/kite

Files

T

curo1305 daa7e0f289 docs: add domain research (4 dimensions + synthesis)

2026-05-21 20:42:16 +02:00

40 KiB

Raw Permalink Blame History

Pitfalls Research

Domain: Multi-user SaaS document management platform (FastAPI + Vue 3, PostgreSQL + MinIO, cloud storage integrations) Researched: 2026-05-21 Confidence: HIGH (auth/storage/multi-tenancy patterns are well-established; specific FastAPI + MinIO combination is MEDIUM — no web search available)

Critical Pitfalls

Pitfall 1: JWT in localStorage — XSS Gives Full Account Takeover

What goes wrong: The Vue 3 SPA stores the JWT access token in localStorage. Any JavaScript injected via XSS (in file names, document content previewed in the UI, a compromised dependency) can call localStorage.getItem('token') and exfiltrate a long-lived credential. The attacker then impersonates the user from any origin, bypasses TOTP entirely (the token is post-authentication), and can access all documents, including cloud storage credentials.

Why it happens: localStorage is the path of least resistance in SPAs. It survives page reloads, works with Axios interceptors trivially, and requires no server-side session state. FastAPI tutorials almost universally use Authorization: Bearer headers set from localStorage.

How to avoid:

Issue the JWT as an httpOnly, SameSite=Strict, Secure cookie — JavaScript cannot read it.
Use a short-lived access token (15 minutes) in the httpOnly cookie.
Issue a separate refresh token (httpOnly cookie, longer TTL, /auth/refresh path-scoped) to rotate access tokens silently.
The Vue frontend never holds the raw token string. Axios is configured with withCredentials: true; the browser attaches cookies automatically.
CSRF protection: because SameSite=Strict blocks cross-site cookie submission, CSRF tokens are not strictly required for same-origin SPAs, but add a CSRF header check (X-Requested-With: XMLHttpRequest) as defence-in-depth.

Warning signs:

localStorage.getItem anywhere in auth-related frontend code
JWT decode functions in frontend code (the frontend should not need to decode a token it can't read)
Authorization: Bearer ${token} header set manually in Axios interceptor

Phase to address: Auth phase (Phase 1 — Users & Auth)

Pitfall 2: TOTP Bypass via Password Reset Flow

What goes wrong: The password reset flow issues a one-time token by email. After the user resets their password, the system logs them in and redirects to the dashboard — skipping the TOTP prompt because the session was created through the reset path, not the login path. An attacker who compromises a user's email account can therefore completely bypass TOTP.

Why it happens: Password reset and login are treated as separate code paths. The TOTP check lives in the login handler; the reset handler creates a session directly after credential update without going through the TOTP gate.

How to avoid:

After a successful password reset, issue a partial session or a password_reset_pending state, not a full authenticated session.
Force the user to complete the full login flow (including TOTP if enabled) from the new credentials.
Alternatively, on password reset completion, invalidate all existing sessions and send the user to the login page (no auto-login).
Log password resets in the audit trail with IP and user-agent.

Warning signs:

Password reset handler calls the same session-creation function as login but omits 2FA state checks
No mfa_verified flag on sessions (only authenticated)
Users can reach protected endpoints via a token created from the reset path

Phase to address: Auth phase (Phase 1 — Users & Auth)

Pitfall 3: Refresh Token Rotation — Stolen Refresh Token Not Detected

What goes wrong: Refresh tokens are issued as single-use rotating credentials. If an attacker steals a refresh token and uses it before the legitimate user does, the server rotates the token and the attacker has a valid new one. The legitimate user's next refresh request fails — but without a detection mechanism the failure just looks like a session expiry, no alert is raised, and the attacker continues with the stolen session indefinitely.

Why it happens: Teams implement the "rotate on use" mechanic without implementing the "revoke family on reuse" detection. The refresh_tokens table lacks a family_id column linking reissued tokens.

How to avoid:

Store refresh tokens in the database with a family_id (UUID for the original issuance chain) and a revoked flag.
When a refresh token is presented: if it is already marked revoked (i.e., a previously rotated token), revoke the entire family — force logout of all sessions for that user.
Emit a security alert (audit log + optionally email) when a reuse attempt is detected.
Refresh tokens should be hashed before storage (bcrypt or SHA-256 with a per-row salt), same as passwords.

Warning signs:

Refresh tokens stored as plain values in DB with no revoked column
No family_id linking related rotation chains
Stolen refresh token detection treated as "v2 feature"

Phase to address: Auth phase (Phase 1 — Users & Auth)

Pitfall 4: TOTP — Timing Attack on Code Verification

What goes wrong: TOTP verification uses Python == string comparison. Python's == on strings is not constant-time — it short-circuits on the first differing character. A sufficiently sophisticated timing oracle (millions of requests from a local network) can distinguish valid from invalid codes, reducing the 6-digit brute-force space. More practically: without rate limiting, an attacker can brute-force all 1,000,000 possible 6-digit codes during a 30-second window.

Why it happens: TOTP libraries (PyOTP) return a string; developers do if provided == expected. Rate limiting is added "later" and often never lands.

How to avoid:

Use hmac.compare_digest(provided, expected) instead of == for TOTP comparison.
Rate-limit TOTP attempts: 5 attempts per 30-second window, 15-minute lockout on excess.
The lockout must be stored server-side (Redis or DB), not client-side.
Accept only the current window and optionally ±1 window for clock drift — do not accept wider ranges.
Log every failed TOTP attempt with IP.

Warning signs:

if totp.verify(code): without inspecting PyOTP's internal comparison method
No rate limit on POST /auth/totp/verify
TOTP window set to 2+ (default is ±1 window = 90 seconds valid — wider than needed)

Phase to address: Auth phase (Phase 1 — Users & Auth)

Pitfall 5: Path Traversal in User File Access (MinIO Object Keys)

What goes wrong: MinIO object keys are constructed from user input: f"users/{user_id}/{filename}". If filename contains ../, ../../, or URL-encoded equivalents (%2e%2e%2f), the resulting key may escape the user's prefix and land in another user's namespace. A request for ../../other_user_id/secret.pdf resolves to users/other_user_id/secret.pdf.

Why it happens: Developers trust that MinIO will sanitize paths. It does not — S3-compatible APIs treat object keys as arbitrary strings. The prefix-based isolation is only as safe as the key construction code.

How to avoid:

Never use raw filenames as object key components. Generate a UUID (or UUID + original extension) as the stored key: f"users/{user_id}/{uuid4()}{ext}".
Store the human-readable filename in the database metadata row, completely decoupled from the storage key.
If filenames must appear in keys for any reason, strip or reject any /, \, .., % characters before key construction.
On retrieval, look up the object key from the database row (which is owned by user_id) rather than constructing it from user input.

Warning signs:

object_key = f"users/{user_id}/{request.filename}"
File download endpoint accepts a path or filename query parameter and constructs the key from it
No database lookup intermediating between "user requests file" and "MinIO key used"

Phase to address: Storage migration phase (Phase 2 — DB + MinIO migration)

Pitfall 6: Quota Race Condition — Concurrent Uploads Bypass the Limit

What goes wrong: Two upload requests arrive simultaneously for a user at 99 MB of a 100 MB quota. Both read quota usage as 99 MB, both pass the 99 + 1 < 100 check, both proceed to upload — the user ends at 101 MB. At larger scale (many simultaneous large uploads) the overage can be significant.

Why it happens: Quota enforcement is implemented as: read current usage → check → write file → update usage. This is a classic check-then-act race when the check and the write are not atomic.

How to avoid:

Enforce quota atomically using a SELECT ... FOR UPDATE on the quota row before uploading, or use a PostgreSQL advisory lock keyed on user_id.
Better: use an optimistic update: UPDATE quotas SET used_bytes = used_bytes + $new WHERE user_id = $uid AND used_bytes + $new <= limit_bytes RETURNING used_bytes. If 0 rows are updated, the quota was exceeded — reject before touching MinIO.
Only update used_bytes after the MinIO upload succeeds, but hold the lock/reservation through the upload, or use a two-phase: reserve bytes → upload → confirm, with a cleanup job for stuck reservations.
Never read quota, do arithmetic in Python, then write back as two separate statements.

Warning signs:

current = get_quota(user_id); if current + size <= limit: upload()
No database transaction wrapping quota check and update
Quota table updated with SET used_bytes = $computed_value (full overwrite) rather than used_bytes + delta

Phase to address: Storage migration phase (Phase 2 — DB + MinIO migration) and Quotas phase

Pitfall 7: Admin Privilege Escalation via Missing Ownership Checks

What goes wrong: An admin API endpoint is gated on is_admin=True but document-access endpoints only check is_authenticated. An admin user calling GET /api/documents/{id} with any document ID can read any user's document because the handler checks authentication but not document.owner_id == current_user.id. The privacy-first model is violated without any special exploit — just a correctly authenticated request.

Why it happens: Authorization is implemented as authentication + role checks ("is this user an admin?") without resource-level ownership verification ("does this user own this resource?"). FastAPI dependency injection makes it easy to write current_user = Depends(get_current_user) and forget to check resource.user_id == current_user.id.

How to avoid:

Every document/file/folder endpoint must assert resource.user_id == current_user.id (or check share grants). This check cannot be optional or deferred.
Admins access user account metadata via separate admin-scoped endpoints that explicitly exclude document content, file URLs, and cloud credentials.
Write a test: log in as admin, attempt GET /documents/{document_owned_by_other_user}, assert 403.
Use a centralized assert_document_access(document, current_user) function rather than inline checks to prevent omissions.

Warning signs:

Document endpoints that check if not current_user but not if document.user_id != current_user.id
Admin endpoints that return document content or presigned URLs
No explicit tests for cross-user access attempts

Phase to address: Auth phase (Phase 1) and every subsequent phase that adds resource-access endpoints

Pitfall 8: Cloud Credential Leakage via Admin Query

What goes wrong: Cloud storage credentials (OAuth tokens, Nextcloud passwords) are encrypted at rest. But a query like SELECT * FROM cloud_connections WHERE user_id = $uid returns the ciphertext. If an admin dashboard endpoint runs this query and serializes the full row to JSON, the ciphertext ships to the admin browser. An admin cannot decrypt it without the key — but the ciphertext is now in browser history, proxy logs, and admin audit records. If the encryption key is later exposed, all credentials decrypt retroactively.

Why it happens: ORM models serialize all columns by default. Developers add encrypted_credentials to the model and forget to exclude it from admin-facing serializers.

How to avoid:

The cloud_connections table's credential column (encrypted_token, encrypted_refresh_token) must be excluded from all serialization by default.
Use explicit Pydantic response models for every endpoint — no orm_mode with full model pass-through.
Admin endpoints for cloud connections return only: provider, account_label, connected_at, last_used_at, status — never the credential column, not even ciphertext.
Audit log cloud credential access separately: the only code path that should ever read the encrypted column is the storage adapter, not admin or user-info endpoints.

Warning signs:

CloudConnection Pydantic schema includes encrypted_token field
Admin user-detail endpoint returns full cloud connection rows
ORM model uses model_config = ConfigDict(from_attributes=True) without explicit field exclusions

Phase to address: Cloud storage integration phase (Phase 3 or 4)

Pitfall 9: Flat-file to PostgreSQL Migration — Data Loss During Cutover

What goes wrong: The migration script reads all JSON files, transforms them to DB rows, and then a flag switches the app to use PostgreSQL. Documents uploaded during the migration window are written to the old JSON store (because the flag has not flipped yet) and are missed by the migration script, which ran before they arrived. After cutover, those documents are invisible.

Why it happens: Migrations are planned as offline events ("we'll take the app down for 5 minutes") and then discovered to be impractical — the app is used 24/7 or downtime feels risky. The team runs the migration online without dual-write.

How to avoid:

Plan the migration in three phases:
1. Dual-write: deploy code that writes to both JSON and PostgreSQL, reads from JSON. All new documents land in both stores.
2. Backfill: run migration script to copy historical JSON records to PostgreSQL. New records are already there.
3. Cutover: flip read source to PostgreSQL, verify counts match, remove JSON write path.
The dual-write window can be as short as one deployment cycle.
Include a reconciliation check: assert doc_count_json == doc_count_db before cutting over.
Keep the old JSON store read-only for 1 week post-cutover as a rollback option.

Warning signs:

Migration is planned as "take app down, run script, bring back up"
No count-reconciliation step
No rollback plan documented before migration begins

Phase to address: DB + MinIO migration phase (Phase 2)

Pitfall 10: Blocking I/O Inside Async FastAPI Handlers (Existing Issue)

What goes wrong: The codebase already uses synchronous filelock and open() inside async def handlers (CONCERNS.md item 6). After migration to PostgreSQL + MinIO, if synchronous DB drivers (psycopg2) or synchronous MinIO client calls replace the file I/O without wrapping in asyncio.to_thread(), the event loop stalls on every I/O operation. Under concurrent load (multiple users uploading), requests queue behind each other even though the hardware is idle.

Why it happens: SQLAlchemy sync engine + psycopg2 is the default FastAPI tutorial stack. The MinIO Python SDK (minio package) is synchronous. Developers add await in front of calls that are not coroutines, get a type error, remove the await, and ship blocking code.

How to avoid:

Use SQLAlchemy async engine with asyncpg driver: create_async_engine("postgresql+asyncpg://...").
Wrap all MinIO SDK calls in asyncio.to_thread() since there is no official async MinIO client: await asyncio.to_thread(minio_client.put_object, ...).
Alternatively use aioboto3 (async boto3) which works with MinIO's S3-compatible API.
aiofiles is already in requirements.txt — use it for any remaining local file operations.
Run pytest-asyncio + asyncio.get_event_loop().set_debug(True) in tests; the debug mode logs blocking calls.

Warning signs:

from sqlalchemy import create_engine (not create_async_engine)
import psycopg2 anywhere in application code
minio_client.put_object(...) not wrapped in asyncio.to_thread
Uvicorn logs show high request latency even with low CPU usage

Phase to address: DB + MinIO migration phase (Phase 2)

Pitfall 11: N+1 Queries on Document Listing

What goes wrong: GET /api/documents returns a list of documents, each including folder name, topic names, and share count. The handler fetches the document list, then for each document issues separate queries for folder, topics, and shares. 100 documents = 301+ queries per page load. The existing codebase already has an O(N) disk scan equivalent (CONCERNS.md items 8, 9); the PostgreSQL migration preserves this pattern if not corrected.

Why it happens: SQLAlchemy lazy loading is the default. document.folder.name triggers a query. Developers don't see it until production load hits.

How to avoid:

Use SQLAlchemy joinedload or selectinload options on the document list query to eagerly load related entities.
The list endpoint should be a single SQL query with JOINs, not a loop.
Add EXPLAIN ANALYZE checks as part of the phase acceptance criteria.
Enable SQLAlchemy echo=True in development to log every SQL statement.
Pagination at the database level: LIMIT 50 OFFSET $n — not "fetch all, slice in Python."

Warning signs:

SQLAlchemy models using default relationship loading without explicit lazy= options
for doc in documents: doc.folder.name pattern
No joinedload or selectinload in list query definitions
Query count in tests grows linearly with fixture size

Phase to address: DB + MinIO migration phase (Phase 2); enforce in document listing feature phase

Pitfall 12: MinIO Presigned URL Expiry — Stale Links in the UI

What goes wrong: The document preview and download UI displays links generated as MinIO presigned URLs with a 1-hour expiry. The Vue frontend fetches the document list on page load, stores the URLs in Pinia state, and renders them as <img src> or <a href>. If the user leaves the tab open for 2 hours and then clicks a link, they get a 403 (presigned URL expired). No error is shown; the image just fails to load or the download silently fails.

Why it happens: Presigned URLs feel like permanent links. Teams generate them at list time for convenience and cache them in frontend state without expiry awareness.

How to avoid:

Do not embed presigned URLs in list responses. Return only document metadata.
Generate presigned URLs on demand: a separate GET /api/documents/{id}/download-url endpoint generates a short-lived URL (5–15 minutes) at the moment of user intent.
Alternatively, proxy document bytes through the FastAPI backend (GET /api/documents/{id}/file streams from MinIO) — eliminates presigned URL complexity at the cost of bandwidth.
If presigned URLs are cached in frontend state, include the expiry timestamp and regenerate before expiry.

Warning signs:

Presigned URLs included in document list response JSON
Frontend stores presigned URLs in Pinia state without TTL tracking
No GET /api/documents/{id}/download-url endpoint (or equivalent)
presigned_url_expiry = 3600 with no frontend refresh logic

Phase to address: DB + MinIO migration phase (Phase 2) and document access phase

Pitfall 13: OAuth Token for Cloud Storage — Revocation Not Handled

What goes wrong: A user connects Google Drive via OAuth. The app stores the encrypted access token + refresh token. Six months later, the user revokes the app's access in their Google account settings. The next time DocuVault tries to access their Drive, it gets a 401. The refresh token exchange also fails with invalid_grant. The app has no handler for this — it retries, logs an error, and the user sees a generic "storage error" with no path to reconnect.

Why it happens: OAuth happy-path is well-documented. Revocation and invalid_grant handling are in footnotes. Teams handle 401 → refresh → retry but don't handle refresh failure → user notification.

How to avoid:

Wrap all cloud storage adapter calls in a two-level error handler:
- Level 1: 401 → attempt refresh → retry.
- Level 2: refresh fails with invalid_grant or token_revoked → mark connection as REQUIRES_REAUTH in DB, do not retry, surface a "reconnect required" notice to the user.
The cloud_connections table must have a status column: ACTIVE | REQUIRES_REAUTH | ERROR.
The UI must poll or react to REQUIRES_REAUTH state and prompt the user to re-authorize, not just show an error toast.
Never silently swallow a revoked-token error or retry indefinitely.

Warning signs:

Cloud storage adapter raises generic StorageError for all OAuth errors without distinguishing revocation
No status or needs_reauth column on cloud_connections
Reconnect UI not planned in feature scope

Phase to address: Cloud storage integration phase

Pitfall 14: Cloud Storage Rate Limits — No Backoff, No Per-User Throttling

What goes wrong: Multiple users with Google Drive connected trigger simultaneous document uploads. The app hits Google's per-app rate limit (Drive API: 20,000 requests/100 seconds/user, but also global per-project limits). Google returns 429. The app has no retry-with-backoff; uploads fail and the errors are attributed to the individual user, not the platform-level limit.

Why it happens: Rate limits are per-API-key, not per-user. A single misbehaving user or burst of activity for all users shares the same quota. Teams only test with a single user.

How to avoid:

Implement exponential backoff with jitter for all cloud storage adapter calls (start 1s, max 32s, 3 retries).
Use a task queue (Celery or FastAPI BackgroundTasks with a semaphore) for cloud-destined uploads rather than processing inline in the request handler.
Log 429 responses separately from other errors — they indicate platform-level throttling, not user errors.
Per-provider rate limit documentation should be captured in adapter docstrings.

Warning signs:

Cloud upload performed synchronously inside the HTTP request handler with no retry logic
No tenacity or equivalent retry decorator on provider calls
429 responses from cloud APIs cause 500 responses to the end user

Phase to address: Cloud storage integration phase

What goes wrong: A user requests account deletion. The app deletes the PostgreSQL rows, the MinIO objects, and the user record. But if the user's documents were stored in Google Drive or OneDrive via the cloud storage adapter, those files remain on the cloud provider. The user's data has not actually been erased from all systems.

Why it happens: Account deletion is implemented against the systems the team controls (PostgreSQL, MinIO). Cloud storage is treated as "the user's own storage" so deletion is skipped. Under GDPR, if the platform wrote files to the user's cloud storage on their behalf as a data processor, erasure obligations may extend there.

How to avoid:

Account deletion must enumerate all cloud_connections for the user and call adapter.delete_all_user_files() (or equivalent) on each active connection.
If the cloud connection is already in REQUIRES_REAUTH state, the deletion cannot proceed automatically — log a compliance alert and require manual follow-up or notify the user to delete manually.
Document the data flow in a data map: "files stored in X go to Y" — this is a GDPR Article 30 requirement.
Right to erasure flow must be tested as part of acceptance criteria, not assumed to work.

Warning signs:

Account deletion handler only operates on PostgreSQL and MinIO
No delete_user_files method in the StorageBackend interface
No GDPR data map in project documentation

Phase to address: Cloud storage integration phase and account management phase

Pitfall 16: Encryption Key Management — Single Key for All Users

What goes wrong: All cloud credentials are encrypted with one Fernet key stored in CLOUD_ENCRYPTION_KEY env var. If this key is exposed (leaked .env, compromised deployment config, insider threat), every user's cloud credentials decrypt at once. The privacy-first model collapses entirely.

Why it happens: One key is simpler than per-user keys. Fernet key derivation per user is slightly more complex to implement.

How to avoid:

Derive a per-user encryption key from the master key + a user-specific salt: HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials"). The salt is stored in the users table.
Even if the master key leaks, an attacker still needs each user's salt to decrypt their credentials.
Rotate the master key on a schedule: re-encrypt all stored credentials with the new key in a background job.
Audit log any code path that accesses the encryption key — this should only ever happen in the storage adapter, never in API handlers.
Never log the encryption key or derived key material, even at DEBUG level.

Warning signs:

Single ENCRYPTION_KEY env var with no per-user derivation
Encryption/decryption called directly in API endpoint handlers (not in the adapter layer)
No key rotation plan documented

Phase to address: Cloud storage integration phase (design the key derivation before writing the first line of credential storage code)

Pitfall 17: Shared Document — Quota Not Charged to Sharer Correctly After Revoke

What goes wrong: User A shares a document with User B. The document is not duplicated — it stays in User A's storage and User A's quota. User A revokes the share. If the quota update on revoke has a bug (or is missing), User A's used bytes may drift from reality over time. At scale, quota inaccuracies accumulate and users can either be locked out prematurely or exceed limits invisibly.

Why it happens: Sharing is implemented as a metadata-only operation (a shares table row), but quota accounting is only re-examined on upload and delete. Edge cases (revoke, owner deletion while share is active, cloud storage document quota) are skipped.

How to avoid:

Quota is a property of documents, not shares. Ensure the quota model is: "sum of file_size for all documents where owner_id = user_id." Shares do not affect quota calculation.
Recalculate quota from source-of-truth (a SUM query) as a periodic background job and reconcile against the cached used_bytes value. Alert if drift exceeds 1%.
The DELETE /documents/{id} handler must atomically decrement quota and delete the DB row and MinIO object in a single transaction (DB) + best-effort cleanup (MinIO).

Warning signs:

used_bytes stored as a counter updated by application code, never reconciled against a database SUM
Share revocation handler doesn't touch quota (correct if quota is owner-only, but must be verified)
Document deletion does not atomically update quota

Phase to address: Quotas and sharing phases

Technical Debt Patterns

Shortcut	Immediate Benefit	Long-term Cost	When Acceptable
Single global encryption key for cloud credentials	Simple env var config	Single point of failure; one leak decrypts all users	Never for a SaaS product
localStorage JWT storage	Easy Axios integration	XSS → full account takeover, TOTP bypass	Never; httpOnly cookies have identical DX once set up
Synchronous DB driver (psycopg2) with FastAPI	Familiar, simple	Event loop blocking under any load	Never for new greenfield code; acceptable only during an explicitly time-boxed spike
Presigned URL in list response	One fewer endpoint	Stale URLs, user-facing failures after expiry	Only for short-lived signed URLs (<5 min) with TTL tracking in frontend
Skip per-user key derivation	Fewer moving parts	Catastrophic blast radius on key leak	Never for cloud credential encryption
Inline quota check (read-check-write) without atomic update	Simple code	Silent quota bypass under concurrent load	Never for enforced limits
Soft-coding 8-char UUID prefix as document IDs (existing)	Shorter IDs	Collision risk at scale; insecure for auth tokens	Replace with full UUID before multi-user goes live

Integration Gotchas

Integration	Common Mistake	Correct Approach
Google Drive OAuth	Storing `access_token` only; ignoring `refresh_token`	Store both; refresh_token is issued only on first authorization — store immediately or it's lost
OneDrive / MSAL	Treating MSAL token cache as permanent	MSAL token cache can be invalidated by Microsoft; always handle `invalid_grant` and re-auth prompt
Nextcloud	Using Basic Auth with plain credentials	Use App Passwords (Nextcloud-specific token), never store the user's Nextcloud master password
MinIO	Using `minio.get_object()` for large files without streaming	Stream the response: `response.stream(32768)` or use boto3 streaming; loading full file to memory crashes on large PDFs
MinIO	Treating bucket names as security boundaries	Bucket names are not credentials; isolation is enforced by object key prefix + IAM policy. Set MinIO IAM so the app's service account can only access the designated bucket
PostgreSQL	Using `autocommit=True` for all connections	Quota updates and document record creation must be transactional; autocommit makes rollback impossible
PyOTP (TOTP)	Using default `valid_window=0` then switching to `valid_window=2` for "user convenience"	Window=1 (±30s) tolerates clock skew. Window=2+ increases brute-force surface disproportionately

Performance Traps

Trap	Symptoms	Prevention	When It Breaks
O(N) metadata scan on every document list (existing pattern)	List endpoint slows linearly with document count	Paginated SQL query with index on `(user_id, created_at)`	~500 documents per user
Topic count scan on every topic fetch (existing pattern)	`GET /topics` latency spikes as documents grow	Materialized counter column updated on insert/delete, or cached with 30s TTL	~1,000 documents
Synchronous MinIO upload inline in HTTP handler	Requests time out on large files; one upload blocks all other requests	Background task queue; return 202 Accepted with job ID	First user uploading a file >10 MB
Eager loading all cloud connections on user login	Login latency grows with number of connected providers	Lazy-load cloud connection status; only check on storage page	>3 cloud providers per user
Full-text search via `ILIKE '%term%'` on document content	Full table scan on every search	PostgreSQL `tsvector` full-text index on extracted text column	~10,000 documents platform-wide
JWT validation on every request without caching public key	Repeated public key fetch (if using asymmetric JWT)	Cache the signing key in memory; never refetch on every request	High request volume

Security Mistakes

Mistake	Risk	Prevention
CORS `allow_origins=["*"]` retained from existing codebase	Cross-origin requests from attacker-controlled pages	Set exact origin (frontend URL) in production; never wildcard with credentials
File type validation defined but not enforced (existing)	Executable or malicious file uploads	Enforce MIME type check at upload boundary; also validate magic bytes (not just Content-Type header)
No file size limit at HTTP boundary (existing)	Memory exhaustion, DoS via large upload	Add `max_upload_bytes` limit in FastAPI middleware before reading body
API keys in plaintext JSON (existing)	Credential leakage via Docker volume mount	Env vars for all secrets; never serialize keys to disk in application data directory
Password reset without invalidating existing sessions	Attacker keeps session after victim resets password	Invalidate all sessions for user on successful password reset
Audit log includes document metadata (filenames)	Filename can reveal document content	Audit log stores only document ID, event type, timestamp, IP — no filenames
Admin can read `cloud_connections.encrypted_token` column	Ciphertext exposure; retroactive decryption if key leaks	Exclude credential columns from all admin serializers by policy
No breach check on registration password	Users reuse passwords from breached services	Integrate HIBP k-anonymity API on registration

UX Pitfalls

Pitfall	User Impact	Better Approach
Silent presigned URL expiry (image fails to load, download does nothing)	User thinks the app is broken; no recovery path	On-demand URL generation; show explicit "link expired, regenerating..." state
TOTP enrollment without backup codes	User loses phone → permanently locked out of account	Issue 8–10 single-use backup codes at TOTP enrollment; require user to acknowledge
Quota error shown only as "Upload failed"	User doesn't know why; may try repeatedly	Return structured quota error: `{"error": "quota_exceeded", "used": 98MB, "limit": 100MB}`
Cloud storage reconnect required, but no in-app prompt	User's uploads silently fail until they notice the settings page	Banner notification or upload-time prompt: "Your Google Drive connection needs re-authorization"
Folder delete with documents inside silently deletes documents	Data loss without confirmation	Require explicit confirmation listing document count; offer "move to root" alternative
Share revoke with no notification to recipient	Recipient sees documents disappear without explanation	Audit-trail entry visible to recipient: "Access to [folder] was revoked by [owner]"

"Looks Done But Isn't" Checklist

TOTP enrollment: Backup codes issued, stored hashed, and acknowledged by user before TOTP is marked active
Password reset: Does NOT create a full authenticated session — routes through login with TOTP check
Document delete: Atomically decrements quota AND removes MinIO object AND removes DB row in a single transaction
Account delete: Deletes from MinIO, PostgreSQL, AND calls adapter.delete_user_files() for each cloud connection
Cloud disconnect: Revokes OAuth token at provider (not just deletes local record) — provider refresh tokens remain valid until explicitly revoked
Quota display: Shows real-time usage, not cached value from registration — must query SUM from DB
Sharing: Shared documents do NOT copy to recipient's quota — verify with a SELECT SUM(file_size) WHERE owner_id = recipient_id before and after share
Migration: Document count in PostgreSQL equals document count in old JSON store before cutover flag is flipped
Presigned URLs: Never in list responses — verified by API contract test asserting no url field in list endpoint response body
Admin isolation: Admin token cannot retrieve document content — verified by dedicated negative-access test

Recovery Strategies

Pitfall	Recovery Cost	Recovery Steps
JWT in localStorage discovered post-launch	HIGH	Rotate all tokens (force logout all users), redeploy with httpOnly cookies, communicate to users
Quota race condition caused 10% of users to exceed quota	MEDIUM	Run `UPDATE quotas SET used_bytes = (SELECT SUM(file_size) FROM documents WHERE owner_id = user_id)` reconciliation; enforce atomic updates going forward
Flat-file migration data loss (documents missing post-cutover)	HIGH	Restore from pre-migration backup of JSON data directory; re-run migration with dual-write; reconcile counts
Encryption key leaked (single-key model)	CRITICAL	Immediately rotate key; re-encrypt all credentials with new key; notify affected users; assume all cloud credentials compromised — prompt all users to revoke and reconnect
Admin escalation (admin accessed user documents)	HIGH	Audit log review to determine scope; GDPR breach notification if EU users affected (72h window); patch authorization layer; force session invalidation
Cloud connection revoked but app retried indefinitely	LOW	Mark all stuck connections as `REQUIRES_REAUTH`; send user notification; add `invalid_grant` handler to adapter

Pitfall-to-Phase Mapping

Pitfall	Prevention Phase	Verification
JWT in localStorage	Phase 1 — Auth	Confirm no `localStorage.setItem('token', ...)` in frontend; auth cookie is httpOnly in browser DevTools
TOTP bypass via password reset	Phase 1 — Auth	Integration test: reset password → assert no full session created → assert login still requires TOTP
Refresh token reuse detection	Phase 1 — Auth	Test: use a rotated (old) refresh token → assert 401 and full family revocation
TOTP timing attack / brute force	Phase 1 — Auth	Rate limit test: 6th TOTP attempt within window → assert 429
Path traversal in MinIO keys	Phase 2 — DB + MinIO migration	Unit test: filename with `../` → assert stored key is UUID-based, not user-provided
Quota race condition	Phase 2 — DB + MinIO migration	Concurrent upload test: 2 threads uploading to full quota → assert only one succeeds
Admin privilege escalation	Phase 1 + every phase adding resource endpoints	Negative access test: admin JWT → `GET /documents/{other_user_doc_id}` → assert 403
Cloud credential leakage via admin query	Phase 3 — Cloud storage	Test: admin list user cloud connections → assert response contains no `token` or `encrypted_` fields
Flat-file to PostgreSQL data loss	Phase 2 — DB + MinIO migration	Count reconciliation: `assert json_doc_count == db_doc_count` in migration script
Blocking I/O in async handlers	Phase 2 — DB + MinIO migration	Load test with 10 concurrent requests; assert no event loop warnings; check SQLAlchemy driver is asyncpg
N+1 queries on document list	Phase 2 — DB + MinIO migration	Enable SQLAlchemy echo; assert document list query count = 1 regardless of result size
Presigned URL expiry	Phase 2 — DB + MinIO migration	API contract test: list endpoint response contains no presigned URL fields
OAuth revocation not handled	Phase 3 — Cloud storage	Mock `invalid_grant` from OAuth provider → assert connection status set to `REQUIRES_REAUTH`, not retried
Rate limits without backoff	Phase 3 — Cloud storage	Mock 429 from provider → assert exponential backoff, not immediate 500 to user
GDPR right to erasure incomplete	Phase 3 — Cloud storage + account mgmt	Account deletion test: assert cloud adapter `delete_user_files` called for each active connection
Single encryption key	Phase 3 — Cloud storage	Code review gate: assert per-user HKDF derivation; no single-key decryption in any code path
Quota drift from sharing/revoke	Sharing phase	Reconciliation query: `SUM(file_size WHERE owner_id = user)` vs `used_bytes` — assert < 1% drift

Sources

Project context: /Users/nik/Documents/Progamming/document_scanner/.planning/PROJECT.md
Codebase audit: /Users/nik/Documents/Progamming/document_scanner/.planning/codebase/CONCERNS.md
Auth pitfalls: OWASP JWT Security Cheat Sheet, OWASP Authentication Cheat Sheet (well-established, HIGH confidence)
OAuth token lifecycle: RFC 6749 Section 10.4 (refresh token revocation), Google OAuth error codes documentation (HIGH confidence from training data)
MinIO/S3 path traversal: AWS S3 object key documentation; known class of vulnerability in multi-tenant S3-prefix isolation (HIGH confidence)
Quota race conditions: Classic check-then-act concurrency pattern; PostgreSQL SELECT FOR UPDATE documentation (HIGH confidence)
GDPR Article 17 (right to erasure) and Article 30 (records of processing): well-established regulatory requirements (HIGH confidence)
N+1 query pattern: SQLAlchemy documentation on relationship loading strategies (HIGH confidence)
TOTP RFC 6238; PyOTP library behavior from training data (MEDIUM — verify PyOTP valid_window defaults against current docs before implementation)

Pitfalls research for: DocuVault — multi-user SaaS document management (FastAPI + Vue 3 + PostgreSQL + MinIO) Researched: 2026-05-21

40 KiB Raw Permalink Blame History Unescape Escape

Pitfalls Research

Critical Pitfalls

Pitfall 1: JWT in localStorage — XSS Gives Full Account Takeover

Pitfall 2: TOTP Bypass via Password Reset Flow

Pitfall 3: Refresh Token Rotation — Stolen Refresh Token Not Detected

Pitfall 4: TOTP — Timing Attack on Code Verification

Pitfall 5: Path Traversal in User File Access (MinIO Object Keys)

Pitfall 6: Quota Race Condition — Concurrent Uploads Bypass the Limit

Pitfall 7: Admin Privilege Escalation via Missing Ownership Checks

Pitfall 8: Cloud Credential Leakage via Admin Query

Pitfall 9: Flat-file to PostgreSQL Migration — Data Loss During Cutover

Pitfall 10: Blocking I/O Inside Async FastAPI Handlers (Existing Issue)

Pitfall 11: N+1 Queries on Document Listing

Pitfall 12: MinIO Presigned URL Expiry — Stale Links in the UI

Pitfall 13: OAuth Token for Cloud Storage — Revocation Not Handled

Pitfall 14: Cloud Storage Rate Limits — No Backoff, No Per-User Throttling

Pitfall 15: GDPR Right to Erasure — Cloud Copies Not Deleted

Pitfall 16: Encryption Key Management — Single Key for All Users

Pitfall 17: Shared Document — Quota Not Charged to Sharer Correctly After Revoke

Technical Debt Patterns

Integration Gotchas

Performance Traps

Security Mistakes

UX Pitfalls

"Looks Done But Isn't" Checklist

Recovery Strategies

Pitfall-to-Phase Mapping

Sources

40 KiB

Raw Permalink Blame History