Files

40 KiB
Raw Permalink Blame History

Pitfalls Research

Domain: Multi-user SaaS document management platform (FastAPI + Vue 3, PostgreSQL + MinIO, cloud storage integrations) Researched: 2026-05-21 Confidence: HIGH (auth/storage/multi-tenancy patterns are well-established; specific FastAPI + MinIO combination is MEDIUM — no web search available)


Critical Pitfalls

Pitfall 1: JWT in localStorage — XSS Gives Full Account Takeover

What goes wrong: The Vue 3 SPA stores the JWT access token in localStorage. Any JavaScript injected via XSS (in file names, document content previewed in the UI, a compromised dependency) can call localStorage.getItem('token') and exfiltrate a long-lived credential. The attacker then impersonates the user from any origin, bypasses TOTP entirely (the token is post-authentication), and can access all documents, including cloud storage credentials.

Why it happens: localStorage is the path of least resistance in SPAs. It survives page reloads, works with Axios interceptors trivially, and requires no server-side session state. FastAPI tutorials almost universally use Authorization: Bearer headers set from localStorage.

How to avoid:

  • Issue the JWT as an httpOnly, SameSite=Strict, Secure cookie — JavaScript cannot read it.
  • Use a short-lived access token (15 minutes) in the httpOnly cookie.
  • Issue a separate refresh token (httpOnly cookie, longer TTL, /auth/refresh path-scoped) to rotate access tokens silently.
  • The Vue frontend never holds the raw token string. Axios is configured with withCredentials: true; the browser attaches cookies automatically.
  • CSRF protection: because SameSite=Strict blocks cross-site cookie submission, CSRF tokens are not strictly required for same-origin SPAs, but add a CSRF header check (X-Requested-With: XMLHttpRequest) as defence-in-depth.

Warning signs:

  • localStorage.getItem anywhere in auth-related frontend code
  • JWT decode functions in frontend code (the frontend should not need to decode a token it can't read)
  • Authorization: Bearer ${token} header set manually in Axios interceptor

Phase to address: Auth phase (Phase 1 — Users & Auth)


Pitfall 2: TOTP Bypass via Password Reset Flow

What goes wrong: The password reset flow issues a one-time token by email. After the user resets their password, the system logs them in and redirects to the dashboard — skipping the TOTP prompt because the session was created through the reset path, not the login path. An attacker who compromises a user's email account can therefore completely bypass TOTP.

Why it happens: Password reset and login are treated as separate code paths. The TOTP check lives in the login handler; the reset handler creates a session directly after credential update without going through the TOTP gate.

How to avoid:

  • After a successful password reset, issue a partial session or a password_reset_pending state, not a full authenticated session.
  • Force the user to complete the full login flow (including TOTP if enabled) from the new credentials.
  • Alternatively, on password reset completion, invalidate all existing sessions and send the user to the login page (no auto-login).
  • Log password resets in the audit trail with IP and user-agent.

Warning signs:

  • Password reset handler calls the same session-creation function as login but omits 2FA state checks
  • No mfa_verified flag on sessions (only authenticated)
  • Users can reach protected endpoints via a token created from the reset path

Phase to address: Auth phase (Phase 1 — Users & Auth)


Pitfall 3: Refresh Token Rotation — Stolen Refresh Token Not Detected

What goes wrong: Refresh tokens are issued as single-use rotating credentials. If an attacker steals a refresh token and uses it before the legitimate user does, the server rotates the token and the attacker has a valid new one. The legitimate user's next refresh request fails — but without a detection mechanism the failure just looks like a session expiry, no alert is raised, and the attacker continues with the stolen session indefinitely.

Why it happens: Teams implement the "rotate on use" mechanic without implementing the "revoke family on reuse" detection. The refresh_tokens table lacks a family_id column linking reissued tokens.

How to avoid:

  • Store refresh tokens in the database with a family_id (UUID for the original issuance chain) and a revoked flag.
  • When a refresh token is presented: if it is already marked revoked (i.e., a previously rotated token), revoke the entire family — force logout of all sessions for that user.
  • Emit a security alert (audit log + optionally email) when a reuse attempt is detected.
  • Refresh tokens should be hashed before storage (bcrypt or SHA-256 with a per-row salt), same as passwords.

Warning signs:

  • Refresh tokens stored as plain values in DB with no revoked column
  • No family_id linking related rotation chains
  • Stolen refresh token detection treated as "v2 feature"

Phase to address: Auth phase (Phase 1 — Users & Auth)


Pitfall 4: TOTP — Timing Attack on Code Verification

What goes wrong: TOTP verification uses Python == string comparison. Python's == on strings is not constant-time — it short-circuits on the first differing character. A sufficiently sophisticated timing oracle (millions of requests from a local network) can distinguish valid from invalid codes, reducing the 6-digit brute-force space. More practically: without rate limiting, an attacker can brute-force all 1,000,000 possible 6-digit codes during a 30-second window.

Why it happens: TOTP libraries (PyOTP) return a string; developers do if provided == expected. Rate limiting is added "later" and often never lands.

How to avoid:

  • Use hmac.compare_digest(provided, expected) instead of == for TOTP comparison.
  • Rate-limit TOTP attempts: 5 attempts per 30-second window, 15-minute lockout on excess.
  • The lockout must be stored server-side (Redis or DB), not client-side.
  • Accept only the current window and optionally ±1 window for clock drift — do not accept wider ranges.
  • Log every failed TOTP attempt with IP.

Warning signs:

  • if totp.verify(code): without inspecting PyOTP's internal comparison method
  • No rate limit on POST /auth/totp/verify
  • TOTP window set to 2+ (default is ±1 window = 90 seconds valid — wider than needed)

Phase to address: Auth phase (Phase 1 — Users & Auth)


Pitfall 5: Path Traversal in User File Access (MinIO Object Keys)

What goes wrong: MinIO object keys are constructed from user input: f"users/{user_id}/{filename}". If filename contains ../, ../../, or URL-encoded equivalents (%2e%2e%2f), the resulting key may escape the user's prefix and land in another user's namespace. A request for ../../other_user_id/secret.pdf resolves to users/other_user_id/secret.pdf.

Why it happens: Developers trust that MinIO will sanitize paths. It does not — S3-compatible APIs treat object keys as arbitrary strings. The prefix-based isolation is only as safe as the key construction code.

How to avoid:

  • Never use raw filenames as object key components. Generate a UUID (or UUID + original extension) as the stored key: f"users/{user_id}/{uuid4()}{ext}".
  • Store the human-readable filename in the database metadata row, completely decoupled from the storage key.
  • If filenames must appear in keys for any reason, strip or reject any /, \, .., % characters before key construction.
  • On retrieval, look up the object key from the database row (which is owned by user_id) rather than constructing it from user input.

Warning signs:

  • object_key = f"users/{user_id}/{request.filename}"
  • File download endpoint accepts a path or filename query parameter and constructs the key from it
  • No database lookup intermediating between "user requests file" and "MinIO key used"

Phase to address: Storage migration phase (Phase 2 — DB + MinIO migration)


Pitfall 6: Quota Race Condition — Concurrent Uploads Bypass the Limit

What goes wrong: Two upload requests arrive simultaneously for a user at 99 MB of a 100 MB quota. Both read quota usage as 99 MB, both pass the 99 + 1 < 100 check, both proceed to upload — the user ends at 101 MB. At larger scale (many simultaneous large uploads) the overage can be significant.

Why it happens: Quota enforcement is implemented as: read current usage → check → write file → update usage. This is a classic check-then-act race when the check and the write are not atomic.

How to avoid:

  • Enforce quota atomically using a SELECT ... FOR UPDATE on the quota row before uploading, or use a PostgreSQL advisory lock keyed on user_id.
  • Better: use an optimistic update: UPDATE quotas SET used_bytes = used_bytes + $new WHERE user_id = $uid AND used_bytes + $new <= limit_bytes RETURNING used_bytes. If 0 rows are updated, the quota was exceeded — reject before touching MinIO.
  • Only update used_bytes after the MinIO upload succeeds, but hold the lock/reservation through the upload, or use a two-phase: reserve bytes → upload → confirm, with a cleanup job for stuck reservations.
  • Never read quota, do arithmetic in Python, then write back as two separate statements.

Warning signs:

  • current = get_quota(user_id); if current + size <= limit: upload()
  • No database transaction wrapping quota check and update
  • Quota table updated with SET used_bytes = $computed_value (full overwrite) rather than used_bytes + delta

Phase to address: Storage migration phase (Phase 2 — DB + MinIO migration) and Quotas phase


Pitfall 7: Admin Privilege Escalation via Missing Ownership Checks

What goes wrong: An admin API endpoint is gated on is_admin=True but document-access endpoints only check is_authenticated. An admin user calling GET /api/documents/{id} with any document ID can read any user's document because the handler checks authentication but not document.owner_id == current_user.id. The privacy-first model is violated without any special exploit — just a correctly authenticated request.

Why it happens: Authorization is implemented as authentication + role checks ("is this user an admin?") without resource-level ownership verification ("does this user own this resource?"). FastAPI dependency injection makes it easy to write current_user = Depends(get_current_user) and forget to check resource.user_id == current_user.id.

How to avoid:

  • Every document/file/folder endpoint must assert resource.user_id == current_user.id (or check share grants). This check cannot be optional or deferred.
  • Admins access user account metadata via separate admin-scoped endpoints that explicitly exclude document content, file URLs, and cloud credentials.
  • Write a test: log in as admin, attempt GET /documents/{document_owned_by_other_user}, assert 403.
  • Use a centralized assert_document_access(document, current_user) function rather than inline checks to prevent omissions.

Warning signs:

  • Document endpoints that check if not current_user but not if document.user_id != current_user.id
  • Admin endpoints that return document content or presigned URLs
  • No explicit tests for cross-user access attempts

Phase to address: Auth phase (Phase 1) and every subsequent phase that adds resource-access endpoints


Pitfall 8: Cloud Credential Leakage via Admin Query

What goes wrong: Cloud storage credentials (OAuth tokens, Nextcloud passwords) are encrypted at rest. But a query like SELECT * FROM cloud_connections WHERE user_id = $uid returns the ciphertext. If an admin dashboard endpoint runs this query and serializes the full row to JSON, the ciphertext ships to the admin browser. An admin cannot decrypt it without the key — but the ciphertext is now in browser history, proxy logs, and admin audit records. If the encryption key is later exposed, all credentials decrypt retroactively.

Why it happens: ORM models serialize all columns by default. Developers add encrypted_credentials to the model and forget to exclude it from admin-facing serializers.

How to avoid:

  • The cloud_connections table's credential column (encrypted_token, encrypted_refresh_token) must be excluded from all serialization by default.
  • Use explicit Pydantic response models for every endpoint — no orm_mode with full model pass-through.
  • Admin endpoints for cloud connections return only: provider, account_label, connected_at, last_used_at, status — never the credential column, not even ciphertext.
  • Audit log cloud credential access separately: the only code path that should ever read the encrypted column is the storage adapter, not admin or user-info endpoints.

Warning signs:

  • CloudConnection Pydantic schema includes encrypted_token field
  • Admin user-detail endpoint returns full cloud connection rows
  • ORM model uses model_config = ConfigDict(from_attributes=True) without explicit field exclusions

Phase to address: Cloud storage integration phase (Phase 3 or 4)


Pitfall 9: Flat-file to PostgreSQL Migration — Data Loss During Cutover

What goes wrong: The migration script reads all JSON files, transforms them to DB rows, and then a flag switches the app to use PostgreSQL. Documents uploaded during the migration window are written to the old JSON store (because the flag has not flipped yet) and are missed by the migration script, which ran before they arrived. After cutover, those documents are invisible.

Why it happens: Migrations are planned as offline events ("we'll take the app down for 5 minutes") and then discovered to be impractical — the app is used 24/7 or downtime feels risky. The team runs the migration online without dual-write.

How to avoid:

  • Plan the migration in three phases:
    1. Dual-write: deploy code that writes to both JSON and PostgreSQL, reads from JSON. All new documents land in both stores.
    2. Backfill: run migration script to copy historical JSON records to PostgreSQL. New records are already there.
    3. Cutover: flip read source to PostgreSQL, verify counts match, remove JSON write path.
  • The dual-write window can be as short as one deployment cycle.
  • Include a reconciliation check: assert doc_count_json == doc_count_db before cutting over.
  • Keep the old JSON store read-only for 1 week post-cutover as a rollback option.

Warning signs:

  • Migration is planned as "take app down, run script, bring back up"
  • No count-reconciliation step
  • No rollback plan documented before migration begins

Phase to address: DB + MinIO migration phase (Phase 2)


Pitfall 10: Blocking I/O Inside Async FastAPI Handlers (Existing Issue)

What goes wrong: The codebase already uses synchronous filelock and open() inside async def handlers (CONCERNS.md item 6). After migration to PostgreSQL + MinIO, if synchronous DB drivers (psycopg2) or synchronous MinIO client calls replace the file I/O without wrapping in asyncio.to_thread(), the event loop stalls on every I/O operation. Under concurrent load (multiple users uploading), requests queue behind each other even though the hardware is idle.

Why it happens: SQLAlchemy sync engine + psycopg2 is the default FastAPI tutorial stack. The MinIO Python SDK (minio package) is synchronous. Developers add await in front of calls that are not coroutines, get a type error, remove the await, and ship blocking code.

How to avoid:

  • Use SQLAlchemy async engine with asyncpg driver: create_async_engine("postgresql+asyncpg://...").
  • Wrap all MinIO SDK calls in asyncio.to_thread() since there is no official async MinIO client: await asyncio.to_thread(minio_client.put_object, ...).
  • Alternatively use aioboto3 (async boto3) which works with MinIO's S3-compatible API.
  • aiofiles is already in requirements.txt — use it for any remaining local file operations.
  • Run pytest-asyncio + asyncio.get_event_loop().set_debug(True) in tests; the debug mode logs blocking calls.

Warning signs:

  • from sqlalchemy import create_engine (not create_async_engine)
  • import psycopg2 anywhere in application code
  • minio_client.put_object(...) not wrapped in asyncio.to_thread
  • Uvicorn logs show high request latency even with low CPU usage

Phase to address: DB + MinIO migration phase (Phase 2)


Pitfall 11: N+1 Queries on Document Listing

What goes wrong: GET /api/documents returns a list of documents, each including folder name, topic names, and share count. The handler fetches the document list, then for each document issues separate queries for folder, topics, and shares. 100 documents = 301+ queries per page load. The existing codebase already has an O(N) disk scan equivalent (CONCERNS.md items 8, 9); the PostgreSQL migration preserves this pattern if not corrected.

Why it happens: SQLAlchemy lazy loading is the default. document.folder.name triggers a query. Developers don't see it until production load hits.

How to avoid:

  • Use SQLAlchemy joinedload or selectinload options on the document list query to eagerly load related entities.
  • The list endpoint should be a single SQL query with JOINs, not a loop.
  • Add EXPLAIN ANALYZE checks as part of the phase acceptance criteria.
  • Enable SQLAlchemy echo=True in development to log every SQL statement.
  • Pagination at the database level: LIMIT 50 OFFSET $n — not "fetch all, slice in Python."

Warning signs:

  • SQLAlchemy models using default relationship loading without explicit lazy= options
  • for doc in documents: doc.folder.name pattern
  • No joinedload or selectinload in list query definitions
  • Query count in tests grows linearly with fixture size

Phase to address: DB + MinIO migration phase (Phase 2); enforce in document listing feature phase


What goes wrong: The document preview and download UI displays links generated as MinIO presigned URLs with a 1-hour expiry. The Vue frontend fetches the document list on page load, stores the URLs in Pinia state, and renders them as <img src> or <a href>. If the user leaves the tab open for 2 hours and then clicks a link, they get a 403 (presigned URL expired). No error is shown; the image just fails to load or the download silently fails.

Why it happens: Presigned URLs feel like permanent links. Teams generate them at list time for convenience and cache them in frontend state without expiry awareness.

How to avoid:

  • Do not embed presigned URLs in list responses. Return only document metadata.
  • Generate presigned URLs on demand: a separate GET /api/documents/{id}/download-url endpoint generates a short-lived URL (515 minutes) at the moment of user intent.
  • Alternatively, proxy document bytes through the FastAPI backend (GET /api/documents/{id}/file streams from MinIO) — eliminates presigned URL complexity at the cost of bandwidth.
  • If presigned URLs are cached in frontend state, include the expiry timestamp and regenerate before expiry.

Warning signs:

  • Presigned URLs included in document list response JSON
  • Frontend stores presigned URLs in Pinia state without TTL tracking
  • No GET /api/documents/{id}/download-url endpoint (or equivalent)
  • presigned_url_expiry = 3600 with no frontend refresh logic

Phase to address: DB + MinIO migration phase (Phase 2) and document access phase


Pitfall 13: OAuth Token for Cloud Storage — Revocation Not Handled

What goes wrong: A user connects Google Drive via OAuth. The app stores the encrypted access token + refresh token. Six months later, the user revokes the app's access in their Google account settings. The next time DocuVault tries to access their Drive, it gets a 401. The refresh token exchange also fails with invalid_grant. The app has no handler for this — it retries, logs an error, and the user sees a generic "storage error" with no path to reconnect.

Why it happens: OAuth happy-path is well-documented. Revocation and invalid_grant handling are in footnotes. Teams handle 401 → refresh → retry but don't handle refresh failure → user notification.

How to avoid:

  • Wrap all cloud storage adapter calls in a two-level error handler:
    • Level 1: 401 → attempt refresh → retry.
    • Level 2: refresh fails with invalid_grant or token_revoked → mark connection as REQUIRES_REAUTH in DB, do not retry, surface a "reconnect required" notice to the user.
  • The cloud_connections table must have a status column: ACTIVE | REQUIRES_REAUTH | ERROR.
  • The UI must poll or react to REQUIRES_REAUTH state and prompt the user to re-authorize, not just show an error toast.
  • Never silently swallow a revoked-token error or retry indefinitely.

Warning signs:

  • Cloud storage adapter raises generic StorageError for all OAuth errors without distinguishing revocation
  • No status or needs_reauth column on cloud_connections
  • Reconnect UI not planned in feature scope

Phase to address: Cloud storage integration phase


Pitfall 14: Cloud Storage Rate Limits — No Backoff, No Per-User Throttling

What goes wrong: Multiple users with Google Drive connected trigger simultaneous document uploads. The app hits Google's per-app rate limit (Drive API: 20,000 requests/100 seconds/user, but also global per-project limits). Google returns 429. The app has no retry-with-backoff; uploads fail and the errors are attributed to the individual user, not the platform-level limit.

Why it happens: Rate limits are per-API-key, not per-user. A single misbehaving user or burst of activity for all users shares the same quota. Teams only test with a single user.

How to avoid:

  • Implement exponential backoff with jitter for all cloud storage adapter calls (start 1s, max 32s, 3 retries).
  • Use a task queue (Celery or FastAPI BackgroundTasks with a semaphore) for cloud-destined uploads rather than processing inline in the request handler.
  • Log 429 responses separately from other errors — they indicate platform-level throttling, not user errors.
  • Per-provider rate limit documentation should be captured in adapter docstrings.

Warning signs:

  • Cloud upload performed synchronously inside the HTTP request handler with no retry logic
  • No tenacity or equivalent retry decorator on provider calls
  • 429 responses from cloud APIs cause 500 responses to the end user

Phase to address: Cloud storage integration phase


Pitfall 15: GDPR Right to Erasure — Cloud Copies Not Deleted

What goes wrong: A user requests account deletion. The app deletes the PostgreSQL rows, the MinIO objects, and the user record. But if the user's documents were stored in Google Drive or OneDrive via the cloud storage adapter, those files remain on the cloud provider. The user's data has not actually been erased from all systems.

Why it happens: Account deletion is implemented against the systems the team controls (PostgreSQL, MinIO). Cloud storage is treated as "the user's own storage" so deletion is skipped. Under GDPR, if the platform wrote files to the user's cloud storage on their behalf as a data processor, erasure obligations may extend there.

How to avoid:

  • Account deletion must enumerate all cloud_connections for the user and call adapter.delete_all_user_files() (or equivalent) on each active connection.
  • If the cloud connection is already in REQUIRES_REAUTH state, the deletion cannot proceed automatically — log a compliance alert and require manual follow-up or notify the user to delete manually.
  • Document the data flow in a data map: "files stored in X go to Y" — this is a GDPR Article 30 requirement.
  • Right to erasure flow must be tested as part of acceptance criteria, not assumed to work.

Warning signs:

  • Account deletion handler only operates on PostgreSQL and MinIO
  • No delete_user_files method in the StorageBackend interface
  • No GDPR data map in project documentation

Phase to address: Cloud storage integration phase and account management phase


Pitfall 16: Encryption Key Management — Single Key for All Users

What goes wrong: All cloud credentials are encrypted with one Fernet key stored in CLOUD_ENCRYPTION_KEY env var. If this key is exposed (leaked .env, compromised deployment config, insider threat), every user's cloud credentials decrypt at once. The privacy-first model collapses entirely.

Why it happens: One key is simpler than per-user keys. Fernet key derivation per user is slightly more complex to implement.

How to avoid:

  • Derive a per-user encryption key from the master key + a user-specific salt: HKDF(master_key, salt=user_id_bytes, info=b"cloud-credentials"). The salt is stored in the users table.
  • Even if the master key leaks, an attacker still needs each user's salt to decrypt their credentials.
  • Rotate the master key on a schedule: re-encrypt all stored credentials with the new key in a background job.
  • Audit log any code path that accesses the encryption key — this should only ever happen in the storage adapter, never in API handlers.
  • Never log the encryption key or derived key material, even at DEBUG level.

Warning signs:

  • Single ENCRYPTION_KEY env var with no per-user derivation
  • Encryption/decryption called directly in API endpoint handlers (not in the adapter layer)
  • No key rotation plan documented

Phase to address: Cloud storage integration phase (design the key derivation before writing the first line of credential storage code)


Pitfall 17: Shared Document — Quota Not Charged to Sharer Correctly After Revoke

What goes wrong: User A shares a document with User B. The document is not duplicated — it stays in User A's storage and User A's quota. User A revokes the share. If the quota update on revoke has a bug (or is missing), User A's used bytes may drift from reality over time. At scale, quota inaccuracies accumulate and users can either be locked out prematurely or exceed limits invisibly.

Why it happens: Sharing is implemented as a metadata-only operation (a shares table row), but quota accounting is only re-examined on upload and delete. Edge cases (revoke, owner deletion while share is active, cloud storage document quota) are skipped.

How to avoid:

  • Quota is a property of documents, not shares. Ensure the quota model is: "sum of file_size for all documents where owner_id = user_id." Shares do not affect quota calculation.
  • Recalculate quota from source-of-truth (a SUM query) as a periodic background job and reconcile against the cached used_bytes value. Alert if drift exceeds 1%.
  • The DELETE /documents/{id} handler must atomically decrement quota and delete the DB row and MinIO object in a single transaction (DB) + best-effort cleanup (MinIO).

Warning signs:

  • used_bytes stored as a counter updated by application code, never reconciled against a database SUM
  • Share revocation handler doesn't touch quota (correct if quota is owner-only, but must be verified)
  • Document deletion does not atomically update quota

Phase to address: Quotas and sharing phases


Technical Debt Patterns

Shortcut Immediate Benefit Long-term Cost When Acceptable
Single global encryption key for cloud credentials Simple env var config Single point of failure; one leak decrypts all users Never for a SaaS product
localStorage JWT storage Easy Axios integration XSS → full account takeover, TOTP bypass Never; httpOnly cookies have identical DX once set up
Synchronous DB driver (psycopg2) with FastAPI Familiar, simple Event loop blocking under any load Never for new greenfield code; acceptable only during an explicitly time-boxed spike
Presigned URL in list response One fewer endpoint Stale URLs, user-facing failures after expiry Only for short-lived signed URLs (<5 min) with TTL tracking in frontend
Skip per-user key derivation Fewer moving parts Catastrophic blast radius on key leak Never for cloud credential encryption
Inline quota check (read-check-write) without atomic update Simple code Silent quota bypass under concurrent load Never for enforced limits
Soft-coding 8-char UUID prefix as document IDs (existing) Shorter IDs Collision risk at scale; insecure for auth tokens Replace with full UUID before multi-user goes live

Integration Gotchas

Integration Common Mistake Correct Approach
Google Drive OAuth Storing access_token only; ignoring refresh_token Store both; refresh_token is issued only on first authorization — store immediately or it's lost
OneDrive / MSAL Treating MSAL token cache as permanent MSAL token cache can be invalidated by Microsoft; always handle invalid_grant and re-auth prompt
Nextcloud Using Basic Auth with plain credentials Use App Passwords (Nextcloud-specific token), never store the user's Nextcloud master password
MinIO Using minio.get_object() for large files without streaming Stream the response: response.stream(32768) or use boto3 streaming; loading full file to memory crashes on large PDFs
MinIO Treating bucket names as security boundaries Bucket names are not credentials; isolation is enforced by object key prefix + IAM policy. Set MinIO IAM so the app's service account can only access the designated bucket
PostgreSQL Using autocommit=True for all connections Quota updates and document record creation must be transactional; autocommit makes rollback impossible
PyOTP (TOTP) Using default valid_window=0 then switching to valid_window=2 for "user convenience" Window=1 (±30s) tolerates clock skew. Window=2+ increases brute-force surface disproportionately

Performance Traps

Trap Symptoms Prevention When It Breaks
O(N) metadata scan on every document list (existing pattern) List endpoint slows linearly with document count Paginated SQL query with index on (user_id, created_at) ~500 documents per user
Topic count scan on every topic fetch (existing pattern) GET /topics latency spikes as documents grow Materialized counter column updated on insert/delete, or cached with 30s TTL ~1,000 documents
Synchronous MinIO upload inline in HTTP handler Requests time out on large files; one upload blocks all other requests Background task queue; return 202 Accepted with job ID First user uploading a file >10 MB
Eager loading all cloud connections on user login Login latency grows with number of connected providers Lazy-load cloud connection status; only check on storage page >3 cloud providers per user
Full-text search via ILIKE '%term%' on document content Full table scan on every search PostgreSQL tsvector full-text index on extracted text column ~10,000 documents platform-wide
JWT validation on every request without caching public key Repeated public key fetch (if using asymmetric JWT) Cache the signing key in memory; never refetch on every request High request volume

Security Mistakes

Mistake Risk Prevention
CORS allow_origins=["*"] retained from existing codebase Cross-origin requests from attacker-controlled pages Set exact origin (frontend URL) in production; never wildcard with credentials
File type validation defined but not enforced (existing) Executable or malicious file uploads Enforce MIME type check at upload boundary; also validate magic bytes (not just Content-Type header)
No file size limit at HTTP boundary (existing) Memory exhaustion, DoS via large upload Add max_upload_bytes limit in FastAPI middleware before reading body
API keys in plaintext JSON (existing) Credential leakage via Docker volume mount Env vars for all secrets; never serialize keys to disk in application data directory
Password reset without invalidating existing sessions Attacker keeps session after victim resets password Invalidate all sessions for user on successful password reset
Audit log includes document metadata (filenames) Filename can reveal document content Audit log stores only document ID, event type, timestamp, IP — no filenames
Admin can read cloud_connections.encrypted_token column Ciphertext exposure; retroactive decryption if key leaks Exclude credential columns from all admin serializers by policy
No breach check on registration password Users reuse passwords from breached services Integrate HIBP k-anonymity API on registration

UX Pitfalls

Pitfall User Impact Better Approach
Silent presigned URL expiry (image fails to load, download does nothing) User thinks the app is broken; no recovery path On-demand URL generation; show explicit "link expired, regenerating..." state
TOTP enrollment without backup codes User loses phone → permanently locked out of account Issue 810 single-use backup codes at TOTP enrollment; require user to acknowledge
Quota error shown only as "Upload failed" User doesn't know why; may try repeatedly Return structured quota error: {"error": "quota_exceeded", "used": 98MB, "limit": 100MB}
Cloud storage reconnect required, but no in-app prompt User's uploads silently fail until they notice the settings page Banner notification or upload-time prompt: "Your Google Drive connection needs re-authorization"
Folder delete with documents inside silently deletes documents Data loss without confirmation Require explicit confirmation listing document count; offer "move to root" alternative
Share revoke with no notification to recipient Recipient sees documents disappear without explanation Audit-trail entry visible to recipient: "Access to [folder] was revoked by [owner]"

"Looks Done But Isn't" Checklist

  • TOTP enrollment: Backup codes issued, stored hashed, and acknowledged by user before TOTP is marked active
  • Password reset: Does NOT create a full authenticated session — routes through login with TOTP check
  • Document delete: Atomically decrements quota AND removes MinIO object AND removes DB row in a single transaction
  • Account delete: Deletes from MinIO, PostgreSQL, AND calls adapter.delete_user_files() for each cloud connection
  • Cloud disconnect: Revokes OAuth token at provider (not just deletes local record) — provider refresh tokens remain valid until explicitly revoked
  • Quota display: Shows real-time usage, not cached value from registration — must query SUM from DB
  • Sharing: Shared documents do NOT copy to recipient's quota — verify with a SELECT SUM(file_size) WHERE owner_id = recipient_id before and after share
  • Migration: Document count in PostgreSQL equals document count in old JSON store before cutover flag is flipped
  • Presigned URLs: Never in list responses — verified by API contract test asserting no url field in list endpoint response body
  • Admin isolation: Admin token cannot retrieve document content — verified by dedicated negative-access test

Recovery Strategies

Pitfall Recovery Cost Recovery Steps
JWT in localStorage discovered post-launch HIGH Rotate all tokens (force logout all users), redeploy with httpOnly cookies, communicate to users
Quota race condition caused 10% of users to exceed quota MEDIUM Run UPDATE quotas SET used_bytes = (SELECT SUM(file_size) FROM documents WHERE owner_id = user_id) reconciliation; enforce atomic updates going forward
Flat-file migration data loss (documents missing post-cutover) HIGH Restore from pre-migration backup of JSON data directory; re-run migration with dual-write; reconcile counts
Encryption key leaked (single-key model) CRITICAL Immediately rotate key; re-encrypt all credentials with new key; notify affected users; assume all cloud credentials compromised — prompt all users to revoke and reconnect
Admin escalation (admin accessed user documents) HIGH Audit log review to determine scope; GDPR breach notification if EU users affected (72h window); patch authorization layer; force session invalidation
Cloud connection revoked but app retried indefinitely LOW Mark all stuck connections as REQUIRES_REAUTH; send user notification; add invalid_grant handler to adapter

Pitfall-to-Phase Mapping

Pitfall Prevention Phase Verification
JWT in localStorage Phase 1 — Auth Confirm no localStorage.setItem('token', ...) in frontend; auth cookie is httpOnly in browser DevTools
TOTP bypass via password reset Phase 1 — Auth Integration test: reset password → assert no full session created → assert login still requires TOTP
Refresh token reuse detection Phase 1 — Auth Test: use a rotated (old) refresh token → assert 401 and full family revocation
TOTP timing attack / brute force Phase 1 — Auth Rate limit test: 6th TOTP attempt within window → assert 429
Path traversal in MinIO keys Phase 2 — DB + MinIO migration Unit test: filename with ../ → assert stored key is UUID-based, not user-provided
Quota race condition Phase 2 — DB + MinIO migration Concurrent upload test: 2 threads uploading to full quota → assert only one succeeds
Admin privilege escalation Phase 1 + every phase adding resource endpoints Negative access test: admin JWT → GET /documents/{other_user_doc_id} → assert 403
Cloud credential leakage via admin query Phase 3 — Cloud storage Test: admin list user cloud connections → assert response contains no token or encrypted_ fields
Flat-file to PostgreSQL data loss Phase 2 — DB + MinIO migration Count reconciliation: assert json_doc_count == db_doc_count in migration script
Blocking I/O in async handlers Phase 2 — DB + MinIO migration Load test with 10 concurrent requests; assert no event loop warnings; check SQLAlchemy driver is asyncpg
N+1 queries on document list Phase 2 — DB + MinIO migration Enable SQLAlchemy echo; assert document list query count = 1 regardless of result size
Presigned URL expiry Phase 2 — DB + MinIO migration API contract test: list endpoint response contains no presigned URL fields
OAuth revocation not handled Phase 3 — Cloud storage Mock invalid_grant from OAuth provider → assert connection status set to REQUIRES_REAUTH, not retried
Rate limits without backoff Phase 3 — Cloud storage Mock 429 from provider → assert exponential backoff, not immediate 500 to user
GDPR right to erasure incomplete Phase 3 — Cloud storage + account mgmt Account deletion test: assert cloud adapter delete_user_files called for each active connection
Single encryption key Phase 3 — Cloud storage Code review gate: assert per-user HKDF derivation; no single-key decryption in any code path
Quota drift from sharing/revoke Sharing phase Reconciliation query: SUM(file_size WHERE owner_id = user) vs used_bytes — assert < 1% drift

Sources

  • Project context: /Users/nik/Documents/Progamming/document_scanner/.planning/PROJECT.md
  • Codebase audit: /Users/nik/Documents/Progamming/document_scanner/.planning/codebase/CONCERNS.md
  • Auth pitfalls: OWASP JWT Security Cheat Sheet, OWASP Authentication Cheat Sheet (well-established, HIGH confidence)
  • OAuth token lifecycle: RFC 6749 Section 10.4 (refresh token revocation), Google OAuth error codes documentation (HIGH confidence from training data)
  • MinIO/S3 path traversal: AWS S3 object key documentation; known class of vulnerability in multi-tenant S3-prefix isolation (HIGH confidence)
  • Quota race conditions: Classic check-then-act concurrency pattern; PostgreSQL SELECT FOR UPDATE documentation (HIGH confidence)
  • GDPR Article 17 (right to erasure) and Article 30 (records of processing): well-established regulatory requirements (HIGH confidence)
  • N+1 query pattern: SQLAlchemy documentation on relationship loading strategies (HIGH confidence)
  • TOTP RFC 6238; PyOTP library behavior from training data (MEDIUM — verify PyOTP valid_window defaults against current docs before implementation)

Pitfalls research for: DocuVault — multi-user SaaS document management (FastAPI + Vue 3 + PostgreSQL + MinIO) Researched: 2026-05-21