Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Architecture
Analysis Date: 2026-06-02
System Overview
┌──────────────────────────────────────────────────────────────────────────┐
│  Browser (Vue 3 SPA)                                                     │
│  Pinia stores: auth · documents · folders · topics · cloudConnections    │
│  Router: / /folders/:id /document/:id /cloud /admin /shared              │
└─────────────────────┬──────────────────────────────────┬─────────────────┘
                      │ fetch() + Bearer JWT             │ PUT (presigned)
                      ▼                                  ▼
┌──────────────────────────────────┐    ┌───────────────────────────────┐
│  FastAPI Backend :8000           │    │  MinIO :9000                  │
│  api/auth      api/documents     │    │  Bucket: docuvault            │
│  api/folders   api/shares        │    │  Keys: {uid}/{did}/{uuid}{e}  │
│  api/cloud     api/admin         │    └───────────────────────────────┘
│  api/audit     api/topics        │
│                                  │    ┌───────────────────────────────┐
│  Middleware stack (per request): │    │  Cloud Backends               │
│    OriginValidation (first)      │    │  Google Drive / OneDrive      │
│    CORS                          │    │  Nextcloud / WebDAV           │
│    SecurityHeaders (CSP, etc.)   │    └───────────────────────────────┘
│    SlowAPI rate limiter          │
│                                  │    ┌───────────────────────────────┐
│  Deps layer:                     │    │  Celery Worker                │
│    get_db (AsyncSession)         │◄──►│  tasks/document_tasks.py      │
│    get_current_user (JWT)        │    │  tasks/email_tasks.py         │
│    get_current_admin             │    │  tasks/audit_tasks.py         │
│    get_regular_user              │    └───────────────────────────────┘
└────────────┬─────────────────────┘
             │ SQLAlchemy async         ┌───────────────────────────────┐
             ▼                          │  Redis :6379                  │
┌────────────────────────────┐          │  Rate limiting (slowapi)      │
│  PostgreSQL :5432          │          │  TOTP replay cache            │
│  11 tables:                │◄────────►│  Celery broker + results      │
│    users · quotas          │          │  OAuth state tokens (TTL)     │
│    refresh_tokens          │          └───────────────────────────────┘
│    backup_codes · folders  │
│    documents · topics      │          ┌───────────────────────────────┐
│    document_topics         │          │  AI Providers (pluggable)     │
│    shares · audit_log      │          │  Ollama · OpenAI · Anthropic  │
│    cloud_connections       │          │  LMStudio                     │
│    groups (v2 stub)        │          │  ai/base.py → AIProvider ABC  │
└────────────────────────────┘          └───────────────────────────────┘
Component Responsibilities
| Component | Responsibility | Key File |
|---|---|---|
| FastAPI app | ASGI entry point, middleware, router registration | backend/main.py |
| Auth API | Register, login (TOTP/backup), refresh, logout, password reset | backend/api/auth.py |
| Documents API | Upload URL, confirm, list, delete, classify, stream content | backend/api/documents.py |
| Folders API | CRUD folders, move documents between folders | backend/api/folders.py |
| Shares API | Grant/revoke/list document shares between users | backend/api/shares.py |
| Cloud API | OAuth flows, WebDAV connect, folder listing, default storage | backend/api/cloud.py |
| Admin API | User CRUD, quota, AI config, audit log, delete user | backend/api/admin.py |
| Audit API | Paginated audit log viewer + CSV export | backend/api/audit.py |
| Topics API | CRUD topics, topic suggestions | backend/api/topics.py |
| Auth service | Password hashing, JWT, refresh token family, TOTP, HIBP | backend/services/auth.py |
| Audit service | write_audit_log(), flushed within caller's transaction | backend/services/audit.py |
| Classifier service | Selects AI provider, assigns topics, auto-creates suggestions | backend/services/classifier.py |
| Extractor service | PDF/DOCX/image/text extraction | backend/services/extractor.py |
| Storage service | ORM queries for documents + topic resolution | backend/services/storage.py |
| StorageBackend ABC | Interface for all object storage backends | backend/storage/base.py |
| Storage factory | Returns MinIOBackend or cloud backend from document record | backend/storage/__init__.py |
| MinIO backend | Presigned URL, put/get/delete, stat | backend/storage/minio_backend.py |
| Cloud backends | Google Drive, OneDrive, Nextcloud, WebDAV implementations | backend/storage/*_backend.py |
| AIProvider ABC | Interface: classify, suggest_topics, health_check | backend/ai/base.py |
| AI factory | Returns provider instance from string slug | backend/ai/__init__.py |
| Celery app | Task routing, beat schedule, JSON serialization | backend/celery_app.py |
| Document task | extract_and_classify — async bridge from sync Celery worker | backend/tasks/document_tasks.py |
| ORM models | 11-table schema, all UUID PKs, full index set | backend/db/models.py |
| DB session | Async engine, session factory (expire_on_commit=False) | backend/db/session.py |
| FastAPI deps | get_db, get_current_user, get_current_admin, get_regular_user | backend/deps/ |
| Auth store | accessToken (memory only), user, quota, refresh deduplication | frontend/src/stores/auth.js |
| Documents store | CRUD, 3-step MinIO upload with progress, search debounce | frontend/src/stores/documents.js |
| Folders store | CRUD folders, breadcrumb, rootFolders for sidebar | frontend/src/stores/folders.js |
| Topics store | CRUD topics | frontend/src/stores/topics.js |
| CloudConnections store | List/disconnect cloud connections | frontend/src/stores/cloudConnections.js |
| API client | fetch wrapper, Bearer injection, 401→refresh→retry | frontend/src/api/client.js |
| Vue Router | SPA routes, beforeEach guard (silent refresh on reload) | frontend/src/router/index.js |
| FileManagerView | Unified file manager for local folders and documents | frontend/src/views/FileManagerView.vue |
| StorageBrowser | Reusable file listing component (local + cloud modes) | frontend/src/components/storage/StorageBrowser.vue |
Pattern Overview
Overall: Layered REST API + SPA with async background processing
Key Characteristics:
- API layer is thin: validation via Pydantic, business logic in services/
- No ORM relationships loaded; explicit queries only (prevents N+1)
- Async everywhere in FastAPI; Celery workers bridge to async via asyncio.run()
- Frontend Pinia stores own data-fetching; views delegate to stores; components emit events upward
- One DB session per request (yielded by the get_db dep), one per Celery task invocation
- All resource ownership checked inline in handlers (resource.user_id == current_user.id)
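The sync-to-async bridge mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's task code: extract_and_classify_async and its return value are hypothetical, and the Celery decorator is left out so the example runs with the standard library alone.

```python
import asyncio


async def extract_and_classify_async(document_id: str) -> dict:
    """Hypothetical async pipeline: extraction followed by classification."""
    await asyncio.sleep(0)  # stands in for awaited DB, storage, and AI calls
    return {"document_id": document_id, "status": "classified"}


def extract_and_classify(document_id: str) -> dict:
    """Sync entry point, shaped like the body of a Celery task.

    asyncio.run() creates a fresh event loop, runs the coroutine to
    completion, and closes the loop again, so the worker never shares
    a loop with the FastAPI process.
    """
    return asyncio.run(extract_and_classify_async(document_id))


result = extract_and_classify("doc-123")
print(result)
```

Because each invocation gets its own loop, async resources created inside one task (for example a DB session) should not be reused by the next one.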
Layers
API Layer:
- Purpose: HTTP routing, request validation, response serialization
- Location: backend/api/
- Contains: APIRouter instances, Pydantic request/response models, FastAPI dep injection
- Depends on: services/, deps/, db/models.py
- Used by: Frontend via HTTP; not called from other backend modules
Service Layer:
- Purpose: Business logic with no FastAPI coupling (pure Python async functions)
- Location: backend/services/
- Contains: auth.py, audit.py, classifier.py, extractor.py, storage.py, cloud_cache.py, email.py
- Depends on: db/models.py, storage/, ai/, config
- Used by: api/ layer and Celery tasks
Storage Abstraction Layer:
- Purpose: Backend-agnostic object storage interface
- Location: backend/storage/
- Contains: base.py (ABC), minio_backend.py, google_drive_backend.py, onedrive_backend.py, nextcloud_backend.py, webdav_backend.py, cloud_utils.py (HKDF encryption), exceptions.py
- Depends on: config, db/models.py (for cloud credential lookup)
- Used by: services/storage.py, api/documents.py, Celery tasks
AI Abstraction Layer:
- Purpose: Pluggable AI provider interface for document classification
- Location: backend/ai/
- Contains: base.py (ABC), ollama_provider.py, openai_provider.py, anthropic_provider.py, lmstudio_provider.py, utils.py
- Depends on: External AI APIs via httpx
- Used by: services/classifier.py
Dependency Layer:
- Purpose: FastAPI reusable dependencies (DI)
- Location: backend/deps/
- Contains: db.py (get_db), auth.py (get_current_user, get_current_admin, get_regular_user), utils.py (get_client_ip)
- Used by: All api/ handlers
Frontend Store Layer:
- Purpose: Application state + async API calls
- Location: frontend/src/stores/
- Contains: auth.js, documents.js, folders.js, topics.js, cloudConnections.js
- Depends on: api/client.js
- Used by: Views and components
Data Flow
Document Upload (MinIO presigned URL path)
- User drops file in DropZone → StorageBrowser emits upload → FileManagerView.onFilesSelected (frontend/src/views/FileManagerView.vue)
- documentsStore.upload(file, autoClassify, folderId) (frontend/src/stores/documents.js)
- POST /api/documents/upload-url → creates pending Document row, returns presigned PUT URL + document_id (backend/api/documents.py)
- XHR PUT bytes directly from browser to MinIO presigned URL (no backend proxy, no auth header needed; the URL is self-authenticating)
- POST /api/documents/{id}/confirm → stat_object() for authoritative size → atomic quota UPDATE … RETURNING → status set to 'ready' (backend/api/documents.py)
- If folderId != null: PATCH /api/documents/{id}/folder → places document in folder
- Celery task extract_and_classify.delay(document_id) enqueued → text extraction → AI classification → topic assignment (backend/tasks/document_tasks.py)
- authStore.fetchQuota() called on frontend to refresh sidebar quota bar
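The atomic quota step in the confirm call can be illustrated with a small sketch. It makes several assumptions: it uses the standard-library sqlite3 module rather than PostgreSQL, it checks cursor.rowcount instead of reading a RETURNING row, and it adds a user_id filter and a reserve_quota helper that are hypothetical names, not taken from the codebase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotas (user_id TEXT PRIMARY KEY, used_bytes INTEGER, limit_bytes INTEGER)"
)
conn.execute("INSERT INTO quotas VALUES ('u1', 900, 1000)")
conn.commit()


def reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Add delta to used_bytes in one statement; False means over quota.

    The limit check lives in the WHERE clause, so there is no
    read-then-write window between checking and updating.
    """
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND used_bytes + ? <= limit_bytes",
        (delta, user_id, delta),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows updated means the limit was hit


fits = reserve_quota(conn, "u1", 100)      # 900 + 100 <= 1000
rejected = reserve_quota(conn, "u1", 1)    # 1000 + 1 > 1000
used = conn.execute("SELECT used_bytes FROM quotas WHERE user_id = 'u1'").fetchone()[0]
print(fits, rejected, used)
```

With PostgreSQL, the same statement with RETURNING used_bytes would report the new total directly, and an empty result would play the role of rowcount == 0 here.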
Authentication Flow
- POST /api/auth/login with {email, password}: per-account Redis rate limit checked first (backend/api/auth.py)
- Password verified with Argon2 (constant-time via pwdlib)
- If TOTP enabled and no code provided → returns {requires_totp: true} challenge
- If TOTP code provided → verified against pyotp + Redis replay prevention window
- On success: create_access_token() (HS256 JWT, 15-min TTL) + create_refresh_token() (SHA-256 hashed, stored in DB) (backend/services/auth.py)
- Access token returned in JSON body; refresh token set as httpOnly; Secure; SameSite=Strict cookie scoped to /api/auth/refresh path only
- Frontend stores access token in authStore.accessToken (Pinia ref(), memory only, never localStorage)
- On page reload: router beforeEach guard calls authStore.refresh() → POST /api/auth/refresh sends httpOnly cookie → new access token returned
- api/client.js intercepts any 401 → calls authStore.refresh() → retries request once (frontend/src/api/client.js)
Refresh Token Rotation + Family Revocation
- POST /api/auth/refresh reads httpOnly cookie, looks up RefreshToken row by SHA-256 hash
- If token already revoked → all user's refresh tokens revoked → 401 + security alert email enqueued via Celery
- If valid: old token marked revoked=True, new raw token generated and stored (hashed), rotated cookie set
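The rotation and family-revocation rules above might look like this in miniature. The RefreshTokenStore class is a hypothetical in-memory stand-in for the refresh_tokens table; cookie handling and the security alert email are omitted.

```python
import hashlib
import secrets


class RefreshTokenStore:
    """In-memory stand-in for the refresh_tokens table (hypothetical)."""

    def __init__(self) -> None:
        self.rows = {}  # token_hash -> {"user_id": str, "revoked": bool}

    @staticmethod
    def _hash(raw: str) -> str:
        # Only the SHA-256 hash is stored, never the raw token.
        return hashlib.sha256(raw.encode()).hexdigest()

    def issue(self, user_id: str) -> str:
        raw = secrets.token_urlsafe(32)
        self.rows[self._hash(raw)] = {"user_id": user_id, "revoked": False}
        return raw

    def rotate(self, raw: str):
        """Return a new raw token, or None when the caller should answer 401."""
        row = self.rows.get(self._hash(raw))
        if row is None:
            return None
        if row["revoked"]:
            # A rotated token was presented again: revoke all of the user's tokens.
            for other in self.rows.values():
                if other["user_id"] == row["user_id"]:
                    other["revoked"] = True
            return None
        row["revoked"] = True
        return self.issue(row["user_id"])


store = RefreshTokenStore()
first = store.issue("u1")
second = store.rotate(first)             # normal rotation
replayed = store.rotate(first)           # old token replayed
after_revocation = store.rotate(second)  # newest token is now revoked too
print(second is not None, replayed, after_revocation)
```

The replay of the first token invalidates the second one as well, which is what forces both the legitimate user and a possible attacker back to a full login.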
Cloud Storage OAuth Flow
- GET /api/cloud/oauth/initiate/{provider} → state token stored in Redis (TTL 1800s, single-use) → authorization URL returned
- Browser navigates to OAuth provider → callback to GET /api/cloud/oauth/callback/{provider}
- State token validated (single-use consumed from Redis), authorization code exchanged for credentials
- Credentials encrypted with HKDF-derived per-user Fernet key → stored in cloud_connections.credentials_enc
- On document operations: get_storage_backend_for_document() decrypts credentials and instantiates the cloud backend, transparent to API handlers (backend/storage/__init__.py)
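The single-use, time-limited state token from the first and third steps can be sketched as below. The StateTokenStore class is hypothetical and keeps state in a dict instead of Redis; the injectable clock exists only so that expiry can be demonstrated without waiting.

```python
import secrets
import time


class StateTokenStore:
    """In-memory stand-in for the Redis keys holding OAuth state (hypothetical)."""

    def __init__(self, ttl_seconds: float = 1800, clock=time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # state -> (user_id, expires_at)

    def create(self, user_id: str) -> str:
        state = secrets.token_urlsafe(32)
        self._entries[state] = (user_id, self.clock() + self.ttl)
        return state

    def consume(self, state: str):
        """Remove the state so it cannot be used twice; None if missing or expired."""
        entry = self._entries.pop(state, None)
        if entry is None:
            return None
        user_id, expires_at = entry
        return user_id if self.clock() < expires_at else None


now = [0.0]  # fake clock, advanced manually below
store = StateTokenStore(clock=lambda: now[0])

state = store.create("u1")
first_use = store.consume(state)
second_use = store.consume(state)   # already consumed

stale = store.create("u1")
now[0] = 1801.0                     # just past the 1800 s TTL
expired_use = store.consume(stale)
print(first_use, second_use, expired_use)
```

In Redis, an atomic get-and-delete on a key created with an expiry would give the same two guarantees in one round trip.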
State Management (frontend):
- Access token: authStore.accessToken, a Pinia ref(null), JS memory only, cleared on logout/error
- User profile: authStore.user, a Pinia ref(null)
- Quota: authStore.quota, fetched after upload/delete, displayed in QuotaBar
- Documents: documentsStore.documents, a local array kept in sync via explicit fetchDocuments() calls
- Folder tree: foldersStore.rootFolders (sidebar) + foldersStore.folders (current level)
- Upload progress: documentsStore.uploadProgress, keyed ${filename}__${Date.now()} to prevent key collision
Key Abstractions
StorageBackend ABC (backend/storage/base.py):
- Purpose: Uniform interface over MinIO and all cloud providers
- Methods: put_object, get_object, delete_object, presigned_get_url, health_check, generate_presigned_put_url, stat_object
- Implementations: MinIOBackend, GoogleDriveBackend, OneDriveBackend, NextcloudBackend, WebDAVBackend
- Selected by: get_storage_backend_for_document() in backend/storage/__init__.py
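A reduced version of such an interface could look like the sketch below. Only four of the listed method names are shown, and their argument lists and return types are assumptions; InMemoryBackend is a hypothetical implementation used purely to exercise the interface.

```python
import asyncio
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Subset of the interface; signatures are assumed, not taken from base.py."""

    @abstractmethod
    async def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    async def get_object(self, key: str) -> bytes: ...

    @abstractmethod
    async def delete_object(self, key: str) -> None: ...

    @abstractmethod
    async def stat_object(self, key: str) -> int: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical backend that keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects = {}

    async def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    async def get_object(self, key: str) -> bytes:
        return self._objects[key]

    async def delete_object(self, key: str) -> None:
        self._objects.pop(key, None)

    async def stat_object(self, key: str) -> int:
        return len(self._objects[key])  # size in bytes


async def demo(backend: StorageBackend):
    # Callers depend only on the abstract type, not on a concrete backend.
    await backend.put_object("u1/d1/blob.pdf", b"%PDF-1.7")
    size = await backend.stat_object("u1/d1/blob.pdf")
    body = await backend.get_object("u1/d1/blob.pdf")
    await backend.delete_object("u1/d1/blob.pdf")
    return size, body


size, body = asyncio.run(demo(InMemoryBackend()))
print(size, body)
```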
AIProvider ABC (backend/ai/base.py):
- Purpose: Pluggable classification backend
- Methods: classify, suggest_topics, health_check
- Returns: ClassificationResult(topics, suggested_new_topics, reasoning)
- Implementations: OllamaProvider, OpenAIProvider, AnthropicProvider, LMStudioProvider
- Selected by: ai/__init__.py factory, keyed to per-user ai_provider / ai_model from DB
Dependency Chain:
- get_current_user → parses Bearer JWT → loads User from DB, checks is_active
- get_current_admin → wraps get_current_user + role == 'admin' check (raises 403)
- get_regular_user → wraps get_current_user + rejects role == 'admin' (admins get 403 on document endpoints)
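The way the three dependencies build on each other can be shown without FastAPI. In this sketch a token lookup table replaces JWT parsing and the DB query, HTTPError stands in for FastAPI's HTTPException, and the 401 for inactive users is an assumption; none of these details come from backend/deps/auth.py.

```python
from dataclasses import dataclass


class HTTPError(Exception):
    """Stand-in for an HTTP exception carrying a status code."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


@dataclass
class User:
    id: str
    role: str
    is_active: bool = True


# Hypothetical token table replacing JWT verification plus the users table.
USERS = {
    "token-admin": User("a1", "admin"),
    "token-user": User("u1", "user"),
    "token-disabled": User("u2", "user", is_active=False),
}


def get_current_user(authorization):
    scheme, _, token = (authorization or "").partition(" ")
    user = USERS.get(token) if scheme == "Bearer" else None
    if user is None or not user.is_active:
        raise HTTPError(401, "Not authenticated")
    return user


def get_current_admin(authorization):
    user = get_current_user(authorization)
    if user.role != "admin":
        raise HTTPError(403, "Admin only")
    return user


def get_regular_user(authorization):
    user = get_current_user(authorization)
    if user.role == "admin":
        raise HTTPError(403, "Admins cannot access documents")
    return user


def status_of(dep, header) -> int:
    """Run a dependency and report the status a handler would return."""
    try:
        dep(header)
        return 200
    except HTTPError as exc:
        return exc.status_code


print(status_of(get_regular_user, "Bearer token-user"),
      status_of(get_regular_user, "Bearer token-admin"),
      status_of(get_current_admin, "Bearer token-user"))
```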
Entry Points
Backend:
- Location: backend/main.py
- Triggers: uvicorn main:app
- Responsibilities: FastAPI app factory, lifespan (MinIO bucket init, Redis connection, admin bootstrap), middleware registration in correct order, router inclusion
Celery Worker:
- Location: backend/celery_app.py (factory) + backend/tasks/
- Triggers: celery -A celery_app worker -Q documents
- Responsibilities: Async document text extraction + classification, email delivery, scheduled nightly audit CSV export
Frontend:
- Location: frontend/src/main.js
- Triggers: Vite dev server (npm run dev) or built static files served by frontend container
- Responsibilities: Mount Vue app with Pinia and Router
Architectural Constraints
- Threading: FastAPI runs on a single-threaded asyncio event loop (uvicorn). Blocking MinIO SDK calls use asyncio.to_thread(). Celery workers are separate sync processes that bridge to async via asyncio.run(); they never share an event loop with FastAPI.
- Global state: backend/services/storage.py holds a module-level _storage singleton for the default MinIO backend. backend/main.py stores the MinIO client on app.state.minio and the Redis client on app.state.redis.
- Circular imports: Celery task modules must never import from main.py or router modules. backend/celery_app.py intentionally avoids importing config and reads REDIS_URL directly from os.environ to avoid pydantic-settings side effects.
- Admin isolation: Admin accounts cannot access document content, enforced by the get_regular_user dep on all document/folder/share endpoints. No impersonation code path exists (backend/deps/auth.py).
- Quota atomicity: Quota enforcement uses a single atomic UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes, with no read-then-write in Python.
- Object key privacy: MinIO keys are {user_id}/{document_id}/{uuid4()}{ext}; original filenames are stored only in the DB filename column, never in the storage key.
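The object key rule from the last constraint can be expressed in a few lines. build_object_key is a hypothetical helper name, and lowercasing the extension is an assumption that the text above does not state.

```python
import os
import uuid


def build_object_key(user_id: str, document_id: str, filename: str) -> str:
    """Build {user_id}/{document_id}/{uuid4}{ext} without leaking the filename."""
    ext = os.path.splitext(filename)[1].lower()  # keep only the extension
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"


key = build_object_key("u1", "d1", "Tax Return 2025.PDF")
print(key)
```

Only the extension survives from the original name, so a listing of the bucket reveals nothing about what the user called the file.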
Anti-Patterns
Accessing document content via unauthenticated iframe src
What happens: Setting <iframe src="/api/documents/{id}/content"> directly makes the browser request the content without a Bearer token, so the request arrives unauthenticated and is rejected.
Why it's wrong: The document content endpoint requires Authorization: Bearer header; browser src= attributes do not send custom headers.
Do this instead: Use fetchDocumentContent(docId) in frontend/src/api/client.js — it injects Bearer + handles 401-refresh-retry, then builds an object URL from the Blob response.
Committing inside write_audit_log
What happens: Calling session.commit() inside write_audit_log creates a separate transaction for the audit entry.
Why it's wrong: The audit entry would commit even if the primary operation subsequently fails, creating phantom audit records.
Do this instead: write_audit_log calls session.flush() only. The caller owns session.commit() — backend/services/audit.py.
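The effect of leaving the commit to the caller can be demonstrated with a small sketch. It uses the standard-library sqlite3 module as a stand-in for the SQLAlchemy session, so an uncommitted INSERT plays the role of session.flush(); the table layout and the delete_document helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE audit_log (action TEXT, target TEXT)")
conn.commit()


def write_audit_log(conn: sqlite3.Connection, action: str, target: str) -> None:
    # No commit here: the row joins the caller's open transaction.
    conn.execute("INSERT INTO audit_log VALUES (?, ?)", (action, target))


def delete_document(conn: sqlite3.Connection, doc_id: str, fail: bool = False) -> None:
    try:
        write_audit_log(conn, "document.delete", doc_id)
        if fail:
            raise RuntimeError("primary operation failed")
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
        conn.commit()    # the caller owns the commit
    except RuntimeError:
        conn.rollback()  # the audit row is rolled back with the failed operation


delete_document(conn, "d1", fail=True)
rows_after_failure = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]

delete_document(conn, "d1")
rows_after_success = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(rows_after_failure, rows_after_success)
```

Had write_audit_log committed on its own, the first call would have left one audit row behind for a deletion that never happened.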
CloudConnection query without user scope
What happens: Querying CloudConnection without filtering user_id == current_user.id would allow one user's cloud credentials to service another user's request.
Why it's wrong: IDOR — cross-user credential access.
Do this instead: Always filter CloudConnection.user_id == user.id as enforced in get_storage_backend_for_document() in backend/storage/__init__.py.
Error Handling
Strategy: Services raise ValueError; API handlers catch and re-raise as HTTPException. No service module imports FastAPI.
Patterns:
- Auth service raises ValueError → API layer maps to 401/422/400
- Storage errors (S3Error, cloud provider errors) wrapped in backend/storage/exceptions.py → 503 or 404
- write_audit_log never raises: it silently logs and swallows errors to protect primary operations
- CloudConnectionError (backend/storage/exceptions.py) used for cloud-specific failures
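The split between services that raise ValueError and handlers that translate it can be sketched as follows. Everything here is illustrative: HTTPError stands in for FastAPI's HTTPException, and verify_login, the error messages, and the message-to-status table are hypothetical rather than the project's actual mapping.

```python
class HTTPError(Exception):
    """Stand-in for an HTTP exception so the sketch has no dependencies."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def verify_login(email: str, password: str) -> dict:
    """Service function: raises ValueError and knows nothing about HTTP."""
    if "@" not in email:
        raise ValueError("invalid email")
    if password != "correct horse":
        raise ValueError("invalid credentials")
    return {"email": email}


# Hypothetical mapping owned by the API layer; unknown messages become 400.
STATUS_BY_MESSAGE = {"invalid email": 422, "invalid credentials": 401}


def login_handler(email: str, password: str) -> dict:
    try:
        return verify_login(email, password)
    except ValueError as exc:
        status = STATUS_BY_MESSAGE.get(str(exc), 400)
        raise HTTPError(status, str(exc)) from exc


def status_for(email: str, password: str) -> int:
    try:
        login_handler(email, password)
        return 200
    except HTTPError as exc:
        return exc.status_code


print(status_for("a@example.com", "correct horse"),
      status_for("a@example.com", "wrong"),
      status_for("not-an-email", "wrong"))
```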
Cross-Cutting Concerns
Logging: Python logging module with logger = logging.getLogger(__name__) in each module. No structured logging framework.
Validation: Pydantic models at API boundary. Field validators on sensitive fields (filename rejects path separators, permission allowlists, non-negative quota). No model accepts **kwargs.
Authentication: Every non-public endpoint injects get_current_user, get_current_admin, or get_regular_user via FastAPI Depends. No endpoint bypasses the dependency chain.
Rate Limiting: slowapi (wraps limits-library) on all auth endpoints. Per-IP limits via @limiter.limit("10/minute"). Per-account Redis counter on login: login_attempts:{email}, 10 attempts per 15-minute window.
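The per-account counter might behave like the sketch below. It assumes a fixed window that starts at the first attempt, which the text above does not specify, and it keeps counters in a dict where the real system uses Redis keys; LoginAttemptLimiter is a hypothetical name.

```python
import time


class LoginAttemptLimiter:
    """Fixed-window counter keyed by login_attempts:{email} (in-memory stand-in)."""

    def __init__(self, max_attempts: int = 10, window_seconds: float = 900,
                 clock=time.monotonic) -> None:
        self.max_attempts = max_attempts
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # key -> (count, window_expires_at)

    def allow(self, email: str) -> bool:
        key = f"login_attempts:{email}"
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, now + self.window))
        if now >= expires_at:
            # Window elapsed: start over, as an expired Redis key would.
            count, expires_at = 0, now + self.window
        count += 1
        self._counters[key] = (count, expires_at)
        return count <= self.max_attempts


now = [0.0]  # fake clock, advanced manually below
limiter = LoginAttemptLimiter(clock=lambda: now[0])

results = [limiter.allow("a@example.com") for _ in range(11)]
other_account = limiter.allow("b@example.com")  # separate counter
now[0] = 900.0                                  # 15 minutes later
after_window = limiter.allow("a@example.com")
print(results.count(True), results[-1], other_account, after_window)
```

Keying on the account rather than the IP means a distributed guessing attack against one email is still throttled, at the cost of letting an attacker lock that account's login for the rest of the window.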
Audit Logging: write_audit_log() called inline in API handlers for all auth events, document operations, admin actions, and cloud connections. Written within the handler's transaction via session.flush().
HKDF Credential Encryption: Cloud credentials encrypted with Fernet(HKDF-SHA256(master_key, salt=user_id, purpose="cloud-creds")) before DB storage. Implementation in backend/storage/cloud_utils.py.
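The key derivation can be sketched with the standard library alone. The HKDF function below follows the extract-then-expand construction of RFC 5869; mapping purpose to the HKDF info parameter, the derive_fernet_key name, and the sample master key are assumptions, and the actual code in cloud_utils.py may differ. The Fernet encryption step itself needs the third-party cryptography package and is not shown.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                   # expand step
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fernet_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a per-user key in the format Fernet expects."""
    raw = hkdf_sha256(master_key, salt=user_id.encode(), info=b"cloud-creds")
    return base64.urlsafe_b64encode(raw)  # 32 raw bytes, url-safe base64


master = b"example-master-key-not-for-production"
key_u1 = derive_fernet_key(master, "user-1")
key_u2 = derive_fernet_key(master, "user-2")
print(len(key_u1), key_u1 != key_u2)
```

Deriving one key per user means a leaked derived key exposes only that user's stored credentials, while the master key itself never touches the database.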