# Architecture

**Analysis Date:** 2026-06-02

## System Overview

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                           Browser (Vue 3 SPA)                           │
│  Pinia stores: auth · documents · folders · topics · cloudConnections   │
│  Router: /  /folders/:id  /document/:id  /cloud  /admin  /shared        │
└─────────────────────┬──────────────────────────────────┬────────────────┘
                      │ fetch() + Bearer JWT             │ PUT (presigned)
                      ▼                                  ▼
┌──────────────────────────────────┐   ┌───────────────────────────────┐
│      FastAPI Backend :8000       │   │          MinIO :9000          │
│  api/auth     api/documents      │   │  Bucket: docuvault            │
│  api/folders  api/shares         │   │  Keys: {uid}/{did}/{uuid}{e}  │
│  api/cloud    api/admin          │   └───────────────────────────────┘
│  api/audit    api/topics         │
│                                  │   ┌───────────────────────────────┐
│  Middleware stack (per request): │   │  Cloud Backends               │
│    OriginValidation (first)      │   │  Google Drive / OneDrive      │
│    CORS                          │   │  Nextcloud / WebDAV           │
│    SecurityHeaders (CSP, etc.)   │   └───────────────────────────────┘
│    SlowAPI rate limiter          │
│                                  │   ┌───────────────────────────────┐
│  Deps layer:                     │   │  Celery Worker                │
│    get_db (AsyncSession)         │◄────► tasks/document_tasks.py     │
│    get_current_user (JWT)        │   │  tasks/email_tasks.py         │
│    get_current_admin             │   │  tasks/audit_tasks.py         │
│    get_regular_user              │   └───────────────────────────────┘
└────────────┬─────────────────────┘
             │ SQLAlchemy async        ┌───────────────────────────────┐
             ▼                         │  Redis :6379                  │
┌──────────────────────────┐           │  Rate limiting (slowapi)      │
│  PostgreSQL :5432        │           │  TOTP replay cache            │
│  11 tables:              │◄──────────►  Celery broker + results      │
│  users · quotas          │           │  OAuth state tokens (TTL)     │
│  refresh_tokens          │           └───────────────────────────────┘
│  backup_codes · folders  │
│  documents · topics      │           ┌───────────────────────────────┐
│  document_topics         │           │  AI Providers (pluggable)     │
│  shares · audit_log      │           │  Ollama · OpenAI · Anthropic  │
│  cloud_connections       │           │  LMStudio                     │
│  groups (v2 stub)        │           │  ai/base.py → AIProvider ABC  │
└──────────────────────────┘           └───────────────────────────────┘
```

## Component Responsibilities

| Component | Responsibility | Key File |
|-----------|----------------|----------|
| FastAPI app | ASGI entry point, middleware, router registration | `backend/main.py` |
| Auth API | Register, login (TOTP/backup), refresh, logout, password reset | `backend/api/auth.py` |
| Documents API | Upload URL, confirm, list, delete, classify, stream content | `backend/api/documents.py` |
| Folders API | CRUD folders, move documents between folders | `backend/api/folders.py` |
| Shares API | Grant/revoke/list document shares between users | `backend/api/shares.py` |
| Cloud API | OAuth flows, WebDAV connect, folder listing, default storage | `backend/api/cloud.py` |
| Admin API | User CRUD, quota, AI config, audit log, delete user | `backend/api/admin.py` |
| Audit API | Paginated audit log viewer + CSV export | `backend/api/audit.py` |
| Topics API | CRUD topics, topic suggestions | `backend/api/topics.py` |
| Auth service | Password hashing, JWT, refresh token family, TOTP, HIBP | `backend/services/auth.py` |
| Audit service | `write_audit_log()` — flushed within caller's transaction | `backend/services/audit.py` |
| Classifier service | Selects AI provider, assigns topics, auto-creates suggestions | `backend/services/classifier.py` |
| Extractor service | PDF/DOCX/image/text extraction | `backend/services/extractor.py` |
| Storage service | ORM queries for documents + topic resolution | `backend/services/storage.py` |
| StorageBackend ABC | Interface for all object storage backends | `backend/storage/base.py` |
| Storage factory | Returns MinIOBackend or cloud backend from document record | `backend/storage/__init__.py` |
| MinIO backend | Presigned URL, put/get/delete, stat | `backend/storage/minio_backend.py` |
| Cloud backends | Google Drive, OneDrive, Nextcloud, WebDAV implementations | `backend/storage/*_backend.py` |
| AIProvider ABC | Interface: classify, suggest_topics, health_check | `backend/ai/base.py` |
| AI factory | Returns provider instance from string slug | `backend/ai/__init__.py` |
| Celery app | Task routing, beat schedule, JSON serialization | `backend/celery_app.py` |
| Document task | extract_and_classify — async bridge from sync Celery worker | `backend/tasks/document_tasks.py` |
| ORM models | 11-table schema, all UUID PKs, full index set | `backend/db/models.py` |
| DB session | Async engine, session factory (expire_on_commit=False) | `backend/db/session.py` |
| FastAPI deps | get_db, get_current_user, get_current_admin, get_regular_user | `backend/deps/` |
| Auth store | accessToken (memory only), user, quota, refresh deduplication | `frontend/src/stores/auth.js` |
| Documents store | CRUD, 3-step MinIO upload with progress, search debounce | `frontend/src/stores/documents.js` |
| Folders store | CRUD folders, breadcrumb, rootFolders for sidebar | `frontend/src/stores/folders.js` |
| Topics store | CRUD topics | `frontend/src/stores/topics.js` |
| CloudConnections store | List/disconnect cloud connections | `frontend/src/stores/cloudConnections.js` |
| API client | fetch wrapper, Bearer injection, 401→refresh→retry | `frontend/src/api/client.js` |
| Vue Router | SPA routes, beforeEach guard (silent refresh on reload) | `frontend/src/router/index.js` |
| FileManagerView | Unified file manager for local folders and documents | `frontend/src/views/FileManagerView.vue` |
| StorageBrowser | Reusable file listing component (local + cloud modes) | `frontend/src/components/storage/StorageBrowser.vue` |

## Pattern Overview

**Overall:** Layered REST API + SPA with async background processing

**Key Characteristics:**
- API layer is thin — validation via Pydantic, business logic in `services/`
- No ORM relationships loaded — explicit queries only (prevents N+1)
- Async everywhere in FastAPI; Celery workers bridge to async via `asyncio.run()`
- Frontend Pinia stores own data-fetching; views delegate to stores; components emit events upward
- One DB session per request (yielded by
`get_db` dep), one per Celery task invocation
- All resource ownership checked inline in handlers (`resource.user_id == current_user.id`)

## Layers

**API Layer:**
- Purpose: HTTP routing, request validation, response serialization
- Location: `backend/api/`
- Contains: APIRouter instances, Pydantic request/response models, FastAPI dep injection
- Depends on: `services/`, `deps/`, `db/models.py`
- Used by: Frontend via HTTP; not called from other backend modules

**Service Layer:**
- Purpose: Business logic with no FastAPI coupling (pure Python async functions)
- Location: `backend/services/`
- Contains: `auth.py`, `audit.py`, `classifier.py`, `extractor.py`, `storage.py`, `cloud_cache.py`, `email.py`
- Depends on: `db/models.py`, `storage/`, `ai/`, `config`
- Used by: `api/` layer and Celery tasks

**Storage Abstraction Layer:**
- Purpose: Backend-agnostic object storage interface
- Location: `backend/storage/`
- Contains: `base.py` (ABC), `minio_backend.py`, `google_drive_backend.py`, `onedrive_backend.py`, `nextcloud_backend.py`, `webdav_backend.py`, `cloud_utils.py` (HKDF encryption), `exceptions.py`
- Depends on: `config`, `db/models.py` (for cloud credential lookup)
- Used by: `services/storage.py`, `api/documents.py`, Celery tasks

**AI Abstraction Layer:**
- Purpose: Pluggable AI provider interface for document classification
- Location: `backend/ai/`
- Contains: `base.py` (ABC), `ollama_provider.py`, `openai_provider.py`, `anthropic_provider.py`, `lmstudio_provider.py`, `utils.py`
- Depends on: External AI APIs via httpx
- Used by: `services/classifier.py`

**Dependency Layer:**
- Purpose: FastAPI reusable dependencies (DI)
- Location: `backend/deps/`
- Contains: `db.py` (get_db), `auth.py` (get_current_user, get_current_admin, get_regular_user), `utils.py` (get_client_ip)
- Used by: All `api/` handlers

**Frontend Store Layer:**
- Purpose: Application state + async API calls
- Location: `frontend/src/stores/`
- Contains: `auth.js`, `documents.js`, `folders.js`,
`topics.js`, `cloudConnections.js`
- Depends on: `api/client.js`
- Used by: Views and components

## Data Flow

### Document Upload (MinIO presigned URL path)

1. User drops file in `DropZone` → `StorageBrowser` emits `upload` → `FileManagerView.onFilesSelected` (`frontend/src/views/FileManagerView.vue`)
2. `documentsStore.upload(file, autoClassify, folderId)` (`frontend/src/stores/documents.js`)
3. `POST /api/documents/upload-url` → creates pending `Document` row, returns presigned PUT URL + `document_id` (`backend/api/documents.py`)
4. XHR `PUT` bytes directly from browser to MinIO presigned URL (no backend proxy, no auth header needed — URL is self-authenticating)
5. `POST /api/documents/{id}/confirm` → `stat_object()` for authoritative size → atomic quota `UPDATE … RETURNING` → status set to `'ready'` (`backend/api/documents.py`)
6. If `folderId != null`: `PATCH /api/documents/{id}/folder` → places document in folder
7. Celery task `extract_and_classify.delay(document_id)` enqueued → text extraction → AI classification → topic assignment (`backend/tasks/document_tasks.py`)
8. `authStore.fetchQuota()` called on frontend to refresh sidebar quota bar

### Authentication Flow

1. `POST /api/auth/login` with `{email, password}` — per-account Redis rate limit checked first (`backend/api/auth.py`)
2. Password verified with Argon2 (constant-time via pwdlib)
3. If TOTP enabled and no code provided → returns `{requires_totp: true}` challenge
4. If TOTP code provided → verified against pyotp + Redis replay prevention window
5. On success: `create_access_token()` (HS256 JWT, 15-min TTL) + `create_refresh_token()` (SHA-256 hashed, stored in DB) (`backend/services/auth.py`)
6. Access token returned in JSON body; refresh token set as `httpOnly; Secure; SameSite=Strict` cookie scoped to `/api/auth/refresh` path only
7. Frontend stores access token in `authStore.accessToken` (Pinia `ref()` — memory only, never localStorage)
8. On page reload: router `beforeEach` guard calls `authStore.refresh()` → `POST /api/auth/refresh` sends httpOnly cookie → new access token returned
9. `api/client.js` intercepts any 401 → calls `authStore.refresh()` → retries request once (`frontend/src/api/client.js`)

### Refresh Token Rotation + Family Revocation

1. `POST /api/auth/refresh` reads httpOnly cookie, looks up `RefreshToken` row by SHA-256 hash
2. If token already revoked → all user's refresh tokens revoked → 401 + security alert email enqueued via Celery
3. If valid: old token marked `revoked=True`, new raw token generated and stored (hashed), rotated cookie set

### Cloud Storage OAuth Flow

1. `GET /api/cloud/oauth/initiate/{provider}` → state token stored in Redis (TTL 1800s, single-use) → authorization URL returned
2. Browser navigates to OAuth provider → callback to `GET /api/cloud/oauth/callback/{provider}`
3. State token validated (single-use consumed from Redis), authorization code exchanged for credentials
4. Credentials encrypted with HKDF-derived per-user Fernet key → stored in `cloud_connections.credentials_enc`
5. On document operations: `get_storage_backend_for_document()` decrypts credentials, instantiates cloud backend — transparent to API handlers (`backend/storage/__init__.py`)

**State Management (frontend):**
- Access token: `authStore.accessToken` — Pinia `ref(null)`, JS memory only, cleared on logout/error
- User profile: `authStore.user` — Pinia `ref(null)`
- Quota: `authStore.quota` — fetched after upload/delete, displayed in `QuotaBar`
- Documents: `documentsStore.documents` — local array, kept in sync via explicit `fetchDocuments()` calls
- Folder tree: `foldersStore.rootFolders` (sidebar) + `foldersStore.folders` (current level)
- Upload progress: `documentsStore.uploadProgress` — keyed `${filename}__${Date.now()}` to prevent key collision

## Key Abstractions

**StorageBackend ABC (`backend/storage/base.py`):**
- Purpose: Uniform interface over MinIO and all cloud providers
- Methods: `put_object`, `get_object`, `delete_object`, `presigned_get_url`, `health_check`, `generate_presigned_put_url`, `stat_object`
- Implementations: `MinIOBackend`, `GoogleDriveBackend`, `OneDriveBackend`, `NextcloudBackend`, `WebDAVBackend`
- Selected by: `get_storage_backend_for_document()` in `backend/storage/__init__.py`

**AIProvider ABC (`backend/ai/base.py`):**
- Purpose: Pluggable classification backend
- Methods: `classify`, `suggest_topics`, `health_check`
- Returns: `ClassificationResult(topics, suggested_new_topics, reasoning)`
- Implementations: `OllamaProvider`, `OpenAIProvider`, `AnthropicProvider`, `LMStudioProvider`
- Selected by: `ai/__init__.py` factory, keyed to per-user `ai_provider`/`ai_model` from DB

**Dependency Chain:**
- `get_current_user` → parses Bearer JWT → loads `User` from DB, checks `is_active`
- `get_current_admin` → wraps `get_current_user` + `role == 'admin'` check (raises 403)
- `get_regular_user` → wraps `get_current_user` + rejects `role == 'admin'` (admins get 403 on document endpoints)

## Entry Points

**Backend:**
- Location: `backend/main.py`
- Triggers: `uvicorn main:app`
- Responsibilities: FastAPI app factory, lifespan (MinIO bucket init, Redis connection, admin bootstrap), middleware registration in correct order, router inclusion

**Celery Worker:**
- Location: `backend/celery_app.py` (factory) + `backend/tasks/`
- Triggers: `celery -A celery_app worker -Q documents`
- Responsibilities: Async document text extraction + classification, email delivery, scheduled nightly audit CSV export

**Frontend:**
- Location: `frontend/src/main.js`
- Triggers: Vite dev server (`npm run dev`) or built static files served by frontend container
- Responsibilities: Mount Vue app with Pinia and Router

## Architectural Constraints

- **Threading:** FastAPI runs on a single-threaded asyncio event loop (uvicorn). Blocking MinIO SDK calls use `asyncio.to_thread()`. Celery workers are separate sync processes that bridge to async via `asyncio.run()` — they never share an event loop with FastAPI.
- **Global state:** `backend/services/storage.py` holds a module-level `_storage` singleton for the default MinIO backend. `backend/main.py` stores MinIO client on `app.state.minio` and Redis client on `app.state.redis`.
- **Circular imports:** Celery task modules must never import from `main.py` or router modules. `backend/celery_app.py` intentionally avoids importing `config` — reads `REDIS_URL` directly from `os.environ` to avoid pydantic-settings side effects.
- **Admin isolation:** Admin accounts cannot access document content — enforced by `get_regular_user` dep on all document/folder/share endpoints. No impersonation code path exists (`backend/deps/auth.py`).
- **Quota atomicity:** Quota enforcement uses a single atomic `UPDATE quotas SET used_bytes = used_bytes + $delta WHERE (used_bytes + $delta) <= limit_bytes RETURNING used_bytes` — no read-then-write in Python.
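The quota-atomicity constraint can be illustrated with a small, self-contained sketch. It is not the project's code: it uses an in-memory SQLite database as a stand-in for PostgreSQL, assumes a `user_id` column on `quotas`, and the helper name `try_reserve_quota` is hypothetical. Because `RETURNING` support depends on the SQLite build, the sketch checks the affected row count instead; the idea is the same, in that the limit check and the increment happen in one statement.

```python
import sqlite3


def try_reserve_quota(conn: sqlite3.Connection, user_id: str, delta: int) -> bool:
    """Add `delta` bytes to a user's usage only if the result fits the limit.

    The comparison and the increment are a single UPDATE, so two concurrent
    uploads cannot both pass a check made against a stale value.
    """
    cur = conn.execute(
        "UPDATE quotas SET used_bytes = used_bytes + ? "
        "WHERE user_id = ? AND (used_bytes + ?) <= limit_bytes",
        (delta, user_id, delta),
    )
    conn.commit()
    # One row updated means the reservation succeeded; zero means over quota.
    return cur.rowcount == 1


# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotas ("
    "user_id TEXT PRIMARY KEY, "
    "used_bytes INTEGER NOT NULL, "
    "limit_bytes INTEGER NOT NULL)"
)
conn.execute("INSERT INTO quotas VALUES ('u1', 0, 100)")
conn.commit()

first = try_reserve_quota(conn, "u1", 60)   # 0 + 60 <= 100, accepted
second = try_reserve_quota(conn, "u1", 60)  # 60 + 60 > 100, rejected
used = conn.execute(
    "SELECT used_bytes FROM quotas WHERE user_id = 'u1'"
).fetchone()[0]
print(first, second, used)
```

In the PostgreSQL form quoted above, `RETURNING used_bytes` would presumably serve the same purpose as the row-count check here: a returned row means the upload fits, and no row means the limit would be exceeded.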
- **Object key privacy:** MinIO keys are `{user_id}/{document_id}/{uuid4()}{ext}` — original filenames stored only in the DB `filename` column, never in the storage key.

## Anti-Patterns

### Accessing document content via unauthenticated iframe src

**What happens:** Setting `