# Codebase Structure

**Analysis Date:** 2026-06-02

## Directory Layout

```
document_scanner/                       # Repo root
├── backend/                            # FastAPI Python backend
│   ├── main.py                         # App factory, middleware, router registration
│   ├── config.py                       # Pydantic Settings (all env vars)
│   ├── celery_app.py                   # Celery factory, task routing, beat schedule
│   ├── alembic.ini                     # Alembic migration config
│   ├── requirements.txt                # Pinned Python dependencies
│   ├── Dockerfile                      # Backend container image
│   ├── pytest.ini                      # pytest config
│   ├── api/                            # HTTP route handlers (thin — no business logic)
│   │   ├── auth.py                     # /api/auth/* — register, login, TOTP, refresh
│   │   ├── documents.py                # /api/documents/* — upload, confirm, list, stream
│   │   ├── folders.py                  # /api/folders/* — CRUD + document move
│   │   ├── shares.py                   # /api/shares/* — share grants and revocation
│   │   ├── cloud.py                    # /api/cloud/* + /api/users/me/default-storage
│   │   ├── admin.py                    # /api/admin/* — user management, quota, AI config
│   │   ├── audit.py                    # /api/admin/audit-log — viewer + CSV export
│   │   └── topics.py                   # /api/topics/* — CRUD topics + suggest
│   ├── services/                       # Business logic (no FastAPI coupling)
│   │   ├── auth.py                     # Argon2, JWT, refresh tokens, TOTP, HIBP
│   │   ├── audit.py                    # write_audit_log() helper
│   │   ├── classifier.py               # AI classification orchestration
│   │   ├── extractor.py                # PDF/DOCX/image/text extraction
│   │   ├── storage.py                  # ORM document queries + topic resolution
│   │   ├── cloud_cache.py              # TTL-cached cloud folder listing
│   │   └── email.py                    # Email composition helpers
│   ├── storage/                        # Pluggable object storage backends
│   │   ├── base.py                     # StorageBackend ABC
│   │   ├── __init__.py                 # Factory: get_storage_backend(), get_storage_backend_for_document()
│   │   ├── minio_backend.py            # MinIO/S3 implementation (primary)
│   │   ├── google_drive_backend.py
│   │   ├── onedrive_backend.py
│   │   ├── nextcloud_backend.py
│   │   ├── webdav_backend.py
│   │   ├── cloud_utils.py              # HKDF encryption/decryption, URL validation
│   │   └── exceptions.py               # CloudConnectionError
│   ├── ai/                             # Pluggable AI classification providers
│   │   ├── base.py                     # AIProvider ABC + ClassificationResult dataclass
│   │   ├── __init__.py                 # Factory: get_provider()
│   │   ├── ollama_provider.py
│   │   ├── openai_provider.py
│   │   ├── anthropic_provider.py
│   │   ├── lmstudio_provider.py
│   │   └── utils.py                    # Shared AI utilities
│   ├── db/                             # Database layer
│   │   ├── models.py                   # SQLAlchemy ORM — 11 tables, all UUID PKs
│   │   └── session.py                  # Async engine + AsyncSessionLocal factory
│   ├── deps/                           # FastAPI dependency injection
│   │   ├── auth.py                     # get_current_user, get_current_admin, get_regular_user
│   │   ├── db.py                       # get_db (per-request AsyncSession)
│   │   └── utils.py                    # get_client_ip
│   ├── tasks/                          # Celery async task modules
│   │   ├── document_tasks.py           # extract_and_classify, cleanup_abandoned_uploads
│   │   ├── email_tasks.py              # send_reset_email, send_security_alert_email
│   │   └── audit_tasks.py              # audit_log_daily_export (nightly Celery beat)
│   ├── migrations/                     # Alembic migration scripts
│   │   ├── versions/
│   │   │   ├── 0001_initial_schema.py
│   │   │   ├── 0002_add_backup_codes_and_password_must_change.py
│   │   │   ├── 0003_multi_user_isolation.py
│   │   │   └── 0004_phase4_pdf_open_mode_tsvector.py
│   │   └── env.py                      # Alembic async migration runner
│   ├── tests/                          # Backend test suite (pytest + httpx)
│   │   ├── conftest.py                 # Shared fixtures (async engine, client, users)
│   │   ├── test_auth_api.py
│   │   ├── test_documents.py
│   │   ├── test_folders.py
│   │   ├── test_shares.py
│   │   ├── test_cloud.py
│   │   ├── test_admin_api.py
│   │   ├── test_audit.py
│   │   ├── test_quota.py
│   │   ├── test_security.py
│   │   └── ...                         # 28 test files total
│   └── data/                           # Static data files (topic seed data etc.)
│
├── frontend/                           # Vue 3 SPA
│   ├── src/
│   │   ├── main.js                     # Vue app mount, Pinia + Router registration
│   │   ├── App.vue                     # Root component — layout switcher (auth vs app)
│   │   ├── style.css                   # Global Tailwind CSS entry
│   │   ├── api/
│   │   │   └── client.js               # fetch wrapper, Bearer injection, 401→refresh→retry
│   │   ├── stores/                     # Pinia state stores
│   │   │   ├── auth.js                 # accessToken (memory), user, quota, refresh
│   │   │   ├── documents.js            # documents list, upload flow, search/sort
│   │   │   ├── folders.js              # folder tree, breadcrumb, rootFolders
│   │   │   ├── topics.js               # topics list CRUD
│   │   │   └── cloudConnections.js     # cloud connection list
│   │   ├── router/
│   │   │   └── index.js                # Routes + beforeEach auth guard (silent refresh)
│   │   ├── layouts/
│   │   │   └── AuthLayout.vue          # Centered card layout for login/register pages
│   │   ├── views/                      # Page-level components (one per route)
│   │   │   ├── FileManagerView.vue     # / and /folders/:id — unified file manager
│   │   │   ├── DocumentView.vue        # /document/:id — document detail + preview
│   │   │   ├── TopicsView.vue          # /topics — topic management
│   │   │   ├── SettingsView.vue        # /settings — user settings + TOTP
│   │   │   ├── AdminView.vue           # /admin — admin panel (users, audit log)
│   │   │   ├── SharedView.vue          # /shared — documents shared with me
│   │   │   ├── CloudStorageView.vue    # /cloud — cloud connections overview
│   │   │   ├── CloudFolderView.vue     # /cloud/:provider/:folderId — cloud folder browser
│   │   │   └── auth/                   # Auth flow pages
│   │   │       ├── LoginView.vue
│   │   │       ├── RegisterView.vue
│   │   │       ├── PasswordResetView.vue
│   │   │       └── NewPasswordView.vue
│   │   ├── components/                 # Reusable UI components
│   │   │   ├── storage/
│   │   │   │   └── StorageBrowser.vue  # Core file manager widget (local + cloud modes)
│   │   │   ├── layout/
│   │   │   │   ├── AppSidebar.vue      # Navigation sidebar with folder tree + quota bar
│   │   │   │   └── QuotaBar.vue        # Storage quota progress bar
│   │   │   ├── documents/
│   │   │   │   └── DocumentCard.vue    # Single document row in file manager
│   │   │   ├── folders/
│   │   │   │   ├── FolderTreeItem.vue  # Recursive sidebar folder tree node
│   │   │   │   └── FolderDeleteModal.vue
│   │   │   ├── cloud/
│   │   │   │   ├── CloudProviderTreeItem.vue
│   │   │   │   └── CloudFolderTreeItem.vue
│   │   │   ├── sharing/
│   │   │   │   └── ShareModal.vue      # Share document with another user
│   │   │   ├── upload/
│   │   │   │   └── DropZone.vue        # Drag-and-drop file upload zone
│   │   │   ├── auth/                   # Auth form components
│   │   │   ├── admin/                  # Admin panel sub-components
│   │   │   ├── settings/               # Settings page sub-components
│   │   │   ├── topics/                 # Topic chip/badge components
│   │   │   └── ui/                     # Generic UI primitives (TreeItem.vue, etc.)
│   │   └── utils/                      # Frontend utility functions
│   ├── index.html                      # Vite HTML entry
│   ├── vite.config.js                  # Vite config (proxy /api → :8000)
│   ├── tailwind.config.js              # Tailwind CSS config
│   ├── vitest.config.js                # Vitest test config
│   └── package.json                    # npm dependencies
│
├── docker/
│   └── postgres/
│       └── initdb.d/                   # PostgreSQL init scripts (DB user + role setup)
│
├── docker-compose.yml                  # All services: postgres, minio, redis, backend,
│                                       # celery-worker, celery-beat, frontend
├── .env.example                        # Documented env var template (safe to commit)
├── .env                                # Local secrets (gitignored)
├── CLAUDE.md                           # Project instructions for Claude agents
├── SECURITY.md                         # Security audit findings and mitigations
└── .planning/                          # GSD workflow planning artifacts
    ├── ROADMAP.md
    ├── REQUIREMENTS.md
    ├── STATE.md
    ├── PROJECT.md
    └── codebase/                       # Codebase map (this directory)
```

## Directory Purposes

**`backend/api/`:**
- Purpose: HTTP endpoint handlers — thin layer only. No business logic.
- Contains: One module per resource (`auth.py`, `documents.py`, `folders.py`, etc.)
- Key files: `backend/api/documents.py` (presigned upload flow), `backend/api/auth.py` (JWT issuance)

**`backend/services/`:**
- Purpose: Business logic decoupled from FastAPI. Functions are pure async Python.
- Contains: `auth.py` (crypto, TOTP, HIBP), `classifier.py` (AI orchestration), `extractor.py` (text extraction), `storage.py` (ORM queries), `audit.py` (audit log writer), `cloud_cache.py` (TTL cache), `email.py` (email helpers)
- Rule: No module in `services/` may import from `fastapi` or `api/`

**`backend/storage/`:**
- Purpose: All object storage interaction behind the `StorageBackend` ABC
- Contains: `base.py` (interface), factory `__init__.py`, one file per backend, `cloud_utils.py` (HKDF encrypt/decrypt), `exceptions.py`
- Key invariant: `get_storage_backend_for_document()` is the only place cloud credentials are decrypted

**`backend/ai/`:**
- Purpose: AI classification providers behind the `AIProvider` ABC
- Contains: `base.py` (interface + `ClassificationResult`), factory `__init__.py`, one file per provider
- Selection: Per user, via the `users.ai_provider` and `users.ai_model` DB columns

**`backend/db/`:**
- Purpose: ORM schema and session management
- Contains: `models.py` (11 tables, all UUID PKs, full index declarations), `session.py` (async engine, `AsyncSessionLocal`)
- Note: Two DB users — `docuvault_app` (DML only, used at runtime) and `docuvault_migrate` (DDL, used by Alembic only)

**`backend/deps/`:**
- Purpose: FastAPI `Depends()` callables — shared dependency injection
- Contains: `get_db` (per-request session), `get_current_user`, `get_current_admin`, `get_regular_user`, `get_client_ip`

**`backend/tasks/`:**
- Purpose: Celery task definitions for async background work
- Contains: `document_tasks.py` (extraction + classification + cleanup), `email_tasks.py` (password reset + security alerts), `audit_tasks.py` (nightly CSV export)

**`backend/migrations/versions/`:**
- Purpose: Alembic migration history
- Contains: Sequentially numbered migration scripts (`0001_` → `0004_`)
- Generated: Drafted with `alembic revision` (with `--autogenerate` where useful), then manually reviewed; autogenerated output is never committed unreviewed

**`backend/tests/`:**
- Purpose: pytest test suite using `httpx.AsyncClient` with real PostgreSQL
- Contains: 28 test files covering all endpoints, security invariants, and services
- Key files: `conftest.py` (shared fixtures), `test_security.py` (IDOR, admin block, CSRF tests)

**`frontend/src/stores/`:**
- Purpose: Pinia stores — application state + API calls
- Contains: `auth.js`, `documents.js`, `folders.js`, `topics.js`, `cloudConnections.js`
- Rule: Stores are the only place `api/client.js` is called from. Views do not call `api/` directly.

**`frontend/src/api/`:**
- Purpose: Thin HTTP client wrapper
- Contains: `client.js` — all `fetch()` calls, Bearer header injection, 401→refresh→retry logic, all exported API functions
- Rule: No business logic here — purely request/response translation

**`frontend/src/views/`:**
- Purpose: Route-level page components
- Contains: One `.vue` file per route. Views wire stores to components via event delegation.
- Key file: `FileManagerView.vue` — root view, delegates to `StorageBrowser` component

**`frontend/src/components/storage/`:**
- Purpose: Reusable file manager widget
- Contains: `StorageBrowser.vue` — unified listing component for local folder mode and cloud folder mode

**`frontend/src/components/layout/`:**
- Purpose: Persistent app shell
- Contains: `AppSidebar.vue` (navigation, folder tree, cloud links, quota bar), `QuotaBar.vue` (storage progress)

## Key File Locations

**Entry Points:**
- `backend/main.py`: FastAPI app — start here for any backend investigation
- `backend/celery_app.py`: Celery factory — start here for task routing investigation
- `frontend/src/main.js`: Vue app mount
- `frontend/src/router/index.js`: All routes + auth guard

**Configuration:**
- `backend/config.py`: All env vars with defaults (Pydantic Settings)
- `.env.example`: Documented env var template
- `docker-compose.yml`: Full service topology with env var wiring
- `frontend/vite.config.js`: Dev proxy config (`/api` → `:8000`)

**Core Logic:**
- `backend/db/models.py`: Full ORM schema — reference for all table structures
- `backend/services/auth.py`: JWT, Argon2, TOTP, HIBP — all auth primitives
- `backend/storage/__init__.py`: Storage backend factory — entry point for understanding storage routing
- `backend/storage/cloud_utils.py`: HKDF credential encryption/decryption

**Testing:**
- `backend/tests/conftest.py`: Test fixtures — DB setup, user creation, auth helpers
- `backend/tests/test_security.py`: Security invariant tests (IDOR, admin block, CSRF, timing)

## Naming Conventions

**Backend files:**
- Modules: `snake_case.py`
- One module per resource/concern in `api/` (matches the resource noun: `documents.py`, `folders.py`)
- One module per backend in `storage/` (`{provider}_backend.py`)
- One module per provider in `ai/` (`{provider}_provider.py`)

**Frontend files:**
- Vue components: `PascalCase.vue`
- Stores: `camelCase.js` matching the resource noun (`documents.js`, `folders.js`)
- Views: `{Name}View.vue` pattern

**Database:**
- All tables: `snake_case` plural (`users`, `refresh_tokens`, `cloud_connections`)
- All PKs: UUID type
- FKs: `{table_singular}_id` pattern (`user_id`, `folder_id`, `document_id`)

## Where to Add New Code

**New API endpoint (new resource):**
- Create `backend/api/{resource}.py` with `APIRouter(prefix="/api/{resource}")`
- Add service logic to `backend/services/{resource}.py` (or extend an existing service)
- Register the router in `backend/main.py` with `app.include_router()`
- Add the corresponding `export function {action}{Resource}()` wrappers to `frontend/src/api/client.js`

**New Vue page (new route):**
- Create `frontend/src/views/{Name}View.vue`
- Add the route to `frontend/src/router/index.js`
- If it needs auth: add `meta: { requiresAuth: true }` (or `requiresAdmin: true`)

**New Pinia store:**
- Create `frontend/src/stores/{resource}.js` using the Composition API pattern (`defineStore('name', () => { ... })`)
- Use a named export: `export const use{Resource}Store`

**New storage backend:**
- Implement the `StorageBackend` ABC from `backend/storage/base.py`
- Create `backend/storage/{provider}_backend.py`
- Add a lazy-import branch in `get_storage_backend_for_document()` in `backend/storage/__init__.py`

**New AI provider:**
- Implement the `AIProvider` ABC from `backend/ai/base.py`
- Create `backend/ai/{provider}_provider.py`
- Register it in the `backend/ai/__init__.py` factory

**New Celery task:**
- Add the task function to the appropriate `backend/tasks/*.py` module
- Decorate with `@celery_app.task(name="tasks.{module}.{task_name}")`
- If periodic: add it to `celery_app.conf.beat_schedule` in `backend/celery_app.py`

**New DB table:**
- Add an ORM model class to `backend/db/models.py` extending `Base`
- Create a new Alembic migration: `alembic revision --autogenerate -m "description"`
- Review and test the generated migration before committing

**New tests:**
- Backend: add `backend/tests/test_{resource}.py`
- Use fixtures from `backend/tests/conftest.py` (async session, auth client, test users)
- Security invariant tests belong in `backend/tests/test_security.py`

## Special Directories

**`.planning/`:**
- Purpose: GSD workflow planning artifacts (roadmap, requirements, phase plans, codebase maps)
- Generated: Partially (codebase maps regenerated by mapper agents)
- Committed: Yes

**`backend/data/`:**
- Purpose: Static data files (topic seed data, fixture CSVs)
- Generated: No
- Committed: Yes

**`frontend/dist/`:**
- Purpose: Vite production build output
- Generated: Yes (`npm run build`)
- Committed: No (gitignored)

**`backend/migrations/versions/`:**
- Purpose: Alembic migration history — one file per schema change
- Generated: Via `alembic revision`, then manually reviewed
- Committed: Yes — each migration is a permanent historical artifact

**`.claude/worktrees/`:**
- Purpose: Isolated git worktrees used by Claude Code agent subprocesses
- Generated: Yes (by `/gsd:execute-phase` and related commands)
- Committed: No

---

*Structure analysis: 2026-06-02*
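The "New storage backend" recipe in Where to Add New Code can be illustrated with a minimal Python sketch. Several names here are assumptions: the method names `put`, `get` and `delete`, the `InMemoryBackend` class and the `make_backend()` factory are hypothetical stand-ins. They are not the real interface in `backend/storage/base.py` or the real `get_storage_backend_for_document()`. The sketch shows two things the recipe relies on. First, an ABC refuses to instantiate a subclass that leaves an abstract method unimplemented. Second, the factory keeps one branch per provider.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical stand-in for the ABC in backend/storage/base.py."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None:
        """Store an object under `key`."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the object stored under `key`."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove the object stored under `key` (no-op if absent)."""


class InMemoryBackend(StorageBackend):
    """Toy backend (hypothetical) that keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError for unknown keys

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)


class IncompleteBackend(StorageBackend):
    """Forgets delete(), so the ABC refuses to instantiate it."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes:
        return b""


def make_backend(provider: str) -> StorageBackend:
    """Hypothetical factory with one branch per provider."""
    if provider == "memory":
        return InMemoryBackend()
    # A real branch would import its module only when selected, e.g.
    #   if provider == "minio":
    #       from storage.minio_backend import MinioBackend  # hypothetical names
    #       return MinioBackend(...)
    raise ValueError(f"unknown storage provider: {provider!r}")
```

Putting each provider's import inside its own branch means a provider module, and whatever SDK it pulls in, is loaded only when that provider is actually selected. That is presumably the motivation for the lazy-import branches in `backend/storage/__init__.py`.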
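`backend/storage/cloud_utils.py` is listed as handling HKDF credential encryption and decryption. HKDF (RFC 5869) only derives keys. A derived key would then be passed to a symmetric cipher, and this map does not name which cipher. The following stdlib-only sketch implements the HKDF-SHA256 extract and expand steps. It also defines a hypothetical `derive_connection_key()` helper, which binds each key to one user and one cloud connection through the `info` parameter. The label format and the choice of SHA-256 are assumptions. Production code would more likely call a vetted implementation such as the `HKDF` class in the `cryptography` package.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF per RFC 5869 with HMAC-SHA256: extract, then expand."""
    if not 0 < length <= 255 * HASH_LEN:
        raise ValueError("length must be between 1 and 8160 bytes")
    # Extract: PRK = HMAC-Hash(salt, IKM); an absent salt means HashLen zero bytes.
    prk = hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC-Hash(PRK, T(i-1) | info | i), with T(0) empty.
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_connection_key(master_secret: bytes, user_id: str, connection_id: str) -> bytes:
    """Hypothetical helper: one 256-bit key per (user, cloud connection) pair."""
    # UUID strings contain no ':', so the two IDs cannot run together ambiguously.
    info = f"cloud-credentials:v1:{user_id}:{connection_id}".encode()
    return hkdf_sha256(master_secret, salt=b"", info=info, length=32)
```

Changing any part of `info` yields an unrelated key. As a result, two connections never share an encryption key even though both are derived from the same master secret.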
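The "New AI provider" recipe follows the same ABC-plus-factory shape as storage backends. In this sketch, several pieces are hypothetical: the `ClassificationResult` fields (`topics`, `confidence`), the `classify()` signature, the offline `KeywordProvider` and the `make_provider()` registry. The real definitions live in `backend/ai/base.py`, and the real factory is `get_provider()` in `backend/ai/__init__.py`. The two string arguments to `make_provider()` stand in for the per-user `users.ai_provider` and `users.ai_model` columns.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ClassificationResult:
    """Hypothetical fields; the real dataclass lives in backend/ai/base.py."""

    topics: list[str] = field(default_factory=list)
    confidence: float = 0.0


class AIProvider(ABC):
    def __init__(self, model: str) -> None:
        self.model = model  # stands in for the per-user users.ai_model value

    @abstractmethod
    def classify(self, text: str, candidate_topics: list[str]) -> ClassificationResult:
        """Pick matching topics for `text` from `candidate_topics`."""


class KeywordProvider(AIProvider):
    """Toy offline provider: a topic matches if its name appears in the text."""

    def classify(self, text: str, candidate_topics: list[str]) -> ClassificationResult:
        lowered = text.lower()
        hits = [topic for topic in candidate_topics if topic.lower() in lowered]
        return ClassificationResult(topics=hits, confidence=1.0 if hits else 0.0)


# Hypothetical registry keyed by a users.ai_provider-like string.
_PROVIDERS: dict[str, type[AIProvider]] = {"keyword": KeywordProvider}


def make_provider(name: str, model: str) -> AIProvider:
    """Look up the provider class by name and instantiate it for `model`."""
    provider_cls = _PROVIDERS.get(name)
    if provider_cls is None:
        raise ValueError(f"unknown AI provider: {name!r}")
    return provider_cls(model)
```

With this shape, registering a new provider amounts to adding one entry to the registry dict. That mirrors the "Register it in the `backend/ai/__init__.py` factory" step.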