<!-- refreshed: 2026-06-02 -->
# Codebase Structure

**Analysis Date:** 2026-06-02

## Directory Layout

```
document_scanner/  # Repo root
├── backend/  # FastAPI Python backend
│   ├── main.py  # App factory, middleware, router registration
│   ├── config.py  # Pydantic Settings (all env vars)
│   ├── celery_app.py  # Celery factory, task routing, beat schedule
│   ├── alembic.ini  # Alembic migration config
│   ├── requirements.txt  # Pinned Python dependencies
│   ├── Dockerfile  # Backend container image
│   ├── pytest.ini  # pytest config
│   ├── api/  # HTTP route handlers (thin — no business logic)
│   │   ├── auth.py  # /api/auth/* — register, login, TOTP, refresh
│   │   ├── documents.py  # /api/documents/* — upload, confirm, list, stream
│   │   ├── folders.py  # /api/folders/* — CRUD + document move
│   │   ├── shares.py  # /api/shares/* — share grants and revocation
│   │   ├── cloud.py  # /api/cloud/* + /api/users/me/default-storage
│   │   ├── admin.py  # /api/admin/* — user management, quota, AI config
│   │   ├── audit.py  # /api/admin/audit-log — viewer + CSV export
│   │   └── topics.py  # /api/topics/* — CRUD topics + suggest
│   ├── services/  # Business logic (no FastAPI coupling)
│   │   ├── auth.py  # Argon2, JWT, refresh tokens, TOTP, HIBP
│   │   ├── audit.py  # write_audit_log() helper
│   │   ├── classifier.py  # AI classification orchestration
│   │   ├── extractor.py  # PDF/DOCX/image/text extraction
│   │   ├── storage.py  # ORM document queries + topic resolution
│   │   ├── cloud_cache.py  # TTL-cached cloud folder listing
│   │   └── email.py  # Email composition helpers
│   ├── storage/  # Pluggable object storage backends
│   │   ├── base.py  # StorageBackend ABC
│   │   ├── __init__.py  # Factory: get_storage_backend(), get_storage_backend_for_document()
│   │   ├── minio_backend.py  # MinIO/S3 implementation (primary)
│   │   ├── google_drive_backend.py
│   │   ├── onedrive_backend.py
│   │   ├── nextcloud_backend.py
│   │   ├── webdav_backend.py
│   │   ├── cloud_utils.py  # HKDF encryption/decryption, URL validation
│   │   └── exceptions.py  # CloudConnectionError
│   ├── ai/  # Pluggable AI classification providers
│   │   ├── base.py  # AIProvider ABC + ClassificationResult dataclass
│   │   ├── __init__.py  # Factory: get_provider()
│   │   ├── ollama_provider.py
│   │   ├── openai_provider.py
│   │   ├── anthropic_provider.py
│   │   ├── lmstudio_provider.py
│   │   └── utils.py  # Shared AI utilities
│   ├── db/  # Database layer
│   │   ├── models.py  # SQLAlchemy ORM — 11 tables, all UUID PKs
│   │   └── session.py  # Async engine + AsyncSessionLocal factory
│   ├── deps/  # FastAPI dependency injection
│   │   ├── auth.py  # get_current_user, get_current_admin, get_regular_user
│   │   ├── db.py  # get_db (per-request AsyncSession)
│   │   └── utils.py  # get_client_ip
│   ├── tasks/  # Celery async task modules
│   │   ├── document_tasks.py  # extract_and_classify, cleanup_abandoned_uploads
│   │   ├── email_tasks.py  # send_reset_email, send_security_alert_email
│   │   └── audit_tasks.py  # audit_log_daily_export (nightly Celery beat)
│   ├── migrations/  # Alembic migration scripts
│   │   ├── versions/
│   │   │   ├── 0001_initial_schema.py
│   │   │   ├── 0002_add_backup_codes_and_password_must_change.py
│   │   │   ├── 0003_multi_user_isolation.py
│   │   │   └── 0004_phase4_pdf_open_mode_tsvector.py
│   │   └── env.py  # Alembic async migration runner
│   ├── tests/  # Backend test suite (pytest + httpx)
│   │   ├── conftest.py  # Shared fixtures (async engine, client, users)
│   │   ├── test_auth_api.py
│   │   ├── test_documents.py
│   │   ├── test_folders.py
│   │   ├── test_shares.py
│   │   ├── test_cloud.py
│   │   ├── test_admin_api.py
│   │   ├── test_audit.py
│   │   ├── test_quota.py
│   │   ├── test_security.py
│   │   └── ...  # 28 test files total
│   └── data/  # Static data files (topic seed data etc.)
│
├── frontend/  # Vue 3 SPA
│   ├── src/
│   │   ├── main.js  # Vue app mount, Pinia + Router registration
│   │   ├── App.vue  # Root component — layout switcher (auth vs app)
│   │   ├── style.css  # Global Tailwind CSS entry
│   │   ├── api/
│   │   │   └── client.js  # fetch wrapper, Bearer injection, 401→refresh→retry
│   │   ├── stores/  # Pinia state stores
│   │   │   ├── auth.js  # accessToken (memory), user, quota, refresh
│   │   │   ├── documents.js  # documents list, upload flow, search/sort
│   │   │   ├── folders.js  # folder tree, breadcrumb, rootFolders
│   │   │   ├── topics.js  # topics list CRUD
│   │   │   └── cloudConnections.js  # cloud connection list
│   │   ├── router/
│   │   │   └── index.js  # Routes + beforeEach auth guard (silent refresh)
│   │   ├── layouts/
│   │   │   └── AuthLayout.vue  # Centered card layout for login/register pages
│   │   ├── views/  # Page-level components (one per route)
│   │   │   ├── FileManagerView.vue  # / and /folders/:id — unified file manager
│   │   │   ├── DocumentView.vue  # /document/:id — document detail + preview
│   │   │   ├── TopicsView.vue  # /topics — topic management
│   │   │   ├── SettingsView.vue  # /settings — user settings + TOTP
│   │   │   ├── AdminView.vue  # /admin — admin panel (users, audit log)
│   │   │   ├── SharedView.vue  # /shared — documents shared with me
│   │   │   ├── CloudStorageView.vue  # /cloud — cloud connections overview
│   │   │   ├── CloudFolderView.vue  # /cloud/:provider/:folderId — cloud folder browser
│   │   │   └── auth/  # Auth flow pages
│   │   │       ├── LoginView.vue
│   │   │       ├── RegisterView.vue
│   │   │       ├── PasswordResetView.vue
│   │   │       └── NewPasswordView.vue
│   │   ├── components/  # Reusable UI components
│   │   │   ├── storage/
│   │   │   │   └── StorageBrowser.vue  # Core file manager widget (local + cloud modes)
│   │   │   ├── layout/
│   │   │   │   ├── AppSidebar.vue  # Navigation sidebar with folder tree + quota bar
│   │   │   │   └── QuotaBar.vue  # Storage quota progress bar
│   │   │   ├── documents/
│   │   │   │   └── DocumentCard.vue  # Single document row in file manager
│   │   │   ├── folders/
│   │   │   │   ├── FolderTreeItem.vue  # Recursive sidebar folder tree node
│   │   │   │   └── FolderDeleteModal.vue
│   │   │   ├── cloud/
│   │   │   │   ├── CloudProviderTreeItem.vue
│   │   │   │   └── CloudFolderTreeItem.vue
│   │   │   ├── sharing/
│   │   │   │   └── ShareModal.vue  # Share document with another user
│   │   │   ├── upload/
│   │   │   │   └── DropZone.vue  # Drag-and-drop file upload zone
│   │   │   ├── auth/  # Auth form components
│   │   │   ├── admin/  # Admin panel sub-components
│   │   │   ├── settings/  # Settings page sub-components
│   │   │   ├── topics/  # Topic chip/badge components
│   │   │   └── ui/  # Generic UI primitives (TreeItem.vue, etc.)
│   │   └── utils/  # Frontend utility functions
│   ├── index.html  # Vite HTML entry
│   ├── vite.config.js  # Vite config (proxy /api → :8000)
│   ├── tailwind.config.js  # Tailwind CSS config
│   ├── vitest.config.js  # Vitest test config
│   └── package.json  # npm dependencies
│
├── docker/
│   └── postgres/
│       └── initdb.d/  # PostgreSQL init scripts (DB user + role setup)
│
├── docker-compose.yml  # All services: postgres, minio, redis, backend, celery-worker, celery-beat, frontend
├── .env.example  # Documented env var template (safe to commit)
├── .env  # Local secrets (gitignored)
├── CLAUDE.md  # Project instructions for Claude agents
├── SECURITY.md  # Security audit findings and mitigations
└── .planning/  # GSD workflow planning artifacts
    ├── ROADMAP.md
    ├── REQUIREMENTS.md
    ├── STATE.md
    ├── PROJECT.md
    └── codebase/  # Codebase map (this directory)
```

## Directory Purposes

**`backend/api/`:**
- Purpose: HTTP endpoint handlers — thin layer only. No business logic.
- Contains: One module per resource (`auth.py`, `documents.py`, `folders.py`, etc.)
- Key files: `backend/api/documents.py` (presigned upload flow), `backend/api/auth.py` (JWT issuance)

**`backend/services/`:**
- Purpose: Business logic decoupled from FastAPI. Functions are pure async Python.
- Contains: `auth.py` (crypto, TOTP, HIBP), `classifier.py` (AI orchestration), `extractor.py` (text extraction), `storage.py` (ORM queries), `audit.py` (audit log writer), `cloud_cache.py` (TTL cache), `email.py` (email helpers)
- Rule: No module in `services/` may import from `fastapi` or `api/`
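
The import rule for `services/` can be checked mechanically. The following is a minimal sketch using only the standard library; the function name `forbidden_imports` and the prefix list (including the guess that the package may also be imported as `backend.api`) are assumptions for illustration, not part of the codebase.

```python
import ast

# Assumed spellings of the forbidden packages; adjust to the repo's import style.
FORBIDDEN_PREFIXES = ("fastapi", "api", "backend.api")


def _is_forbidden(module_name: str) -> bool:
    """True if the module is a forbidden package or a submodule of one."""
    return any(
        module_name == prefix or module_name.startswith(prefix + ".")
        for prefix in FORBIDDEN_PREFIXES
    )


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden modules imported by the given Python source.

    Only absolute imports are inspected; relative imports are skipped.
    """
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            names = [node.module or ""]
        else:
            continue
        found.extend(name for name in names if _is_forbidden(name))
    return found
```

A test could read every file under `backend/services/` and assert that `forbidden_imports()` returns an empty list for each of them.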

**`backend/storage/`:**
- Purpose: All object storage interaction behind the `StorageBackend` ABC
- Contains: `base.py` (interface), factory `__init__.py`, one file per backend, `cloud_utils.py` (HKDF encrypt/decrypt), `exceptions.py`
- Key invariant: `get_storage_backend_for_document()` is the only place cloud credentials are decrypted

**`backend/ai/`:**
- Purpose: AI classification providers behind the `AIProvider` ABC
- Contains: `base.py` (interface + `ClassificationResult`), factory `__init__.py`, one file per provider
- Selected per-user via `users.ai_provider` + `users.ai_model` DB columns

**`backend/db/`:**
- Purpose: ORM schema and session management
- Contains: `models.py` (11 tables, all UUID PKs, full index declarations), `session.py` (async engine, `AsyncSessionLocal`)
- Note: Two DB users — `docuvault_app` (DML only, used at runtime) and `docuvault_migrate` (DDL, used by Alembic only)

**`backend/deps/`:**
- Purpose: FastAPI `Depends()` callables — shared dependency injection
- Contains: `get_db` (per-request session), `get_current_user`, `get_current_admin`, `get_regular_user`, `get_client_ip`

**`backend/tasks/`:**
- Purpose: Celery task definitions for async background work
- Contains: `document_tasks.py` (extraction + classification + cleanup), `email_tasks.py` (password reset + security alerts), `audit_tasks.py` (nightly CSV export)

**`backend/migrations/versions/`:**
- Purpose: Alembic migration history
- Contains: Sequentially numbered migration scripts (`0001_` → `0004_`)
- Generated: Via `alembic revision`, then manually reviewed; autogenerated output is never committed without review

**`backend/tests/`:**
- Purpose: pytest test suite using `httpx.AsyncClient` with real PostgreSQL
- Contains: 28 test files covering all endpoints, security invariants, and services
- Key files: `conftest.py` (shared fixtures), `test_security.py` (IDOR, admin block, CSRF tests)

**`frontend/src/stores/`:**
- Purpose: Pinia stores — application state + API calls
- Contains: `auth.js`, `documents.js`, `folders.js`, `topics.js`, `cloudConnections.js`
- Rule: Stores are the only place `api/client.js` is called from. Views do not call `api/` directly.

**`frontend/src/api/`:**
- Purpose: Thin HTTP client wrapper
- Contains: `client.js` — all `fetch()` calls, Bearer header injection, 401→refresh→retry logic, all exported API functions
- Rule: No business logic here — purely request/response translation

**`frontend/src/views/`:**
- Purpose: Route-level page components
- Contains: One `.vue` file per route. Views wire stores to components via event delegation.
- Key file: `FileManagerView.vue` — root view, delegates to `StorageBrowser` component

**`frontend/src/components/storage/`:**
- Purpose: Reusable file manager widget
- Contains: `StorageBrowser.vue` — unified listing component for local folder mode and cloud folder mode

**`frontend/src/components/layout/`:**
- Purpose: Persistent app shell
- Contains: `AppSidebar.vue` (navigation, folder tree, cloud links, quota bar), `QuotaBar.vue` (storage progress)

## Key File Locations

**Entry Points:**
- `backend/main.py`: FastAPI app — start here for any backend investigation
- `backend/celery_app.py`: Celery factory — start here for task routing investigation
- `frontend/src/main.js`: Vue app mount
- `frontend/src/router/index.js`: All routes + auth guard

**Configuration:**
- `backend/config.py`: All env vars with defaults (Pydantic Settings)
- `.env.example`: Documented env var template
- `docker-compose.yml`: Full service topology with env var wiring
- `frontend/vite.config.js`: Dev proxy config (`/api` → `:8000`)

**Core Logic:**
- `backend/db/models.py`: Full ORM schema — reference for all table structures
- `backend/services/auth.py`: JWT, Argon2, TOTP, HIBP — all auth primitives
- `backend/storage/__init__.py`: Storage backend factory — entry point for understanding storage routing
- `backend/storage/cloud_utils.py`: HKDF credential encryption/decryption

**Testing:**
- `backend/tests/conftest.py`: Test fixtures — DB setup, user creation, auth helpers
- `backend/tests/test_security.py`: Security invariant tests (IDOR, admin block, CSRF, timing)

## Naming Conventions

**Backend files:**
- Modules: `snake_case.py`
- One module per resource/concern in `api/` (matches the resource noun: `documents.py`, `folders.py`)
- One module per backend in `storage/` (`{provider}_backend.py`)
- One module per provider in `ai/` (`{provider}_provider.py`)

**Frontend files:**
- Vue components: `PascalCase.vue`
- Stores: `camelCase.js` matching the resource noun (`documents.js`, `folders.js`)
- Views: `{Name}View.vue` pattern

**Database:**
- All tables: `snake_case` plural (`users`, `refresh_tokens`, `cloud_connections`)
- All PKs: UUID type
- FKs: `{table_singular}_id` pattern (`user_id`, `folder_id`, `document_id`)

## Where to Add New Code

**New API endpoint (new resource):**
- Create `backend/api/{resource}.py` with `APIRouter(prefix="/api/{resource}")`
- Add service logic to `backend/services/{resource}.py` (or extend existing service)
- Register router in `backend/main.py` with `app.include_router()`
- Add the corresponding `export function {action}{Resource}()` functions to `frontend/src/api/client.js`

**New Vue page (new route):**
- Create `frontend/src/views/{Name}View.vue`
- Add route to `frontend/src/router/index.js`
- If it needs auth: add `meta: { requiresAuth: true }` (or `requiresAdmin: true`)

**New Pinia store:**
- Create `frontend/src/stores/{resource}.js` using Composition API pattern (`defineStore('name', () => { ... })`)
- Use a named export: `export const use{Resource}Store`

**New storage backend:**
- Implement `StorageBackend` ABC from `backend/storage/base.py`
- Create `backend/storage/{provider}_backend.py`
- Add lazy import branch in `get_storage_backend_for_document()` in `backend/storage/__init__.py`
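
As a rough illustration of the shape of a new backend, here is a self-contained sketch. The method names `put_object` and `get_object`, the `InMemoryBackend` class, and the simplified `get_backend` factory are all assumptions; the actual interface is defined in `backend/storage/base.py` and is likely async.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Stand-in for the ABC in backend/storage/base.py (method names assumed)."""

    @abstractmethod
    def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get_object(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical backend; would live in backend/storage/inmemory_backend.py."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get_object(self, key: str) -> bytes:
        return self._objects[key]


def get_backend(provider: str) -> StorageBackend:
    """Simplified factory keyed by provider name only."""
    if provider == "inmemory":
        # In the real factory this branch would import the backend module
        # lazily, so unused providers never load their dependencies.
        return InMemoryBackend()
    raise ValueError(f"Unknown storage provider: {provider}")
```

Because every backend satisfies the same ABC, callers never need to know which provider a document is stored on.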

**New AI provider:**
- Implement `AIProvider` ABC from `backend/ai/base.py`
- Create `backend/ai/{provider}_provider.py`
- Register in `backend/ai/__init__.py` factory
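
The three steps above can be sketched as follows. The fields of `ClassificationResult`, the `classify` signature, the `KeywordProvider` class, and the registry dict are assumptions made for illustration; the real definitions are in `backend/ai/base.py` and `backend/ai/__init__.py`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    # Field names are assumed; the real dataclass may differ.
    topic: str
    confidence: float


class AIProvider(ABC):
    """Stand-in for the ABC in backend/ai/base.py (signature assumed)."""

    @abstractmethod
    def classify(self, text: str, topics: list[str]) -> ClassificationResult: ...


class KeywordProvider(AIProvider):
    """Hypothetical provider: picks the first topic whose name occurs in the text."""

    def classify(self, text: str, topics: list[str]) -> ClassificationResult:
        lowered = text.lower()
        for topic in topics:
            if topic.lower() in lowered:
                return ClassificationResult(topic=topic, confidence=1.0)
        return ClassificationResult(topic="uncategorized", confidence=0.0)


# Registration step: map the provider name to its class.
_PROVIDERS: dict[str, type[AIProvider]] = {"keyword": KeywordProvider}


def get_provider(name: str) -> AIProvider:
    """Simplified factory; raises ValueError for unregistered names."""
    if name not in _PROVIDERS:
        raise ValueError(f"Unknown AI provider: {name}")
    return _PROVIDERS[name]()
```

A provider selected this way can be swapped per user without touching the classification orchestration code.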

**New Celery task:**
- Add task function to appropriate `backend/tasks/*.py` module
- Decorate with `@celery_app.task(name="tasks.{module}.{task_name}")`
- If periodic: add to `celery_app.conf.beat_schedule` in `backend/celery_app.py`

**New DB table:**
- Add ORM model class to `backend/db/models.py` extending `Base`
- Create new Alembic migration: `alembic revision --autogenerate -m "description"`
- Review and test the generated migration before committing

**New tests:**
- Backend: add `backend/tests/test_{resource}.py`
- Use fixtures from `backend/tests/conftest.py` (async session, auth client, test users)
- Security invariant tests belong in `backend/tests/test_security.py`

## Special Directories

**`.planning/`:**
- Purpose: GSD workflow planning artifacts (roadmap, requirements, phase plans, codebase maps)
- Generated: Partially (codebase maps regenerated by mapper agents)
- Committed: Yes

**`backend/data/`:**
- Purpose: Static data files (topic seed data, fixture CSVs)
- Generated: No
- Committed: Yes

**`frontend/dist/`:**
- Purpose: Vite production build output
- Generated: Yes (`npm run build`)
- Committed: No (gitignored)

**`backend/migrations/versions/`:**
- Purpose: Alembic migration history — one file per schema change
- Generated: Via `alembic revision` then manually reviewed
- Committed: Yes — each migration is a permanent historical artifact

**`.claude/worktrees/`:**
- Purpose: Isolated git worktrees used by Claude Code agent subprocesses
- Generated: Yes (by `/gsd:execute-phase` and related commands)
- Committed: No

---

*Structure analysis: 2026-06-02*