docs(codebase): refresh codebase map after Phase 06.2 completion

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
curo1305
2026-06-02 15:32:06 +02:00
parent bd17b4b22f
commit 89f8d5a654
7 changed files with 1829 additions and 621 deletions
+324 -123
@@ -1,144 +1,345 @@
# STRUCTURE — document-scanner
<!-- refreshed: 2026-06-02 -->
# Codebase Structure
_Last updated: 2026-05-21_
**Analysis Date:** 2026-06-02
## Summary
The project is a monorepo with two top-level service directories (`backend/`, `frontend/`) and Docker Compose at the root. Backend is a Python/FastAPI app; frontend is a Vue 3 SPA built with Vite. All persistent data lives under `backend/data/`.
---
## Top-Level Layout
## Directory Layout
```
document_scanner/
├── backend/ Python FastAPI service
├── frontend/ Vue 3 SPA
├── docker-compose.yml Two-service compose (backend + frontend)
├── .env.example Optional env vars (API keys)
└── .claude/ Claude Code settings
document_scanner/ # Repo root
├── backend/ # FastAPI Python backend
│ ├── main.py # App factory, middleware, router registration
│ ├── config.py # Pydantic Settings (all env vars)
│ ├── celery_app.py # Celery factory, task routing, beat schedule
│ ├── alembic.ini # Alembic migration config
│ ├── requirements.txt # Pinned Python dependencies
│ ├── Dockerfile # Backend container image
│ ├── pytest.ini # pytest config
│ ├── api/ # HTTP route handlers (thin — no business logic)
│ │ ├── auth.py # /api/auth/* — register, login, TOTP, refresh
│ │ ├── documents.py # /api/documents/* — upload, confirm, list, stream
│ │ ├── folders.py # /api/folders/* — CRUD + document move
│ │ ├── shares.py # /api/shares/* — share grants and revocation
│ │ ├── cloud.py # /api/cloud/* + /api/users/me/default-storage
│ │ ├── admin.py # /api/admin/* — user management, quota, AI config
│ │ ├── audit.py # /api/admin/audit-log — viewer + CSV export
│ │ └── topics.py # /api/topics/* — CRUD topics + suggest
│ ├── services/ # Business logic (no FastAPI coupling)
│ │ ├── auth.py # Argon2, JWT, refresh tokens, TOTP, HIBP
│ │ ├── audit.py # write_audit_log() helper
│ │ ├── classifier.py # AI classification orchestration
│ │ ├── extractor.py # PDF/DOCX/image/text extraction
│ │ ├── storage.py # ORM document queries + topic resolution
│ │ ├── cloud_cache.py # TTL-cached cloud folder listing
│ │ └── email.py # Email composition helpers
│ ├── storage/ # Pluggable object storage backends
│ │ ├── base.py # StorageBackend ABC
│ │ ├── __init__.py # Factory: get_storage_backend(), get_storage_backend_for_document()
│ │ ├── minio_backend.py # MinIO/S3 implementation (primary)
│ │ ├── google_drive_backend.py
│ │ ├── onedrive_backend.py
│ │ ├── nextcloud_backend.py
│ │ ├── webdav_backend.py
│ │ ├── cloud_utils.py # HKDF encryption/decryption, URL validation
│ │ └── exceptions.py # CloudConnectionError
│ ├── ai/ # Pluggable AI classification providers
│ │ ├── base.py # AIProvider ABC + ClassificationResult dataclass
│ │ ├── __init__.py # Factory: get_provider()
│ │ ├── ollama_provider.py
│ │ ├── openai_provider.py
│ │ ├── anthropic_provider.py
│ │ ├── lmstudio_provider.py
│ │ └── utils.py # Shared AI utilities
│ ├── db/ # Database layer
│ │ ├── models.py # SQLAlchemy ORM — 11 tables, all UUID PKs
│ │ └── session.py # Async engine + AsyncSessionLocal factory
│ ├── deps/ # FastAPI dependency injection
│ │ ├── auth.py # get_current_user, get_current_admin, get_regular_user
│ │ ├── db.py # get_db (per-request AsyncSession)
│ │ └── utils.py # get_client_ip
│ ├── tasks/ # Celery async task modules
│ │ ├── document_tasks.py # extract_and_classify, cleanup_abandoned_uploads
│ │ ├── email_tasks.py # send_reset_email, send_security_alert_email
│ │ └── audit_tasks.py # audit_log_daily_export (nightly Celery beat)
│ ├── migrations/ # Alembic migration scripts
│ │ ├── versions/
│ │ │ ├── 0001_initial_schema.py
│ │ │ ├── 0002_add_backup_codes_and_password_must_change.py
│ │ │ ├── 0003_multi_user_isolation.py
│ │ │ └── 0004_phase4_pdf_open_mode_tsvector.py
│ │ └── env.py # Alembic async migration runner
│ ├── tests/ # Backend test suite (pytest + httpx)
│ │ ├── conftest.py # Shared fixtures (async engine, client, users)
│ │ ├── test_auth_api.py
│ │ ├── test_documents.py
│ │ ├── test_folders.py
│ │ ├── test_shares.py
│ │ ├── test_cloud.py
│ │ ├── test_admin_api.py
│ │ ├── test_audit.py
│ │ ├── test_quota.py
│ │ ├── test_security.py
│ │ └── ... # 28 test files total
│ └── data/ # Static data files (topic seed data etc.)
├── frontend/ # Vue 3 SPA
│ ├── src/
│ │ ├── main.js # Vue app mount, Pinia + Router registration
│ │ ├── App.vue # Root component — layout switcher (auth vs app)
│ │ ├── style.css # Global Tailwind CSS entry
│ │ ├── api/
│ │ │ └── client.js # fetch wrapper, Bearer injection, 401→refresh→retry
│ │ ├── stores/ # Pinia state stores
│ │ │ ├── auth.js # accessToken (memory), user, quota, refresh
│ │ │ ├── documents.js # documents list, upload flow, search/sort
│ │ │ ├── folders.js # folder tree, breadcrumb, rootFolders
│ │ │ ├── topics.js # topics list CRUD
│ │ │ └── cloudConnections.js # cloud connection list
│ │ ├── router/
│ │ │ └── index.js # Routes + beforeEach auth guard (silent refresh)
│ │ ├── layouts/
│ │ │ └── AuthLayout.vue # Centered card layout for login/register pages
│ │ ├── views/ # Page-level components (one per route)
│ │ │ ├── FileManagerView.vue # / and /folders/:id — unified file manager
│ │ │ ├── DocumentView.vue # /document/:id — document detail + preview
│ │ │ ├── TopicsView.vue # /topics — topic management
│ │ │ ├── SettingsView.vue # /settings — user settings + TOTP
│ │ │ ├── AdminView.vue # /admin — admin panel (users, audit log)
│ │ │ ├── SharedView.vue # /shared — documents shared with me
│ │ │ ├── CloudStorageView.vue # /cloud — cloud connections overview
│ │ │ ├── CloudFolderView.vue # /cloud/:provider/:folderId — cloud folder browser
│ │ │ └── auth/ # Auth flow pages
│ │ │ ├── LoginView.vue
│ │ │ ├── RegisterView.vue
│ │ │ ├── PasswordResetView.vue
│ │ │ └── NewPasswordView.vue
│ │ ├── components/ # Reusable UI components
│ │ │ ├── storage/
│ │ │ │ └── StorageBrowser.vue # Core file manager widget (local + cloud modes)
│ │ │ ├── layout/
│ │ │ │ ├── AppSidebar.vue # Navigation sidebar with folder tree + quota bar
│ │ │ │ └── QuotaBar.vue # Storage quota progress bar
│ │ │ ├── documents/
│ │ │ │ └── DocumentCard.vue # Single document row in file manager
│ │ │ ├── folders/
│ │ │ │ ├── FolderTreeItem.vue # Recursive sidebar folder tree node
│ │ │ │ └── FolderDeleteModal.vue
│ │ │ ├── cloud/
│ │ │ │ ├── CloudProviderTreeItem.vue
│ │ │ │ └── CloudFolderTreeItem.vue
│ │ │ ├── sharing/
│ │ │ │ └── ShareModal.vue # Share document with another user
│ │ │ ├── upload/
│ │ │ │ └── DropZone.vue # Drag-and-drop file upload zone
│ │ │ ├── auth/ # Auth form components
│ │ │ ├── admin/ # Admin panel sub-components
│ │ │ ├── settings/ # Settings page sub-components
│ │ │ ├── topics/ # Topic chip/badge components
│ │ │ └── ui/ # Generic UI primitives (TreeItem.vue, etc.)
│ │ └── utils/ # Frontend utility functions
│ ├── index.html # Vite HTML entry
│ ├── vite.config.js # Vite config (proxy /api → :8000)
│ ├── tailwind.config.js # Tailwind CSS config
│ ├── vitest.config.js # Vitest test config
│ └── package.json # npm dependencies
├── docker/
│ └── postgres/
│ └── initdb.d/ # PostgreSQL init scripts (DB user + role setup)
├── docker-compose.yml # All services: postgres, minio, redis, backend,
│ # celery-worker, celery-beat, frontend
├── .env.example # Documented env var template (safe to commit)
├── .env # Local secrets (gitignored)
├── CLAUDE.md # Project instructions for Claude agents
├── SECURITY.md # Security audit findings and mitigations
└── .planning/ # GSD workflow planning artifacts
├── ROADMAP.md
├── REQUIREMENTS.md
├── STATE.md
├── PROJECT.md
└── codebase/ # Codebase map (this directory)
```
---
## Directory Purposes
## Backend
**`backend/api/`:**
- Purpose: HTTP endpoint handlers — thin layer only. No business logic.
- Contains: One module per resource (`auth.py`, `documents.py`, `folders.py`, etc.)
- Key files: `backend/api/documents.py` (presigned upload flow), `backend/api/auth.py` (JWT issuance)
```
backend/
├── main.py FastAPI app: CORS, lifespan, router registration
├── config.py Path constants, DEFAULT_SETTINGS, ensure_data_dirs()
├── requirements.txt Python dependencies
├── pytest.ini pytest config (asyncio_mode=auto)
├── Dockerfile
├── api/ FastAPI routers (thin HTTP layer)
│ ├── documents.py Upload, list, get, delete, reclassify endpoints
│ ├── topics.py Topic CRUD endpoints
│ └── settings.py AI provider settings endpoints
├── ai/ AI provider abstraction
│ ├── base.py AIProvider ABC + ClassificationResult dataclass
│ ├── __init__.py get_provider() factory
│ ├── anthropic_provider.py
│ ├── openai_provider.py
│ ├── ollama_provider.py extends OpenAIProvider
│ └── lmstudio_provider.py extends OpenAIProvider
├── services/ Business logic (no FastAPI dependency)
│ ├── extractor.py Text extraction: PDF/DOCX/image/text dispatch
│ ├── classifier.py Orchestrates AI call + topic auto-creation
│ └── storage.py Flat-file JSON CRUD + filelock
├── data/ Runtime data (volume-mounted in Docker)
│ ├── uploads/ Uploaded document files
│ ├── metadata/ Per-document JSON metadata files
│ ├── topics.json Global topic list
│ └── settings.json Active AI provider + system prompt config
└── tests/
├── conftest.py Fixtures: isolated tmp data dir, TestClient, sample files
├── test_health.py
├── test_documents.py
├── test_topics.py
├── test_settings.py
├── test_extractor.py
├── test_classifier.py
└── test_lmstudio.py
```
**`backend/services/`:**
- Purpose: Business logic decoupled from FastAPI. Functions are pure async Python.
- Contains: `auth.py` (crypto, TOTP, HIBP), `classifier.py` (AI orchestration), `extractor.py` (text extraction), `storage.py` (ORM queries), `audit.py` (audit log writer), `cloud_cache.py` (TTL cache), `email.py` (email helpers)
- Rule: No module in `services/` may import from `fastapi` or `api/`
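The import rule above could be checked mechanically. The following is a minimal sketch, assuming modules are imported by their top-level package names (`fastapi`, `api`); `FORBIDDEN_ROOTS` and `forbidden_imports` are hypothetical names and not part of the repository.

```python
import ast

# Top-level packages that services/ modules must not import (per the rule above).
FORBIDDEN_ROOTS = {"fastapi", "api"}


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden module names imported by the given source text."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module or ""]
        else:
            continue
        found.extend(n for n in names if n.split(".")[0] in FORBIDDEN_ROOTS)
    return found
```

A test could read each file under `backend/services/` and assert that `forbidden_imports()` returns an empty list for it.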
---
**`backend/storage/`:**
- Purpose: All object storage interaction behind the `StorageBackend` ABC
- Contains: `base.py` (interface), factory `__init__.py`, one file per backend, `cloud_utils.py` (HKDF encrypt/decrypt), `exceptions.py`
- Key invariant: `get_storage_backend_for_document()` is the only place cloud credentials are decrypted
## Frontend
**`backend/ai/`:**
- Purpose: AI classification providers behind the `AIProvider` ABC
- Contains: `base.py` (interface + `ClassificationResult`), factory `__init__.py`, one file per provider
- Selected per-user via `users.ai_provider` + `users.ai_model` DB columns
```
frontend/
├── index.html Vite entry HTML
├── vite.config.js Vite config (Vue plugin, /api proxy)
├── tailwind.config.js
├── postcss.config.js
├── package.json Vue 3, Vue Router 4, Pinia; no test framework
├── Dockerfile
└── src/
├── main.js App bootstrap: Vue + Pinia + Router
├── App.vue Root component (sidebar layout wrapper)
├── style.css Global Tailwind imports
├── api/
│ └── client.js fetch wrapper; all API calls go through here
├── stores/ Pinia stores (data + actions layer)
│ ├── documents.js Document list, upload, classify state
│ ├── topics.js Topic list CRUD state
│ └── settings.js AI provider settings state
├── router/
│ └── index.js Routes: /, /topics, /topics/:name, /document/:id, /settings
├── views/ Page-level components (one per route)
│ ├── HomeView.vue
│ ├── TopicsView.vue
│ ├── DocumentView.vue
│ └── SettingsView.vue
└── components/ Reusable UI components
├── layout/
│ └── AppSidebar.vue
├── documents/
│ └── DocumentCard.vue
├── topics/
│ ├── TopicBadge.vue
│ └── TopicManager.vue
└── upload/
├── DropZone.vue
└── UploadProgress.vue
```
**`backend/db/`:**
- Purpose: ORM schema and session management
- Contains: `models.py` (11 tables, all UUID PKs, full index declarations), `session.py` (async engine, `AsyncSessionLocal`)
- Note: Two DB users — `docuvault_app` (DML only, used at runtime) and `docuvault_migrate` (DDL, used by Alembic only)
---
**`backend/deps/`:**
- Purpose: FastAPI `Depends()` callables — shared dependency injection
- Contains: `get_db` (per-request session), `get_current_user`, `get_current_admin`, `get_regular_user`, `get_client_ip`
## Key Entry Points
**`backend/tasks/`:**
- Purpose: Celery task definitions for async background work
- Contains: `document_tasks.py` (extraction + classification + cleanup), `email_tasks.py` (password reset + security alerts), `audit_tasks.py` (nightly CSV export)
| File | Purpose |
|---|---|
| `backend/main.py` | FastAPI app instantiation, middleware, router registration |
| `backend/config.py` | All path constants and default settings — change storage paths here |
| `backend/ai/__init__.py` | Add a new AI provider here |
| `frontend/src/main.js` | Vue app bootstrap |
| `frontend/src/api/client.js` | All HTTP calls originate here |
**`backend/migrations/versions/`:**
- Purpose: Alembic migration history
- Contains: Sequentially numbered migration scripts (`0001_` through `0004_`)
- Generated: Via `alembic revision`, then manually reviewed; generated output is never committed without review
---
**`backend/tests/`:**
- Purpose: pytest test suite using `httpx.AsyncClient` with real PostgreSQL
- Contains: 28 test files covering all endpoints, security invariants, and services
- Key files: `conftest.py` (shared fixtures), `test_security.py` (IDOR, admin block, CSRF tests)
**`frontend/src/stores/`:**
- Purpose: Pinia stores — application state + API calls
- Contains: `auth.js`, `documents.js`, `folders.js`, `topics.js`, `cloudConnections.js`
- Rule: Stores are the only place `api/client.js` is called from. Views do not call `api/` directly.
**`frontend/src/api/`:**
- Purpose: Thin HTTP client wrapper
- Contains: `client.js` — all `fetch()` calls, Bearer header injection, 401→refresh→retry logic, all exported API functions
- Rule: No business logic here — purely request/response translation
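The 401→refresh→retry flow mentioned above can be illustrated independently of the real client. This is a language-neutral sketch written in Python (the actual `client.js` is JavaScript and is not shown here); `request_with_refresh`, `send`, and `refresh` are hypothetical names, and the single-retry policy is an assumption.

```python
from typing import Callable

# send(token) performs one HTTP request and returns (status_code, body).
Send = Callable[[str], tuple[int, str]]


def request_with_refresh(
    send: Send, refresh: Callable[[], str], token: str
) -> tuple[int, str]:
    """Send once; on a 401, refresh the access token and retry exactly once."""
    status, body = send(token)
    if status != 401:
        return status, body
    new_token = refresh()   # e.g. a call to the token refresh endpoint
    return send(new_token)  # a second 401 is returned to the caller unchanged
```

Retrying only once keeps a permanently invalid session from looping; the caller sees the final 401 and can redirect to login.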
**`frontend/src/views/`:**
- Purpose: Route-level page components
- Contains: One `.vue` file per route. Views wire stores to components via event delegation.
- Key file: `FileManagerView.vue` — root view, delegates to `StorageBrowser` component
**`frontend/src/components/storage/`:**
- Purpose: Reusable file manager widget
- Contains: `StorageBrowser.vue` — unified listing component for local folder mode and cloud folder mode
**`frontend/src/components/layout/`:**
- Purpose: Persistent app shell
- Contains: `AppSidebar.vue` (navigation, folder tree, cloud links, quota bar), `QuotaBar.vue` (storage progress)
## Key File Locations
**Entry Points:**
- `backend/main.py`: FastAPI app — start here for any backend investigation
- `backend/celery_app.py`: Celery factory — start here for task routing investigation
- `frontend/src/main.js`: Vue app mount
- `frontend/src/router/index.js`: All routes + auth guard
**Configuration:**
- `backend/config.py`: All env vars with defaults (Pydantic Settings)
- `.env.example`: Documented env var template
- `docker-compose.yml`: Full service topology with env var wiring
- `frontend/vite.config.js`: Dev proxy config (`/api` → `:8000`)
**Core Logic:**
- `backend/db/models.py`: Full ORM schema — reference for all table structures
- `backend/services/auth.py`: JWT, Argon2, TOTP, HIBP — all auth primitives
- `backend/storage/__init__.py`: Storage backend factory — entry point for understanding storage routing
- `backend/storage/cloud_utils.py`: HKDF credential encryption/decryption
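For readers unfamiliar with HKDF, the key-derivation step can be sketched with the standard library alone. This follows RFC 5869 with SHA-256; how `cloud_utils.py` actually chooses its salt, info string, and cipher is not shown in this document, so `hkdf_sha256` and its parameters are illustrative assumptions, and the encryption step itself is omitted.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(secret: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Derive `length` bytes from `secret` using HKDF (RFC 5869) with SHA-256."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: condense the input secret into a fixed-size pseudorandom key.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, secret, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output has been produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

Varying `info` (for example, per connection) yields independent keys from one master secret, which is the usual reason to reach for HKDF.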
**Testing:**
- `backend/tests/conftest.py`: Test fixtures — DB setup, user creation, auth helpers
- `backend/tests/test_security.py`: Security invariant tests (IDOR, admin block, CSRF, timing)
## Naming Conventions
**Backend files:**
- Modules: `snake_case.py`
- One module per resource/concern in `api/` (matches the resource noun: `documents.py`, `folders.py`)
- One module per backend in `storage/` (`{provider}_backend.py`)
- One module per provider in `ai/` (`{provider}_provider.py`)
**Frontend files:**
- Vue components: `PascalCase.vue`
- Stores: `camelCase.js` matching the resource noun (`documents.js`, `folders.js`)
- Views: `{Name}View.vue` pattern
**Database:**
- All tables: `snake_case` plural (`users`, `refresh_tokens`, `cloud_connections`)
- All PKs: UUID type
- FKs: `{table_singular}_id` pattern (`user_id`, `folder_id`, `document_id`)
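The conventions above are regular enough to check with a few patterns. A minimal sketch, assuming lowercase snake_case module names and alphanumeric component names; `PATTERNS` and `follows_convention` are hypothetical helpers, not part of the repository.

```python
import re

# Patterns derived from the naming conventions listed above.
PATTERNS = {
    "storage_backend": re.compile(r"[a-z0-9]+(_[a-z0-9]+)*_backend\.py"),
    "ai_provider": re.compile(r"[a-z0-9]+(_[a-z0-9]+)*_provider\.py"),
    "view": re.compile(r"[A-Z][A-Za-z0-9]*View\.vue"),
    "fk_column": re.compile(r"[a-z]+(_[a-z]+)*_id"),
}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if `name` matches the pattern registered for `kind`."""
    return PATTERNS[kind].fullmatch(name) is not None
```

For example, `follows_convention("view", "FileManagerView.vue")` is true, while a file named `fileManager.vue` would be rejected.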
## Where to Add New Code
- **New API endpoint**: add router in `backend/api/`, register in `backend/main.py`
- **New AI provider**: implement `AIProvider` ABC in `backend/ai/`, add case in `get_provider()`
- **New document type**: add extraction branch in `backend/services/extractor.py`
- **New frontend page**: add view in `src/views/`, add route in `src/router/index.js`
- **New shared UI component**: add to relevant `src/components/<category>/` subdirectory
**New API endpoint (new resource):**
- Create `backend/api/{resource}.py` with `APIRouter(prefix="/api/{resource}")`
- Add service logic to `backend/services/{resource}.py` (or extend existing service)
- Register router in `backend/main.py` with `app.include_router()`
- Add the corresponding `export function {action}{Resource}()` wrappers to `frontend/src/api/client.js`
**New Vue page (new route):**
- Create `frontend/src/views/{Name}View.vue`
- Add route to `frontend/src/router/index.js`
- If it needs auth: add `meta: { requiresAuth: true }` (or `requiresAdmin: true`)
**New Pinia store:**
- Create `frontend/src/stores/{resource}.js` using Composition API pattern (`defineStore('name', () => { ... })`)
- Export named: `export const use{Resource}Store`
**New storage backend:**
- Implement `StorageBackend` ABC from `backend/storage/base.py`
- Create `backend/storage/{provider}_backend.py`
- Add lazy import branch in `get_storage_backend_for_document()` in `backend/storage/__init__.py`
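The "lazy import branch" idea can be sketched as a lookup table resolved with `importlib`, so a backend's module is imported only when that provider is requested. This is an illustration under assumptions: the real `get_storage_backend_for_document()` is not shown here, `load_backend_class` is a hypothetical helper, and the demo registry points at a standard-library module purely so the sketch runs anywhere.

```python
import importlib


def load_backend_class(provider: str, registry: dict[str, tuple[str, str]]) -> type:
    """Resolve a provider name to its class, importing the module on demand."""
    try:
        module_path, class_name = registry[provider]
    except KeyError:
        raise ValueError(f"unknown storage provider: {provider!r}") from None
    module = importlib.import_module(module_path)  # imported only when requested
    return getattr(module, class_name)


# A registry for this repo might look like the following (class name is an assumption):
#   {"minio": ("storage.minio_backend", "MinioBackend")}
# Runnable stand-in that uses a stdlib module instead of a real backend:
DEMO_REGISTRY = {"demo": ("collections", "OrderedDict")}
```

Deferring the import means an optional SDK for one cloud provider does not need to be installed for the others to work.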
**New AI provider:**
- Implement `AIProvider` ABC from `backend/ai/base.py`
- Create `backend/ai/{provider}_provider.py`
- Register in `backend/ai/__init__.py` factory
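The three steps above can be sketched as follows. Only the names `AIProvider`, `ClassificationResult`, and `get_provider()` come from this document; the fields, the `classify` method, the synchronous signature, and the toy `KeywordProvider` are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    topic: str         # hypothetical field
    confidence: float  # hypothetical field


class AIProvider(ABC):
    @abstractmethod
    def classify(self, text: str) -> ClassificationResult:
        """Classify extracted document text (method name is an assumption)."""


class KeywordProvider(AIProvider):
    """Toy provider used only to exercise the interface."""

    def classify(self, text: str) -> ClassificationResult:
        if "invoice" in text.lower():
            return ClassificationResult("invoices", 0.9)
        return ClassificationResult("uncategorized", 0.1)


# Registration step: the factory maps a provider name to its class.
_PROVIDERS: dict[str, type[AIProvider]] = {"keyword": KeywordProvider}


def get_provider(name: str) -> AIProvider:
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown AI provider: {name!r}") from None
```

Callers depend only on `AIProvider`, so adding a provider is a new class plus one registry entry.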
**New Celery task:**
- Add task function to appropriate `backend/tasks/*.py` module
- Decorate with `@celery_app.task(name="tasks.{module}.{task_name}")`
- If periodic: add to `celery_app.conf.beat_schedule` in `backend/celery_app.py`
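The naming pattern and schedule entry can be shown without importing Celery, since a beat schedule is a plain dictionary whose entries carry a `task` name and a `schedule` (a number of seconds is accepted). A minimal sketch; `task_name` and `beat_entry` are hypothetical helpers, the reading of `{module}` as the file stem (`audit_tasks`) is an assumption, and the 24-hour interval stands in for whatever schedule the project actually uses.

```python
def task_name(module: str, func: str) -> str:
    """Build a task name following the tasks.{module}.{task_name} pattern."""
    return f"tasks.{module}.{func}"


def beat_entry(module: str, func: str, every_seconds: float) -> dict:
    """Build one beat_schedule entry that runs the task at a fixed interval."""
    return {"task": task_name(module, func), "schedule": float(every_seconds)}


# Shape of the mapping that would be assigned to celery_app.conf.beat_schedule.
beat_schedule = {
    "audit-log-daily-export": beat_entry(
        "audit_tasks", "audit_log_daily_export", 24 * 60 * 60
    ),
}
```

The name passed to `@celery_app.task(name=...)` must match the `task` string in the schedule entry exactly, otherwise beat enqueues a task the worker does not recognize.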
**New DB table:**
- Add ORM model class to `backend/db/models.py` extending `Base`
- Create new Alembic migration: `alembic revision --autogenerate -m "description"`
- Review and test the generated migration before committing
**New tests:**
- Backend: add `backend/tests/test_{resource}.py`
- Use fixtures from `backend/tests/conftest.py` (async session, auth client, test users)
- Security invariant tests belong in `backend/tests/test_security.py`
## Special Directories
**`.planning/`:**
- Purpose: GSD workflow planning artifacts (roadmap, requirements, phase plans, codebase maps)
- Generated: Partially (codebase maps regenerated by mapper agents)
- Committed: Yes
**`backend/data/`:**
- Purpose: Static data files (topic seed data, fixture CSVs)
- Generated: No
- Committed: Yes
**`frontend/dist/`:**
- Purpose: Vite production build output
- Generated: Yes (`npm run build`)
- Committed: No (gitignored)
**`backend/migrations/versions/`:**
- Purpose: Alembic migration history — one file per schema change
- Generated: Via `alembic revision` then manually reviewed
- Committed: Yes — each migration is a permanent historical artifact
**`.claude/worktrees/`:**
- Purpose: Isolated git worktrees used by Claude Code agent subprocesses
- Generated: Yes (by `/gsd:execute-phase` and related commands)
- Committed: No
---
## Gaps / Unknowns
- No `src/components/settings/` subdirectory — settings UI is entirely in `SettingsView.vue`
- No migration or schema versioning for `topics.json` / `settings.json` flat files
*Structure analysis: 2026-06-02*