Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codebase Structure
Analysis Date: 2026-06-02
Directory Layout
document_scanner/ # Repo root
├── backend/ # FastAPI Python backend
│ ├── main.py # App factory, middleware, router registration
│ ├── config.py # Pydantic Settings (all env vars)
│ ├── celery_app.py # Celery factory, task routing, beat schedule
│ ├── alembic.ini # Alembic migration config
│ ├── requirements.txt # Pinned Python dependencies
│ ├── Dockerfile # Backend container image
│ ├── pytest.ini # pytest config
│ ├── api/ # HTTP route handlers (thin — no business logic)
│ │ ├── auth.py # /api/auth/* — register, login, TOTP, refresh
│ │ ├── documents.py # /api/documents/* — upload, confirm, list, stream
│ │ ├── folders.py # /api/folders/* — CRUD + document move
│ │ ├── shares.py # /api/shares/* — share grants and revocation
│ │ ├── cloud.py # /api/cloud/* + /api/users/me/default-storage
│ │ ├── admin.py # /api/admin/* — user management, quota, AI config
│ │ ├── audit.py # /api/admin/audit-log — viewer + CSV export
│ │ └── topics.py # /api/topics/* — CRUD topics + suggest
│ ├── services/ # Business logic (no FastAPI coupling)
│ │ ├── auth.py # Argon2, JWT, refresh tokens, TOTP, HIBP
│ │ ├── audit.py # write_audit_log() helper
│ │ ├── classifier.py # AI classification orchestration
│ │ ├── extractor.py # PDF/DOCX/image/text extraction
│ │ ├── storage.py # ORM document queries + topic resolution
│ │ ├── cloud_cache.py # TTL-cached cloud folder listing
│ │ └── email.py # Email composition helpers
│ ├── storage/ # Pluggable object storage backends
│ │ ├── base.py # StorageBackend ABC
│ │ ├── __init__.py # Factory: get_storage_backend(), get_storage_backend_for_document()
│ │ ├── minio_backend.py # MinIO/S3 implementation (primary)
│ │ ├── google_drive_backend.py
│ │ ├── onedrive_backend.py
│ │ ├── nextcloud_backend.py
│ │ ├── webdav_backend.py
│ │ ├── cloud_utils.py # HKDF encryption/decryption, URL validation
│ │ └── exceptions.py # CloudConnectionError
│ ├── ai/ # Pluggable AI classification providers
│ │ ├── base.py # AIProvider ABC + ClassificationResult dataclass
│ │ ├── __init__.py # Factory: get_provider()
│ │ ├── ollama_provider.py
│ │ ├── openai_provider.py
│ │ ├── anthropic_provider.py
│ │ ├── lmstudio_provider.py
│ │ └── utils.py # Shared AI utilities
│ ├── db/ # Database layer
│ │ ├── models.py # SQLAlchemy ORM — 11 tables, all UUID PKs
│ │ └── session.py # Async engine + AsyncSessionLocal factory
│ ├── deps/ # FastAPI dependency injection
│ │ ├── auth.py # get_current_user, get_current_admin, get_regular_user
│ │ ├── db.py # get_db (per-request AsyncSession)
│ │ └── utils.py # get_client_ip
│ ├── tasks/ # Celery async task modules
│ │ ├── document_tasks.py # extract_and_classify, cleanup_abandoned_uploads
│ │ ├── email_tasks.py # send_reset_email, send_security_alert_email
│ │ └── audit_tasks.py # audit_log_daily_export (nightly Celery beat)
│ ├── migrations/ # Alembic migration scripts
│ │ ├── versions/
│ │ │ ├── 0001_initial_schema.py
│ │ │ ├── 0002_add_backup_codes_and_password_must_change.py
│ │ │ ├── 0003_multi_user_isolation.py
│ │ │ └── 0004_phase4_pdf_open_mode_tsvector.py
│ │ └── env.py # Alembic async migration runner
│ ├── tests/ # Backend test suite (pytest + httpx)
│ │ ├── conftest.py # Shared fixtures (async engine, client, users)
│ │ ├── test_auth_api.py
│ │ ├── test_documents.py
│ │ ├── test_folders.py
│ │ ├── test_shares.py
│ │ ├── test_cloud.py
│ │ ├── test_admin_api.py
│ │ ├── test_audit.py
│ │ ├── test_quota.py
│ │ ├── test_security.py
│ │ └── ... # 28 test files total
│ └── data/ # Static data files (topic seed data etc.)
│
├── frontend/ # Vue 3 SPA
│ ├── src/
│ │ ├── main.js # Vue app mount, Pinia + Router registration
│ │ ├── App.vue # Root component — layout switcher (auth vs app)
│ │ ├── style.css # Global Tailwind CSS entry
│ │ ├── api/
│ │ │ └── client.js # fetch wrapper, Bearer injection, 401→refresh→retry
│ │ ├── stores/ # Pinia state stores
│ │ │ ├── auth.js # accessToken (memory), user, quota, refresh
│ │ │ ├── documents.js # documents list, upload flow, search/sort
│ │ │ ├── folders.js # folder tree, breadcrumb, rootFolders
│ │ │ ├── topics.js # topics list CRUD
│ │ │ └── cloudConnections.js # cloud connection list
│ │ ├── router/
│ │ │ └── index.js # Routes + beforeEach auth guard (silent refresh)
│ │ ├── layouts/
│ │ │ └── AuthLayout.vue # Centered card layout for login/register pages
│ │ ├── views/ # Page-level components (one per route)
│ │ │ ├── FileManagerView.vue # / and /folders/:id — unified file manager
│ │ │ ├── DocumentView.vue # /document/:id — document detail + preview
│ │ │ ├── TopicsView.vue # /topics — topic management
│ │ │ ├── SettingsView.vue # /settings — user settings + TOTP
│ │ │ ├── AdminView.vue # /admin — admin panel (users, audit log)
│ │ │ ├── SharedView.vue # /shared — documents shared with me
│ │ │ ├── CloudStorageView.vue # /cloud — cloud connections overview
│ │ │ ├── CloudFolderView.vue # /cloud/:provider/:folderId — cloud folder browser
│ │ │ └── auth/ # Auth flow pages
│ │ │     ├── LoginView.vue
│ │ │     ├── RegisterView.vue
│ │ │     ├── PasswordResetView.vue
│ │ │     └── NewPasswordView.vue
│ │ ├── components/ # Reusable UI components
│ │ │ ├── storage/
│ │ │ │ └── StorageBrowser.vue # Core file manager widget (local + cloud modes)
│ │ │ ├── layout/
│ │ │ │ ├── AppSidebar.vue # Navigation sidebar with folder tree + quota bar
│ │ │ │ └── QuotaBar.vue # Storage quota progress bar
│ │ │ ├── documents/
│ │ │ │ └── DocumentCard.vue # Single document row in file manager
│ │ │ ├── folders/
│ │ │ │ ├── FolderTreeItem.vue # Recursive sidebar folder tree node
│ │ │ │ └── FolderDeleteModal.vue
│ │ │ ├── cloud/
│ │ │ │ ├── CloudProviderTreeItem.vue
│ │ │ │ └── CloudFolderTreeItem.vue
│ │ │ ├── sharing/
│ │ │ │ └── ShareModal.vue # Share document with another user
│ │ │ ├── upload/
│ │ │ │ └── DropZone.vue # Drag-and-drop file upload zone
│ │ │ ├── auth/ # Auth form components
│ │ │ ├── admin/ # Admin panel sub-components
│ │ │ ├── settings/ # Settings page sub-components
│ │ │ ├── topics/ # Topic chip/badge components
│ │ │ └── ui/ # Generic UI primitives (TreeItem.vue, etc.)
│ │ └── utils/ # Frontend utility functions
│ ├── index.html # Vite HTML entry
│ ├── vite.config.js # Vite config (proxy /api → :8000)
│ ├── tailwind.config.js # Tailwind CSS config
│ ├── vitest.config.js # Vitest test config
│ └── package.json # npm dependencies
│
├── docker/
│ └── postgres/
│     └── initdb.d/ # PostgreSQL init scripts (DB user + role setup)
│
├── docker-compose.yml # All services: postgres, minio, redis, backend,
│ # celery-worker, celery-beat, frontend
├── .env.example # Documented env var template (safe to commit)
├── .env # Local secrets (gitignored)
├── CLAUDE.md # Project instructions for Claude agents
├── SECURITY.md # Security audit findings and mitigations
└── .planning/ # GSD workflow planning artifacts
    ├── ROADMAP.md
    ├── REQUIREMENTS.md
    ├── STATE.md
    ├── PROJECT.md
    └── codebase/ # Codebase map (this directory)
Directory Purposes
backend/api/:
- Purpose: HTTP endpoint handlers. A thin layer only, with no business logic.
- Contains: One module per resource (auth.py, documents.py, folders.py, etc.)
- Key files: backend/api/documents.py (presigned upload flow), backend/api/auth.py (JWT issuance)
backend/services/:
- Purpose: Business logic decoupled from FastAPI. Functions are plain async Python with no FastAPI types.
- Contains: auth.py (crypto, TOTP, HIBP), classifier.py (AI orchestration), extractor.py (text extraction), storage.py (ORM queries), audit.py (audit log writer), cloud_cache.py (TTL cache), email.py (email helpers)
- Rule: No module in services/ may import from fastapi or api/
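The import rule above lends itself to an automated check. The following is a minimal sketch, not something the repository is known to contain: the function name forbidden_imports is hypothetical, and it assumes the modules are imported by their top-level names (fastapi, api) rather than through a package prefix.

```python
import ast

# Top-level packages that services/ modules must not import (per the rule above).
FORBIDDEN_ROOTS = {"fastapi", "api"}


def forbidden_imports(source: str) -> list[str]:
    """Return every forbidden module name imported by the given source text."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module or ""]
        else:
            continue
        found.extend(n for n in names if n.split(".")[0] in FORBIDDEN_ROOTS)
    return found


sample = "import json\nimport fastapi\nfrom api.documents import router\n"
violations = forbidden_imports(sample)
```

A test could apply the same function to the text of each file under backend/services/ and assert that the result is empty.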
backend/storage/:
- Purpose: All object storage interaction behind the StorageBackend ABC
- Contains: base.py (interface), factory __init__.py, one file per backend, cloud_utils.py (HKDF encrypt/decrypt), exceptions.py
- Key invariant: get_storage_backend_for_document() is the only place cloud credentials are decrypted
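To illustrate what "behind the StorageBackend ABC" means in practice, here is a minimal sketch. The method names (put_object, get_object, delete_object) and the InMemoryBackend class are assumptions made for illustration; the real interface in base.py may differ and is probably async.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface; the real method names in base.py may differ."""

    @abstractmethod
    def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get_object(self, key: str) -> bytes: ...

    @abstractmethod
    def delete_object(self, key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Toy backend that keeps objects in a dict, useful only for illustration."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get_object(self, key: str) -> bytes:
        return self._objects[key]

    def delete_object(self, key: str) -> None:
        self._objects.pop(key, None)


backend: StorageBackend = InMemoryBackend()
backend.put_object("docs/report.pdf", b"%PDF-")
```

Callers depend only on the abstract type, so swapping MinIO for a cloud backend does not change calling code.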
backend/ai/:
- Purpose: AI classification providers behind the AIProvider ABC
- Contains: base.py (interface + ClassificationResult), factory __init__.py, one file per provider
- Selected per-user via users.ai_provider + users.ai_model DB columns
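The provider pattern described above might look roughly like the sketch below. Only the names AIProvider, ClassificationResult, and get_provider come from this document; the dataclass fields, the classify method, the KeywordProvider class, and the factory signature are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClassificationResult:
    # Field names are hypothetical; the real dataclass may carry other data.
    topics: list[str] = field(default_factory=list)
    confidence: float = 0.0


class AIProvider(ABC):
    def __init__(self, model: str) -> None:
        self.model = model

    @abstractmethod
    def classify(self, text: str) -> ClassificationResult: ...


class KeywordProvider(AIProvider):
    """Toy stand-in for a real provider such as Ollama or OpenAI."""

    def classify(self, text: str) -> ClassificationResult:
        topics = ["invoice"] if "invoice" in text.lower() else []
        return ClassificationResult(topics=topics, confidence=1.0 if topics else 0.0)


_PROVIDERS: dict[str, type[AIProvider]] = {"keyword": KeywordProvider}


def get_provider(name: str, model: str) -> AIProvider:
    """Look up a provider class by name, e.g. from users.ai_provider / users.ai_model."""
    provider_cls: Optional[type[AIProvider]] = _PROVIDERS.get(name)
    if provider_cls is None:
        raise ValueError(f"unknown AI provider: {name!r}")
    return provider_cls(model)


result = get_provider("keyword", "demo-model").classify("Invoice 42 from ACME")
```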
backend/db/:
- Purpose: ORM schema and session management
- Contains: models.py (11 tables, all UUID PKs, full index declarations), session.py (async engine, AsyncSessionLocal)
- Note: Two DB users: docuvault_app (DML only, used at runtime) and docuvault_migrate (DDL, used by Alembic only)
backend/deps/:
- Purpose: FastAPI Depends() callables for shared dependency injection
- Contains: get_db (per-request session), get_current_user, get_current_admin, get_regular_user, get_client_ip
backend/tasks/:
- Purpose: Celery task definitions for async background work
- Contains: document_tasks.py (extraction + classification + cleanup), email_tasks.py (password reset + security alerts), audit_tasks.py (nightly CSV export)
backend/migrations/versions/:
- Purpose: Alembic migration history
- Contains: Sequentially numbered migration scripts (0001_ → 0004_)
- Generated: Via alembic revision, then manually reviewed. Autogenerated output is never committed without review.
backend/tests/:
- Purpose: pytest test suite using httpx.AsyncClient against a real PostgreSQL database
- Contains: 28 test files covering all endpoints, security invariants, and services
- Key files: conftest.py (shared fixtures), test_security.py (IDOR, admin block, CSRF tests)
frontend/src/stores/:
- Purpose: Pinia stores — application state + API calls
- Contains: auth.js, documents.js, folders.js, topics.js, cloudConnections.js
- Rule: Stores are the only place api/client.js is called from. Views do not call api/ directly.
frontend/src/api/:
- Purpose: Thin HTTP client wrapper
- Contains: client.js, which holds all fetch() calls, Bearer header injection, 401→refresh→retry logic, and all exported API functions
- Rule: No business logic here, purely request/response translation
frontend/src/views/:
- Purpose: Route-level page components
- Contains: One .vue file per route. Views wire stores to components via event delegation.
- Key file: FileManagerView.vue, the root view, which delegates to the StorageBrowser component
frontend/src/components/storage/:
- Purpose: Reusable file manager widget
- Contains: StorageBrowser.vue, a unified listing component for local folder mode and cloud folder mode
frontend/src/components/layout/:
- Purpose: Persistent app shell
- Contains: AppSidebar.vue (navigation, folder tree, cloud links, quota bar), QuotaBar.vue (storage progress)
Key File Locations
Entry Points:
- backend/main.py: FastAPI app. Start here for any backend investigation
- backend/celery_app.py: Celery factory. Start here for task routing investigation
- frontend/src/main.js: Vue app mount
- frontend/src/router/index.js: All routes + auth guard
Configuration:
- backend/config.py: All env vars with defaults (Pydantic Settings)
- .env.example: Documented env var template
- docker-compose.yml: Full service topology with env var wiring
- frontend/vite.config.js: Dev proxy config (/api → :8000)
Core Logic:
- backend/db/models.py: Full ORM schema, the reference for all table structures
- backend/services/auth.py: JWT, Argon2, TOTP, HIBP (all auth primitives)
- backend/storage/__init__.py: Storage backend factory, the entry point for understanding storage routing
- backend/storage/cloud_utils.py: HKDF credential encryption/decryption
Testing:
- backend/tests/conftest.py: Test fixtures (DB setup, user creation, auth helpers)
- backend/tests/test_security.py: Security invariant tests (IDOR, admin block, CSRF, timing)
Naming Conventions
Backend files:
- Modules: snake_case.py
- One module per resource/concern in api/ (matches the resource noun: documents.py, folders.py)
- One module per backend in storage/ ({provider}_backend.py)
- One module per provider in ai/ ({provider}_provider.py)
Frontend files:
- Vue components: PascalCase.vue
- Stores: camelCase.js matching the resource noun (documents.js, folders.js)
- Views: {Name}View.vue pattern
Database:
- All tables: snake_case plural (users, refresh_tokens, cloud_connections)
- All PKs: UUID type
- FKs: {table_singular}_id pattern (user_id, folder_id, document_id)
Where to Add New Code
New API endpoint (new resource):
- Create backend/api/{resource}.py with APIRouter(prefix="/api/{resource}")
- Add service logic to backend/services/{resource}.py (or extend an existing service)
- Register the router in backend/main.py with app.include_router()
- Add the corresponding exported {action}{Resource}() functions to frontend/src/api/client.js
New Vue page (new route):
- Create frontend/src/views/{Name}View.vue
- Add route to frontend/src/router/index.js
- If it needs auth: add meta: { requiresAuth: true } (or requiresAdmin: true)
New Pinia store:
- Create frontend/src/stores/{resource}.js using Composition API pattern (defineStore('name', () => { ... }))
- Export named: export const use{Resource}Store
New storage backend:
- Implement StorageBackend ABC from backend/storage/base.py
- Create backend/storage/{provider}_backend.py
- Add lazy import branch in get_storage_backend_for_document() in backend/storage/__init__.py
New AI provider:
- Implement AIProvider ABC from backend/ai/base.py
- Create backend/ai/{provider}_provider.py
- Register in backend/ai/__init__.py factory
New Celery task:
- Add task function to appropriate backend/tasks/*.py module
- Decorate with @celery_app.task(name="tasks.{module}.{task_name}")
- If periodic: add to celery_app.conf.beat_schedule in backend/celery_app.py
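For the periodic case, a beat schedule entry is a plain dictionary, so its shape can be shown without importing Celery. This is a sketch under assumptions: the entry key, the task_name helper, the daily interval, and the guess that {module} resolves to audit_tasks are all illustrative. The real schedule may use a crontab expression instead of a timedelta.

```python
from datetime import timedelta


def task_name(module: str, func: str) -> str:
    """Build an explicit task name following the tasks.{module}.{task_name} convention."""
    return f"tasks.{module}.{func}"


# In the real project this dict would be assigned to celery_app.conf.beat_schedule.
beat_schedule = {
    "audit-log-daily-export": {  # entry key is a hypothetical label
        "task": task_name("audit_tasks", "audit_log_daily_export"),
        "schedule": timedelta(days=1),  # assumed interval
    },
}
```

The "task" value must match the name passed to the task decorator exactly, otherwise the beat process schedules a task that no worker has registered.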
New DB table:
- Add ORM model class to backend/db/models.py extending Base
- Create new Alembic migration: alembic revision --autogenerate -m "description"
- Review and test the generated migration before committing
New tests:
- Backend: add backend/tests/test_{resource}.py
- Use fixtures from backend/tests/conftest.py (async session, auth client, test users)
- Security invariant tests belong in backend/tests/test_security.py
Special Directories
.planning/:
- Purpose: GSD workflow planning artifacts (roadmap, requirements, phase plans, codebase maps)
- Generated: Partially (codebase maps regenerated by mapper agents)
- Committed: Yes
backend/data/:
- Purpose: Static data files (topic seed data, fixture CSVs)
- Generated: No
- Committed: Yes
frontend/dist/:
- Purpose: Vite production build output
- Generated: Yes (npm run build)
- Committed: No (gitignored)
backend/migrations/versions/:
- Purpose: Alembic migration history — one file per schema change
- Generated: Via alembic revision, then manually reviewed
- Committed: Yes. Each migration is a permanent historical artifact.
.claude/worktrees/:
- Purpose: Isolated git worktrees used by Claude Code agent subprocesses
- Generated: Yes (by /gsd:execute-phase and related commands)
- Committed: No
Structure analysis: 2026-06-02