Codebase Structure

Analysis Date: 2026-06-02

Directory Layout

document_scanner/               # Repo root
├── backend/                    # FastAPI Python backend
│   ├── main.py                 # App factory, middleware, router registration
│   ├── config.py               # Pydantic Settings (all env vars)
│   ├── celery_app.py           # Celery factory, task routing, beat schedule
│   ├── alembic.ini             # Alembic migration config
│   ├── requirements.txt        # Pinned Python dependencies
│   ├── Dockerfile              # Backend container image
│   ├── pytest.ini              # pytest config
│   ├── api/                    # HTTP route handlers (thin — no business logic)
│   │   ├── auth.py             # /api/auth/* — register, login, TOTP, refresh
│   │   ├── documents.py        # /api/documents/* — upload, confirm, list, stream
│   │   ├── folders.py          # /api/folders/* — CRUD + document move
│   │   ├── shares.py           # /api/shares/* — share grants and revocation
│   │   ├── cloud.py            # /api/cloud/* + /api/users/me/default-storage
│   │   ├── admin.py            # /api/admin/* — user management, quota, AI config
│   │   ├── audit.py            # /api/admin/audit-log — viewer + CSV export
│   │   └── topics.py           # /api/topics/* — CRUD topics + suggest
│   ├── services/               # Business logic (no FastAPI coupling)
│   │   ├── auth.py             # Argon2, JWT, refresh tokens, TOTP, HIBP
│   │   ├── audit.py            # write_audit_log() helper
│   │   ├── classifier.py       # AI classification orchestration
│   │   ├── extractor.py        # PDF/DOCX/image/text extraction
│   │   ├── storage.py          # ORM document queries + topic resolution
│   │   ├── cloud_cache.py      # TTL-cached cloud folder listing
│   │   └── email.py            # Email composition helpers
│   ├── storage/                # Pluggable object storage backends
│   │   ├── base.py             # StorageBackend ABC
│   │   ├── __init__.py         # Factory: get_storage_backend(), get_storage_backend_for_document()
│   │   ├── minio_backend.py    # MinIO/S3 implementation (primary)
│   │   ├── google_drive_backend.py
│   │   ├── onedrive_backend.py
│   │   ├── nextcloud_backend.py
│   │   ├── webdav_backend.py
│   │   ├── cloud_utils.py      # HKDF encryption/decryption, URL validation
│   │   └── exceptions.py       # CloudConnectionError
│   ├── ai/                     # Pluggable AI classification providers
│   │   ├── base.py             # AIProvider ABC + ClassificationResult dataclass
│   │   ├── __init__.py         # Factory: get_provider()
│   │   ├── ollama_provider.py
│   │   ├── openai_provider.py
│   │   ├── anthropic_provider.py
│   │   ├── lmstudio_provider.py
│   │   └── utils.py            # Shared AI utilities
│   ├── db/                     # Database layer
│   │   ├── models.py           # SQLAlchemy ORM — 11 tables, all UUID PKs
│   │   └── session.py          # Async engine + AsyncSessionLocal factory
│   ├── deps/                   # FastAPI dependency injection
│   │   ├── auth.py             # get_current_user, get_current_admin, get_regular_user
│   │   ├── db.py               # get_db (per-request AsyncSession)
│   │   └── utils.py            # get_client_ip
│   ├── tasks/                  # Celery async task modules
│   │   ├── document_tasks.py   # extract_and_classify, cleanup_abandoned_uploads
│   │   ├── email_tasks.py      # send_reset_email, send_security_alert_email
│   │   └── audit_tasks.py      # audit_log_daily_export (nightly Celery beat)
│   ├── migrations/             # Alembic migration scripts
│   │   ├── versions/
│   │   │   ├── 0001_initial_schema.py
│   │   │   ├── 0002_add_backup_codes_and_password_must_change.py
│   │   │   ├── 0003_multi_user_isolation.py
│   │   │   └── 0004_phase4_pdf_open_mode_tsvector.py
│   │   └── env.py              # Alembic async migration runner
│   ├── tests/                  # Backend test suite (pytest + httpx)
│   │   ├── conftest.py         # Shared fixtures (async engine, client, users)
│   │   ├── test_auth_api.py
│   │   ├── test_documents.py
│   │   ├── test_folders.py
│   │   ├── test_shares.py
│   │   ├── test_cloud.py
│   │   ├── test_admin_api.py
│   │   ├── test_audit.py
│   │   ├── test_quota.py
│   │   ├── test_security.py
│   │   └── ...                 # 28 test files total
│   └── data/                   # Static data files (topic seed data etc.)
│
├── frontend/                   # Vue 3 SPA
│   ├── src/
│   │   ├── main.js             # Vue app mount, Pinia + Router registration
│   │   ├── App.vue             # Root component — layout switcher (auth vs app)
│   │   ├── style.css           # Global Tailwind CSS entry
│   │   ├── api/
│   │   │   └── client.js       # fetch wrapper, Bearer injection, 401→refresh→retry
│   │   ├── stores/             # Pinia state stores
│   │   │   ├── auth.js         # accessToken (memory), user, quota, refresh
│   │   │   ├── documents.js    # documents list, upload flow, search/sort
│   │   │   ├── folders.js      # folder tree, breadcrumb, rootFolders
│   │   │   ├── topics.js       # topics list CRUD
│   │   │   └── cloudConnections.js  # cloud connection list
│   │   ├── router/
│   │   │   └── index.js        # Routes + beforeEach auth guard (silent refresh)
│   │   ├── layouts/
│   │   │   └── AuthLayout.vue  # Centered card layout for login/register pages
│   │   ├── views/              # Page-level components (one per route)
│   │   │   ├── FileManagerView.vue      # / and /folders/:id — unified file manager
│   │   │   ├── DocumentView.vue         # /document/:id — document detail + preview
│   │   │   ├── TopicsView.vue           # /topics — topic management
│   │   │   ├── SettingsView.vue         # /settings — user settings + TOTP
│   │   │   ├── AdminView.vue            # /admin — admin panel (users, audit log)
│   │   │   ├── SharedView.vue           # /shared — documents shared with me
│   │   │   ├── CloudStorageView.vue     # /cloud — cloud connections overview
│   │   │   ├── CloudFolderView.vue      # /cloud/:provider/:folderId — cloud folder browser
│   │   │   └── auth/                   # Auth flow pages
│   │   │       ├── LoginView.vue
│   │   │       ├── RegisterView.vue
│   │   │       ├── PasswordResetView.vue
│   │   │       └── NewPasswordView.vue
│   │   ├── components/         # Reusable UI components
│   │   │   ├── storage/
│   │   │   │   └── StorageBrowser.vue  # Core file manager widget (local + cloud modes)
│   │   │   ├── layout/
│   │   │   │   ├── AppSidebar.vue      # Navigation sidebar with folder tree + quota bar
│   │   │   │   └── QuotaBar.vue        # Storage quota progress bar
│   │   │   ├── documents/
│   │   │   │   └── DocumentCard.vue    # Single document row in file manager
│   │   │   ├── folders/
│   │   │   │   ├── FolderTreeItem.vue  # Recursive sidebar folder tree node
│   │   │   │   └── FolderDeleteModal.vue
│   │   │   ├── cloud/
│   │   │   │   ├── CloudProviderTreeItem.vue
│   │   │   │   └── CloudFolderTreeItem.vue
│   │   │   ├── sharing/
│   │   │   │   └── ShareModal.vue      # Share document with another user
│   │   │   ├── upload/
│   │   │   │   └── DropZone.vue        # Drag-and-drop file upload zone
│   │   │   ├── auth/                   # Auth form components
│   │   │   ├── admin/                  # Admin panel sub-components
│   │   │   ├── settings/               # Settings page sub-components
│   │   │   ├── topics/                 # Topic chip/badge components
│   │   │   └── ui/                     # Generic UI primitives (TreeItem.vue, etc.)
│   │   └── utils/                      # Frontend utility functions
│   ├── index.html              # Vite HTML entry
│   ├── vite.config.js          # Vite config (proxy /api → :8000)
│   ├── tailwind.config.js      # Tailwind CSS config
│   ├── vitest.config.js        # Vitest test config
│   └── package.json            # npm dependencies
│
├── docker/
│   └── postgres/
│       └── initdb.d/           # PostgreSQL init scripts (DB user + role setup)
│
├── docker-compose.yml          # All services: postgres, minio, redis, backend,
│                               #   celery-worker, celery-beat, frontend
├── .env.example                # Documented env var template (safe to commit)
├── .env                        # Local secrets (gitignored)
├── CLAUDE.md                   # Project instructions for Claude agents
├── SECURITY.md                 # Security audit findings and mitigations
└── .planning/                  # GSD workflow planning artifacts
    ├── ROADMAP.md
    ├── REQUIREMENTS.md
    ├── STATE.md
    ├── PROJECT.md
    └── codebase/               # Codebase map (this directory)

Directory Purposes

backend/api/:

  • Purpose: HTTP endpoint handlers — thin layer only. No business logic.
  • Contains: One module per resource (auth.py, documents.py, folders.py, etc.)
  • Key files: backend/api/documents.py (presigned upload flow), backend/api/auth.py (JWT issuance)

backend/services/:

  • Purpose: Business logic decoupled from FastAPI. Functions are pure async Python.
  • Contains: auth.py (crypto, TOTP, HIBP), classifier.py (AI orchestration), extractor.py (text extraction), storage.py (ORM queries), audit.py (audit log writer), cloud_cache.py (TTL cache), email.py (email helpers)
  • Rule: No module in services/ may import from fastapi or api/
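
The import rule above lends itself to a mechanical check. The following is a minimal sketch, not part of the codebase: forbidden_imports() is a hypothetical helper that parses a module's source with the standard ast module and reports absolute imports whose top-level package is fastapi or api. It assumes modules are imported as top-level packages (api, services), as when running from backend/.

```python
import ast

# Top-level packages that services/ modules must not import (per the rule above).
FORBIDDEN = {"fastapi", "api"}


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden module names imported by the given source code."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package: "fastapi.responses" -> "fastapi".
            if name.split(".")[0] in FORBIDDEN:
                found.append(name)
    return found
```

A check like this could run in CI or as a test over every file in backend/services/, failing when the returned list is non-empty.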

backend/storage/:

  • Purpose: All object storage interaction behind the StorageBackend ABC
  • Contains: base.py (interface), factory __init__.py, one file per backend, cloud_utils.py (HKDF encrypt/decrypt), exceptions.py
  • Key invariant: get_storage_backend_for_document() is the only place cloud credentials are decrypted

backend/ai/:

  • Purpose: AI classification providers behind the AIProvider ABC
  • Contains: base.py (interface + ClassificationResult), factory __init__.py, one file per provider
  • Selection: Per user, via the users.ai_provider and users.ai_model DB columns

backend/db/:

  • Purpose: ORM schema and session management
  • Contains: models.py (11 tables, all UUID PKs, full index declarations), session.py (async engine, AsyncSessionLocal)
  • Note: Two DB users — docuvault_app (DML only, used at runtime) and docuvault_migrate (DDL, used by Alembic only)

backend/deps/:

  • Purpose: FastAPI Depends() callables — shared dependency injection
  • Contains: get_db (per-request session), get_current_user, get_current_admin, get_regular_user, get_client_ip

backend/tasks/:

  • Purpose: Celery task definitions for async background work
  • Contains: document_tasks.py (extraction + classification + cleanup), email_tasks.py (password reset + security alerts), audit_tasks.py (nightly CSV export)

backend/migrations/versions/:

  • Purpose: Alembic migration history
  • Contains: Sequentially numbered migration scripts (0001_ through 0004_)
  • Generated: Via alembic revision, then manually reviewed; autogenerated output is never committed without review

backend/tests/:

  • Purpose: pytest test suite using httpx.AsyncClient with real PostgreSQL
  • Contains: 28 test files covering all endpoints, security invariants, and services
  • Key files: conftest.py (shared fixtures), test_security.py (IDOR, admin block, CSRF tests)

frontend/src/stores/:

  • Purpose: Pinia stores — application state + API calls
  • Contains: auth.js, documents.js, folders.js, topics.js, cloudConnections.js
  • Rule: Stores are the only place api/client.js is called from. Views do not call api/ directly.

frontend/src/api/:

  • Purpose: Thin HTTP client wrapper
  • Contains: client.js — all fetch() calls, Bearer header injection, 401→refresh→retry logic, all exported API functions
  • Rule: No business logic here — purely request/response translation

frontend/src/views/:

  • Purpose: Route-level page components
  • Contains: One .vue file per route. Views wire stores to components via event delegation.
  • Key file: FileManagerView.vue — root view, delegates to StorageBrowser component

frontend/src/components/storage/:

  • Purpose: Reusable file manager widget
  • Contains: StorageBrowser.vue — unified listing component for local folder mode and cloud folder mode

frontend/src/components/layout/:

  • Purpose: Persistent app shell
  • Contains: AppSidebar.vue (navigation, folder tree, cloud links, quota bar), QuotaBar.vue (storage progress)

Key File Locations

Entry Points:

  • backend/main.py: FastAPI app — start here for any backend investigation
  • backend/celery_app.py: Celery factory — start here for task routing investigation
  • frontend/src/main.js: Vue app mount
  • frontend/src/router/index.js: All routes + auth guard

Configuration:

  • backend/config.py: All env vars with defaults (Pydantic Settings)
  • .env.example: Documented env var template
  • docker-compose.yml: Full service topology with env var wiring
  • frontend/vite.config.js: Dev proxy config (/api → :8000)

Core Logic:

  • backend/db/models.py: Full ORM schema — reference for all table structures
  • backend/services/auth.py: JWT, Argon2, TOTP, HIBP — all auth primitives
  • backend/storage/__init__.py: Storage backend factory — entry point for understanding storage routing
  • backend/storage/cloud_utils.py: HKDF credential encryption/decryption

Testing:

  • backend/tests/conftest.py: Test fixtures — DB setup, user creation, auth helpers
  • backend/tests/test_security.py: Security invariant tests (IDOR, admin block, CSRF, timing)

Naming Conventions

Backend files:

  • Modules: snake_case.py
  • One module per resource/concern in api/ (matches the resource noun: documents.py, folders.py)
  • One module per backend in storage/ ({provider}_backend.py)
  • One module per provider in ai/ ({provider}_provider.py)

Frontend files:

  • Vue components: PascalCase.vue
  • Stores: camelCase.js matching the resource noun (documents.js, folders.js)
  • Views: {Name}View.vue pattern

Database:

  • All tables: snake_case plural (users, refresh_tokens, cloud_connections)
  • All PKs: UUID type
  • FKs: {table_singular}_id pattern (user_id, folder_id, document_id)
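
The naming conventions above can be expressed as regular expressions. This is a rough sketch under assumptions: the patterns and the follows_convention() helper are illustrative only, and they are stricter than the project may intend (for example, they reject dunder modules such as __init__.py).

```python
import re

# One pattern per convention listed above. All names here are hypothetical.
PATTERNS = {
    "backend_module": re.compile(r"^[a-z][a-z0-9_]*\.py$"),        # snake_case.py
    "component": re.compile(r"^(?:[A-Z][a-z0-9]*)+\.vue$"),        # PascalCase.vue
    "view": re.compile(r"^(?:[A-Z][a-z0-9]*)+View\.vue$"),         # {Name}View.vue
    "store": re.compile(r"^[a-z][a-zA-Z0-9]*\.js$"),               # camelCase.js
    "fk_column": re.compile(r"^[a-z][a-z0-9_]*_id$"),              # {table_singular}_id
}


def follows_convention(kind: str, name: str) -> bool:
    """Return True if the name matches the pattern registered for the kind."""
    return PATTERNS[kind].match(name) is not None
```

For instance, follows_convention("view", "FileManagerView.vue") is true, while follows_convention("view", "StorageBrowser.vue") is false because the name lacks the View suffix.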

Where to Add New Code

New API endpoint (new resource):

  • Create backend/api/{resource}.py with APIRouter(prefix="/api/{resource}")
  • Add service logic to backend/services/{resource}.py (or extend existing service)
  • Register router in backend/main.py with app.include_router()
  • Add the corresponding exported functions, named {action}{Resource}(), to frontend/src/api/client.js

New Vue page (new route):

  • Create frontend/src/views/{Name}View.vue
  • Add route to frontend/src/router/index.js
  • If it needs auth: add meta: { requiresAuth: true } (or requiresAdmin: true)

New Pinia store:

  • Create frontend/src/stores/{resource}.js using Composition API pattern (defineStore('name', () => { ... }))
  • Export named: export const use{Resource}Store

New storage backend:

  • Implement StorageBackend ABC from backend/storage/base.py
  • Create backend/storage/{provider}_backend.py
  • Add lazy import branch in get_storage_backend_for_document() in backend/storage/__init__.py
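
As an illustration of these steps, here is a simplified sketch. The method names put_object() and get_object(), the InMemoryBackend class, and the get_backend() factory are all assumptions made for this example; the real interface is defined in backend/storage/base.py (and is likely async), and the real factory is get_storage_backend_for_document().

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Stand-in for the ABC in backend/storage/base.py; method names are assumed."""

    @abstractmethod
    def put_object(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get_object(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Hypothetical backend that keeps objects in a dict (illustration only)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put_object(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get_object(self, key: str) -> bytes:
        return self._objects[key]


def get_backend(provider: str) -> StorageBackend:
    """Simplified factory with one branch per provider."""
    if provider == "memory":
        # In the real factory, the lazy import of the backend module would go
        # here, so that unused providers never load their SDKs.
        return InMemoryBackend()
    raise ValueError(f"Unknown storage provider: {provider}")
```

Importing each backend inside its own branch keeps optional provider dependencies out of the import path until a document actually needs them.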

New AI provider:

  • Implement AIProvider ABC from backend/ai/base.py
  • Create backend/ai/{provider}_provider.py
  • Register in backend/ai/__init__.py factory
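
A provider following these steps could be sketched as below. The fields of ClassificationResult, the classify() method name, the KeywordProvider class, and the _PROVIDERS registry are assumptions for illustration; the actual definitions live in backend/ai/base.py and backend/ai/__init__.py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ClassificationResult:
    # Field names are assumed; see backend/ai/base.py for the real dataclass.
    topic: str
    confidence: float


class AIProvider(ABC):
    """Stand-in for the ABC in backend/ai/base.py; the method name is assumed."""

    @abstractmethod
    def classify(self, text: str) -> ClassificationResult: ...


class KeywordProvider(AIProvider):
    """Hypothetical provider that labels text by keyword, with no model call."""

    def classify(self, text: str) -> ClassificationResult:
        if "invoice" in text.lower():
            return ClassificationResult(topic="invoice", confidence=1.0)
        return ClassificationResult(topic="unknown", confidence=0.0)


# Registry mapping a provider name to its class (assumed factory shape).
_PROVIDERS: dict[str, type[AIProvider]] = {"keyword": KeywordProvider}


def get_provider(name: str) -> AIProvider:
    """Instantiate the provider registered under the given name."""
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"Unknown AI provider: {name}") from None
```

With a registry like this, registering a new provider is a one-line addition to the mapping, and callers depend only on the AIProvider interface.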

New Celery task:

  • Add task function to appropriate backend/tasks/*.py module
  • Decorate with @celery_app.task(name="tasks.{module}.{task_name}")
  • If periodic: add to celery_app.conf.beat_schedule in backend/celery_app.py

New DB table:

  • Add ORM model class to backend/db/models.py extending Base
  • Create new Alembic migration: alembic revision --autogenerate -m "description"
  • Review and test the generated migration before committing

New tests:

  • Backend: add backend/tests/test_{resource}.py
  • Use fixtures from backend/tests/conftest.py (async session, auth client, test users)
  • Security invariant tests belong in backend/tests/test_security.py

Special Directories

.planning/:

  • Purpose: GSD workflow planning artifacts (roadmap, requirements, phase plans, codebase maps)
  • Generated: Partially (codebase maps regenerated by mapper agents)
  • Committed: Yes

backend/data/:

  • Purpose: Static data files (topic seed data, fixture CSVs)
  • Generated: No
  • Committed: Yes

frontend/dist/:

  • Purpose: Vite production build output
  • Generated: Yes (npm run build)
  • Committed: No (gitignored)

backend/migrations/versions/:

  • Purpose: Alembic migration history — one file per schema change
  • Generated: Via alembic revision then manually reviewed
  • Committed: Yes — each migration is a permanent historical artifact

.claude/worktrees/:

  • Purpose: Isolated git worktrees used by Claude Code agent subprocesses
  • Generated: Yes (by /gsd:execute-phase and related commands)
  • Committed: No

Structure analysis: 2026-06-02