# doc-service: Claude context

PDF extraction microservice, port 8001 (internal). Shares the same PostgreSQL instance as the backend. Receives proxied requests from `backend:8000`, which injects `x-user-id` and `x-user-groups` headers; doc-service trusts these headers directly. Calls `ai-service:8010` for document classification.

See root `CLAUDE.md` for architecture, Docker, and project-wide workflows.

---

## Commands

All commands run inside Docker, never on the host.

```bash
docker compose exec doc-service alembic revision --autogenerate -m "describe change"
docker compose exec doc-service alembic upgrade head
docker compose exec doc-service alembic downgrade -1
```

---

## File & Folder Tree

```
features/doc-service/
├── app/
│   ├── main.py                      ← FastAPI, lifespan (file watcher start/stop)
│   ├── database.py                  ← Same PostgreSQL instance as backend
│   ├── deps.py                      ← get_user_id, get_user_groups, get_user_is_admin, get_user_admin_groups (injected headers)
│   ├── models/
│   │   ├── document.py              ← Document model
│   │   ├── category.py              ← DocumentCategory model
│   │   ├── category_assignment.py   ← CategoryAssignment (composite PK)
│   │   └── document_share.py        ← DocumentShare model (group-based sharing)
│   ├── schemas/
│   │   ├── document.py              ← DocumentOut, DocumentPage, DocumentStatusOut, etc.
│   │   ├── category.py              ← CategoryOut, CategoryCreate, CategoryUpdate
│   │   └── share.py                 ← DocumentShareOut, DocumentShareCreate, SharedDocumentOut
│   ├── routers/
│   │   ├── documents.py             ← Full CRUD + file serving + reprocess + suggestions + sharing
│   │   ├── categories.py            ← Category CRUD (includes watch-owned categories)
│   │   └── plugin.py                ← GET /plugin/manifest, GET+PATCH /plugin/settings
│   └── services/
│       ├── storage.py               ← File I/O
│       ├── ai_client.py             ← classify_document() → ai-service:8010/chat
│       ├── config_reader.py         ← Config load/save including storage/watch settings
│       └── file_watcher.py          ← watchdog-based PDF watcher + startup scan + ingestion
├── alembic/versions/                ← Migration chain
│   ├── 0003_add_watch_columns.py    ← source, watch_path, suggested_folder, suggested_filename
│   └── 0004_add_document_shares.py  ← document_shares table (group-based sharing)
├── Dockerfile                       ← python:3.12-slim, non-root user 1001
└── STATUS.md
```

---

## Database Models

### `documents`

| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `user_id` | String | indexed | not FK; trusts x-user-id header |
| `filename` | String | NOT NULL | |
| `file_path` | String | NOT NULL | absolute path under /data/documents |
| `file_size` | Integer | NOT NULL | bytes |
| `status` | String | default="pending" | pending / processing / done / failed |
| `title` | String(500) | nullable | AI-extracted |
| `document_type` | String | nullable | invoice / bill / receipt / order / expense / revenue / unknown |
| `raw_text` | Text | nullable | first 500 k chars |
| `extracted_data` | Text | nullable | JSON string |
| `tags` | Text | nullable | JSON array string |
| `error_message` | String(500) | nullable | |
| `created_at` | DateTime(tz) | server_default=now() | |
| `processed_at` | DateTime(tz) | nullable | |
| `source` | String(16) | default="upload" | "upload" or "watch" |
| `watch_path` | String | nullable | original absolute path in watch directory |
| `suggested_folder` | String(128) | nullable | AI-suggested category (pending user confirm) |
| `suggested_filename` | String(500) | nullable | AI-suggested title/rename (pending user confirm) |

### `document_categories`

| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `user_id` | String | indexed | owner; "watch" for system categories |
| `name` | String(128) | NOT NULL | PascalCase-with-dashes convention enforced on create/rename |
| `scope` | String(16) | NOT NULL, default="personal" | "personal" / "group" / "system" |
| `group_id` | String | nullable, indexed | set when scope="group" |
| `created_at` | DateTime(tz) | server_default=now() | |

### `document_category_assignments` (composite PK)

| Column | Type | Constraints |
|--------|------|-------------|
| `document_id` | String | PK + FK→documents.id CASCADE |
| `category_id` | String | PK + FK→document_categories.id CASCADE |

### `document_shares`

| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `document_id` | String | indexed, NOT NULL | not FK; trusts proxy |
| `group_id` | String | indexed, NOT NULL | group from backend |
| `shared_by_user_id` | String | NOT NULL | owner who shared |
| `can_delete` | Boolean | NOT NULL, default=false | allows group members to delete the doc |
| `created_at` | DateTime(tz) | server_default=now() | |

Unique constraint: `(document_id, group_id)`

### Migration chain

| Rev ID | Slug |
|--------|------|
| `0001` | `create_doc_tables` |
| `0002` | `add_document_title` |
| `0003` | `add_watch_columns` |
| `0004` | `add_document_shares` |
| `0005` | `add_share_can_delete` |
| `0006` | `add_category_scope` |
| `0007` | `capitalize_system_category_names` |

---

## API Endpoints (internal, reached via backend proxy)

All these endpoints are proxied from `backend:8000`. The backend injects `x-user-id` and `x-user-groups` before forwarding.

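Because every endpoint relies on the injected headers, it may help to see how they could be read. The following is a minimal, framework-free sketch, not the actual contents of `app/deps.py`: it borrows the names `get_user_id` and `get_user_groups` from that module, but the signatures, the error type, and above all the comma-separated encoding of `x-user-groups` are assumptions made for illustration.

```python
from typing import List, Mapping


def get_user_id(headers: Mapping[str, str]) -> str:
    """Return the caller's id from the proxy-injected x-user-id header.

    Hypothetical sketch: expects lower-cased header names in a plain mapping.
    """
    user_id = headers.get("x-user-id", "").strip()
    if not user_id:
        # A missing header suggests the request bypassed the backend proxy.
        raise PermissionError("missing x-user-id header")
    return user_id


def get_user_groups(headers: Mapping[str, str]) -> List[str]:
    """Return the caller's groups from the x-user-groups header.

    ASSUMPTION: groups arrive as a comma-separated string; the real
    encoding used by the backend may differ.
    """
    raw = headers.get("x-user-groups", "")
    # Drop empty entries so "a, b," and "" both parse cleanly.
    return [group.strip() for group in raw.split(",") if group.strip()]
```

Since doc-service trusts these values without verifying them, a sketch like this is only safe when the service is reachable exclusively through the backend proxy, as the internal port implies.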
### Documents

| Method | Path | Description |
|--------|------|-------------|
| POST | `/documents/upload` | Upload PDF (202, background processing) |
| GET | `/documents` | Paginated list (filterable: search, status, type, category, sort) |
| GET | `/documents/{id}` | Document detail |
| GET | `/documents/{id}/status` | Processing status only |
| PATCH | `/documents/{id}/type` | Update document type |
| PATCH | `/documents/{id}/tags` | Update tags |
| PATCH | `/documents/{id}/title` | Update title |
| POST | `/documents/{id}/reprocess` | Re-run AI extraction |
| DELETE | `/documents/{id}` | Delete document (204) |
| GET | `/documents/{id}/file` | Download PDF (streaming) |
| POST | `/documents/{id}/categories/{cat_id}` | Assign category |
| DELETE | `/documents/{id}/categories/{cat_id}` | Remove category |
| POST | `/documents/{id}/suggestions/folder/confirm` | Confirm AI folder suggestion |
| POST | `/documents/{id}/suggestions/folder/reject` | Reject AI folder suggestion |
| POST | `/documents/{id}/suggestions/filename/confirm` | Confirm AI filename suggestion |
| POST | `/documents/{id}/suggestions/filename/reject` | Reject AI filename suggestion |
| GET | `/documents/shared-with-me` | Documents shared with current user via their groups |
| GET | `/documents/{id}/shares` | List groups the document is shared with (owner only) |
| POST | `/documents/{id}/shares` | Share with a group (owner only; group must be in user's groups) |
| DELETE | `/documents/{id}/shares/{group_id}` | Stop sharing with a group (owner only) |

### Categories

| Method | Path | Description |
|--------|------|-------------|
| GET | `/categories` | List user's categories |
| POST | `/categories` | Create category (triggers background AI reanalysis) |
| PATCH | `/categories/{id}` | Rename |
| DELETE | `/categories/{id}` | Delete (204) |

### Plugin

| Method | Path | Description |
|--------|------|-------------|
| GET | `/plugin/manifest` | Plugin manifest with settings JSON Schema |
| GET | `/plugin/settings` | Current plugin settings |
| PATCH | `/plugin/settings` | Update plugin settings |

---

## Default Values & Limits

| Parameter | Value | Location |
|-----------|-------|----------|
| Document title max | 500 chars | `models/document.py` |
| Category name max | 128 chars | `models/category.py` |
| PDF max size (default) | 20 MB | admin settings (configurable) |
| Raw text cap | 500 k chars | `services/ai_client.py` |
| Documents per_page | 1–100, default 20 | `routers/documents.py` |
| AI service timeout | 60 s | `services/ai_client.py` |
| AI service max retries | 2 | `services/ai_client.py` |
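
To make the limits above concrete, here is a small standard-library sketch of how three of them could be applied. All names (`clamp_per_page`, `cap_raw_text`, `call_with_retries`, and the constants) are hypothetical and do not come from the codebase. Two behaviors are assumptions: out-of-range `per_page` values are clamped here, whereas the real router may reject them instead, and "max retries 2" is read as one initial attempt plus up to two retries.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Values taken from the limits table; the constant names are illustrative.
RAW_TEXT_CAP = 500_000
PER_PAGE_MIN, PER_PAGE_MAX, PER_PAGE_DEFAULT = 1, 100, 20
AI_MAX_RETRIES = 2


def clamp_per_page(value: Optional[int]) -> int:
    """Return a page size within 1-100, falling back to 20 when unset."""
    if value is None:
        return PER_PAGE_DEFAULT
    return max(PER_PAGE_MIN, min(PER_PAGE_MAX, value))


def cap_raw_text(text: str) -> str:
    """Keep only the first 500 k characters of extracted text."""
    return text[:RAW_TEXT_CAP]


def call_with_retries(
    fn: Callable[[], T], retries: int = AI_MAX_RETRIES, delay: float = 0.0
) -> T:
    """Call fn, retrying on any exception.

    ASSUMPTION: `retries` counts extra attempts after the first call,
    so the default allows up to three calls in total.
    """
    attempts = max(retries, 0) + 1
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
    raise RuntimeError("unreachable")  # loop always returns or raises
```

The 60 s timeout is not modeled here; in practice it would be passed to whatever HTTP client `services/ai_client.py` uses.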