Files
Business-Management/features/doc-service/CLAUDE.md
T
curo1305 c5976882be Split monolithic CLAUDE.md into per-service sub-files
Root CLAUDE.md now contains only project-wide concerns (stack, architecture,
Docker, workflows, security hook). Service-specific details moved to:
- backend/CLAUDE.md — DB models, API endpoints, JWT/bcrypt, naming conventions
- frontend/CLAUDE.md — routes, TanStack Query patterns, XSS prevention
- features/ai-service/CLAUDE.md — queue endpoints, provider notes
- features/doc-service/CLAUDE.md — document models, PDF limits, proxy endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-18 13:10:10 +02:00

177 lines
7.8 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# doc-service — Claude context
PDF extraction microservice, port 8001 (internal). Shares the same PostgreSQL instance as the backend. Receives proxied requests from `backend:8000`, which injects `x-user-id` and `x-user-groups` headers — doc-service trusts these headers directly. Calls `ai-service:8010` for document classification. See root `CLAUDE.md` for architecture, Docker, and project-wide workflows.
---
## Commands
All commands run inside Docker — never on the host.
```bash
docker compose exec doc-service alembic revision --autogenerate -m "describe change"
docker compose exec doc-service alembic upgrade head
docker compose exec doc-service alembic downgrade -1
```
---
## File & Folder Tree
```
features/doc-service/
├── app/
│ ├── main.py ← FastAPI, lifespan (file watcher start/stop)
│ ├── database.py ← Same PostgreSQL instance as backend
│ ├── deps.py ← get_user_id (x-user-id), get_user_groups (x-user-groups)
│ ├── models/
│ │ ├── document.py ← Document model
│ │ ├── category.py ← DocumentCategory model
│ │ ├── category_assignment.py ← CategoryAssignment (composite PK)
│ │ └── document_share.py ← DocumentShare model (group-based sharing)
│ ├── schemas/
│ │ ├── document.py ← DocumentOut, DocumentPage, DocumentStatusOut, etc.
│ │ ├── category.py ← CategoryOut, CategoryCreate, CategoryUpdate
│ │ └── share.py ← DocumentShareOut, DocumentShareCreate, SharedDocumentOut
│ ├── routers/
│ │ ├── documents.py ← Full CRUD + file serving + reprocess + suggestions + sharing
│ │ ├── categories.py ← Category CRUD (includes watch-owned categories)
│ │ └── plugin.py ← GET /plugin/manifest, GET+PATCH /plugin/settings
│ └── services/
│ ├── storage.py ← File I/O
│ ├── ai_client.py ← classify_document() → ai-service:8010/chat
│ ├── config_reader.py ← Config load/save including storage/watch settings
│ └── file_watcher.py ← watchdog-based PDF watcher + startup scan + ingestion
├── alembic/versions/ ← Migration chain
│ ├── 0003_add_watch_columns.py ← source, watch_path, suggested_folder, suggested_filename
│ └── 0004_add_document_shares.py ← document_shares table (group-based sharing)
├── Dockerfile ← python:3.12-slim, non-root user 1001
└── STATUS.md
```
---
## Database Models
### `documents`
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `user_id` | String | indexed | not FK — trusts x-user-id header |
| `filename` | String | NOT NULL | |
| `file_path` | String | NOT NULL | absolute path under /data/documents |
| `file_size` | Integer | NOT NULL | bytes |
| `status` | String | default="pending" | pending / processing / done / failed |
| `title` | String(500) | nullable | AI-extracted |
| `document_type` | String | nullable | invoice / bill / receipt / order / expense / revenue / unknown |
| `raw_text` | Text | nullable | first 500 k chars |
| `extracted_data` | Text | nullable | JSON string |
| `tags` | Text | nullable | JSON array string |
| `error_message` | String(500) | nullable | |
| `created_at` | DateTime(tz) | server_default=now() | |
| `processed_at` | DateTime(tz) | nullable | |
| `source` | String(16) | default="upload" | "upload" or "watch" |
| `watch_path` | String | nullable | original absolute path in watch directory |
| `suggested_folder` | String(128) | nullable | AI-suggested category (pending user confirm) |
| `suggested_filename` | String(500) | nullable | AI-suggested title/rename (pending user confirm) |
### `document_categories`
| Column | Type | Constraints |
|--------|------|-------------|
| `id` | String | PK, UUID |
| `user_id` | String | indexed |
| `name` | String(128) | NOT NULL |
| `created_at` | DateTime(tz) | server_default=now() |
### `document_category_assignments` (composite PK)
| Column | Type | Constraints |
|--------|------|-------------|
| `document_id` | String | PK + FK→documents.id CASCADE |
| `category_id` | String | PK + FK→document_categories.id CASCADE |
### `document_shares`
| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `document_id` | String | indexed, NOT NULL | not FK — trusts proxy |
| `group_id` | String | indexed, NOT NULL | group from backend |
| `shared_by_user_id` | String | NOT NULL | owner who shared |
| `created_at` | DateTime(tz) | server_default=now() | |
Unique constraint: `(document_id, group_id)`
### Migration chain
| Rev ID | Slug |
|--------|------|
| `0001` | `create_doc_tables` |
| `0002` | `add_document_title` |
| `0003` | `add_watch_columns` |
| `0004` | `add_document_shares` |
---
## API Endpoints (internal — reached via backend proxy)
All these endpoints are proxied from `backend:8000`. The backend injects `x-user-id` and `x-user-groups` before forwarding.
### Documents
| Method | Path | Description |
|--------|------|-------------|
| POST | `/documents/upload` | Upload PDF (202, background processing) |
| GET | `/documents` | Paginated list (filterable: search, status, type, category, sort) |
| GET | `/documents/{id}` | Document detail |
| GET | `/documents/{id}/status` | Processing status only |
| PATCH | `/documents/{id}/type` | Update document type |
| PATCH | `/documents/{id}/tags` | Update tags |
| PATCH | `/documents/{id}/title` | Update title |
| POST | `/documents/{id}/reprocess` | Re-run AI extraction |
| DELETE | `/documents/{id}` | Delete document (204) |
| GET | `/documents/{id}/file` | Download PDF (streaming) |
| POST | `/documents/{id}/categories/{cat_id}` | Assign category |
| DELETE | `/documents/{id}/categories/{cat_id}` | Remove category |
| POST | `/documents/{id}/suggestions/folder/confirm` | Confirm AI folder suggestion |
| POST | `/documents/{id}/suggestions/folder/reject` | Reject AI folder suggestion |
| POST | `/documents/{id}/suggestions/filename/confirm` | Confirm AI filename suggestion |
| POST | `/documents/{id}/suggestions/filename/reject` | Reject AI filename suggestion |
| GET | `/documents/shared-with-me` | Documents shared with current user via their groups |
| GET | `/documents/{id}/shares` | List groups the document is shared with (owner only) |
| POST | `/documents/{id}/shares` | Share with a group (owner only; group must be in user's groups) |
| DELETE | `/documents/{id}/shares/{group_id}` | Stop sharing with a group (owner only) |
### Categories
| Method | Path | Description |
|--------|------|-------------|
| GET | `/categories` | List user's categories |
| POST | `/categories` | Create category (triggers background AI reanalysis) |
| PATCH | `/categories/{id}` | Rename |
| DELETE | `/categories/{id}` | Delete (204) |
### Plugin
| Method | Path | Description |
|--------|------|-------------|
| GET | `/plugin/manifest` | Plugin manifest with settings JSON Schema |
| GET | `/plugin/settings` | Current plugin settings |
| PATCH | `/plugin/settings` | Update plugin settings |
---
## Default Values & Limits
| Parameter | Value | Location |
|-----------|-------|----------|
| Document title max | 500 chars | `models/document.py` |
| Category name max | 128 chars | `models/category.py` |
| PDF max size (default) | 20 MB | admin settings (configurable) |
| Raw text cap | 500 k chars | `services/ai_client.py` |
| Documents per_page | 1100, default 20 | `routers/documents.py` |
| AI service timeout | 60 s | `services/ai_client.py` |
| AI service max retries | 2 | `services/ai_client.py` |