# doc-service — Claude context

PDF extraction microservice, port 8001 (internal). Shares the same PostgreSQL instance as the backend. Receives proxied requests from `backend:8000`, which injects `x-user-id` and `x-user-groups` headers; doc-service trusts these headers directly. Calls `ai-service:8010` for document classification. All file/blob storage goes through `storage-service:8020`; no files are written directly to the filesystem. See root `CLAUDE.md` for architecture, Docker, and project-wide workflows.

---

## Commands

All commands run inside Docker, never on the host.

```bash
# Generate a new migration from model changes
docker compose exec doc-service alembic revision --autogenerate -m "describe change"
# Apply all pending migrations
docker compose exec doc-service alembic upgrade head
# Roll back the most recent migration
docker compose exec doc-service alembic downgrade -1
```

---

## File & Folder Tree

```
features/doc-service/
├── app/
│   ├── main.py                     ← FastAPI, lifespan (file watcher start/stop)
│   ├── database.py                 ← Same PostgreSQL instance as backend
│   ├── deps.py                     ← get_user_id, get_user_groups, get_user_is_admin, get_user_admin_groups (injected headers)
│   ├── models/
│   │   ├── document.py             ← Document model
│   │   ├── category.py             ← DocumentCategory model
│   │   ├── category_assignment.py  ← CategoryAssignment (composite PK)
│   │   └── document_share.py       ← DocumentShare model (group-based sharing)
│   ├── schemas/
│   │   ├── document.py             ← DocumentOut, DocumentPage, DocumentStatusOut, etc.
│   │   ├── category.py             ← CategoryOut, CategoryCreate, CategoryUpdate
│   │   └── share.py                ← DocumentShareOut, DocumentShareCreate, SharedDocumentOut
│   ├── routers/
│   │   ├── documents.py            ← Full CRUD + file serving + reprocess + suggestions + sharing
│   │   ├── categories.py           ← Category CRUD (includes watch-owned categories)
│   │   └── plugin.py               ← GET /plugin/manifest, GET+PATCH /plugin/settings
│   └── services/
│       ├── storage.py              ← Storage client: save_upload/download_file/delete_file → storage-service:8020 documents bucket
│       ├── ai_client.py            ← classify_document() → ai-service:8010/chat
│       ├── config_reader.py        ← Config load/save via storage-service config bucket (doc_service_config.json)
│       └── file_watcher.py         ← watchdog-based PDF watcher + startup scan + ingestion
├── alembic/versions/               ← Migration chain
│   ├── 0003_add_watch_columns.py   ← source, watch_path, suggested_folder, suggested_filename
│   ├── 0004_add_document_shares.py ← document_shares table (group-based sharing)
│   └── 0008_rename_file_path_to_storage_key.py ← file_path → storage_key; strips /data/documents/ prefix from existing rows
├── Dockerfile                      ← python:3.12-slim, non-root user 1001
└── STATUS.md
```

---

## Database Models

### `documents`

| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `user_id` | String | indexed | not an FK; trusts the `x-user-id` header |
| `filename` | String | NOT NULL | |
| `storage_key` | String | NOT NULL | storage-service key: `{user_id}/{doc_id}.pdf` (documents bucket) |
| `file_size` | Integer | NOT NULL | bytes |
| `status` | String | default="pending" | pending / processing / done / failed |
| `title` | String(500) | nullable | AI-extracted |
| `document_type` | String | nullable | invoice / bill / receipt / order / expense / revenue / unknown |
| `raw_text` | Text | nullable | first 500 k chars |
| `extracted_data` | Text | nullable | JSON string |
| `tags` | Text | nullable | JSON array string |
| `error_message` | String(500) | nullable | |
| `created_at` | DateTime(tz) | server_default=now() | |
| `processed_at` | DateTime(tz) | nullable | |
| `source` | String(16) | default="upload" | "upload" or "watch" |
| `watch_path` | String | nullable | original absolute path in watch directory |
| `suggested_folder` | String(128) | nullable | AI-suggested category (pending user confirm) |
| `suggested_filename` | String(500) | nullable | AI-suggested title/rename (pending user confirm) |

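
A minimal sketch of how a `storage_key` could be built, and how a legacy `file_path` maps to it, assuming the `{user_id}/{doc_id}.pdf` format from the table and the `/data/documents/` prefix that migration `0008` strips. The function names are hypothetical and are not taken from `services/storage.py` or the migration itself.

```python
# Hypothetical helpers; names are illustrative, not the service's real API.
LEGACY_PREFIX = "/data/documents/"  # prefix stripped by migration 0008


def make_storage_key(user_id: str, doc_id: str) -> str:
    """Build a documents-bucket key in the {user_id}/{doc_id}.pdf format."""
    return f"{user_id}/{doc_id}.pdf"


def legacy_path_to_storage_key(file_path: str) -> str:
    """Drop the old filesystem prefix; leave already-migrated keys untouched."""
    if file_path.startswith(LEGACY_PREFIX):
        return file_path[len(LEGACY_PREFIX):]
    return file_path
```

Leaving already-migrated values unchanged keeps the conversion safe to run more than once.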
### `document_categories`

| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `user_id` | String | indexed | owner; "watch" for system categories |
| `name` | String(128) | NOT NULL | PascalCase-with-dashes convention enforced on create/rename |
| `scope` | String(16) | NOT NULL, default="personal" | "personal" / "group" / "system" |
| `group_id` | String | nullable, indexed | set when scope="group" |
| `created_at` | DateTime(tz) | server_default=now() | |

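
A rough sketch of what a name check for the PascalCase-with-dashes convention might look like. The exact rule is not spelled out here, so the pattern below is an assumption: each dash-separated segment starts with an uppercase letter and contains only letters and digits. `is_valid_category_name` is a hypothetical name.

```python
import re

CATEGORY_NAME_MAX = 128  # String(128) limit from the table above

# Assumed interpretation: segments like "Tax" joined by single dashes.
_NAME_RE = re.compile(r"^[A-Z][A-Za-z0-9]*(?:-[A-Z][A-Za-z0-9]*)*$")


def is_valid_category_name(name: str) -> bool:
    """Return True if the name fits the length limit and the assumed pattern."""
    return len(name) <= CATEGORY_NAME_MAX and _NAME_RE.fullmatch(name) is not None
```

Under this assumption `Tax-Returns` and `Invoices` pass, while `tax-returns` and `Tax--Returns` are rejected.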
### `document_category_assignments` (composite PK)

| Column | Type | Constraints |
|--------|------|-------------|
| `document_id` | String | PK + FK→documents.id CASCADE |
| `category_id` | String | PK + FK→document_categories.id CASCADE |

### `document_shares`

| Column | Type | Constraints | Notes |
|--------|------|-------------|-------|
| `id` | String | PK, UUID | |
| `document_id` | String | indexed, NOT NULL | not an FK; trusts the proxy |
| `group_id` | String | indexed, NOT NULL | group from backend |
| `shared_by_user_id` | String | NOT NULL | owner who shared |
| `can_delete` | Boolean | NOT NULL, default=false | allows group members to delete the doc |
| `created_at` | DateTime(tz) | server_default=now() | |

Unique constraint: `(document_id, group_id)`

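
To illustrate what the `(document_id, group_id)` unique constraint means in practice, here is a small sketch using an in-memory SQLite table as a stand-in for PostgreSQL. The `share` helper is hypothetical, and `created_at` is omitted for brevity; the point is only that sharing the same document with the same group twice is rejected by the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in; the service uses PostgreSQL
conn.execute(
    """
    CREATE TABLE document_shares (
        id TEXT PRIMARY KEY,
        document_id TEXT NOT NULL,
        group_id TEXT NOT NULL,
        shared_by_user_id TEXT NOT NULL,
        can_delete BOOLEAN NOT NULL DEFAULT 0,
        UNIQUE (document_id, group_id)
    )
    """
)


def share(share_id: str, document_id: str, group_id: str, user_id: str) -> bool:
    """Insert a share row; return False if the pair is already shared."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document_shares "
                "(id, document_id, group_id, shared_by_user_id) VALUES (?, ?, ?, ?)",
                (share_id, document_id, group_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```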
### Migration chain

| Rev ID | Slug |
|--------|------|
| `0001` | `create_doc_tables` |
| `0002` | `add_document_title` |
| `0003` | `add_watch_columns` |
| `0004` | `add_document_shares` |
| `0005` | `add_share_can_delete` |
| `0006` | `add_category_scope` |
| `0007` | `capitalize_system_category_names` |
| `0008` | `rename_file_path_to_storage_key` |

---

## API Endpoints (internal — reached via backend proxy)

All these endpoints are proxied from `backend:8000`. The backend injects `x-user-id` and `x-user-groups` before forwarding.

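
A minimal sketch of reading the injected headers, written as plain functions over a header mapping rather than as FastAPI dependencies. The names mirror those listed for `deps.py`, but the bodies are illustrative only; in particular, the comma-separated format of `x-user-groups` is an assumption.

```python
from typing import Mapping


def get_user_id(headers: Mapping[str, str]) -> str:
    """Return the caller's ID from the proxy-injected x-user-id header."""
    user_id = headers.get("x-user-id", "").strip()
    if not user_id:
        raise ValueError("missing x-user-id header")
    return user_id


def get_user_groups(headers: Mapping[str, str]) -> list[str]:
    """Split x-user-groups into a list (comma-separated format is assumed)."""
    raw = headers.get("x-user-groups", "")
    return [group.strip() for group in raw.split(",") if group.strip()]
```

Because these values are trusted without further checks, the service should only be reachable through the backend proxy.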
### Documents

| Method | Path | Description |
|--------|------|-------------|
| POST | `/documents/upload` | Upload PDF (202, background processing) |
| GET | `/documents` | Paginated list (filterable: search, status, type, category, sort) |
| GET | `/documents/{id}` | Document detail |
| GET | `/documents/{id}/status` | Processing status only |
| PATCH | `/documents/{id}/type` | Update document type |
| PATCH | `/documents/{id}/tags` | Update tags |
| PATCH | `/documents/{id}/title` | Update title |
| POST | `/documents/{id}/reprocess` | Re-run AI extraction |
| DELETE | `/documents/{id}` | Delete document (204) |
| GET | `/documents/{id}/file` | Download PDF (streaming) |
| POST | `/documents/{id}/categories/{cat_id}` | Assign category |
| DELETE | `/documents/{id}/categories/{cat_id}` | Remove category |
| POST | `/documents/{id}/suggestions/folder/confirm` | Confirm AI folder suggestion |
| POST | `/documents/{id}/suggestions/folder/reject` | Reject AI folder suggestion |
| POST | `/documents/{id}/suggestions/filename/confirm` | Confirm AI filename suggestion |
| POST | `/documents/{id}/suggestions/filename/reject` | Reject AI filename suggestion |
| GET | `/documents/shared-with-me` | Documents shared with current user via their groups |
| GET | `/documents/{id}/shares` | List groups the document is shared with (owner only) |
| POST | `/documents/{id}/shares` | Share with a group (owner only; group must be in user's groups) |
| DELETE | `/documents/{id}/shares/{group_id}` | Stop sharing with a group (owner only) |

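
The delete rule implied by `can_delete` on `document_shares` can be sketched as a pure function. This is an assumption about how `DELETE /documents/{id}` might decide access, not the router's actual code: the owner may always delete, and a non-owner may delete only through a share with one of their groups that has `can_delete` set. `Share` and `can_delete_document` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Share:
    """Illustrative subset of a document_shares row."""
    document_id: str
    group_id: str
    can_delete: bool = False


def can_delete_document(
    owner_id: str,
    document_id: str,
    user_id: str,
    user_groups: Iterable[str],
    shares: Iterable[Share],
) -> bool:
    """Owner can always delete; others need a can_delete share via a group."""
    if user_id == owner_id:
        return True
    groups = set(user_groups)
    return any(
        s.document_id == document_id and s.group_id in groups and s.can_delete
        for s in shares
    )
```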
### Categories

| Method | Path | Description |
|--------|------|-------------|
| GET | `/categories` | List user's categories |
| POST | `/categories` | Create category (triggers background AI reanalysis) |
| PATCH | `/categories/{id}` | Rename |
| DELETE | `/categories/{id}` | Delete (204) |

### Plugin

| Method | Path | Description |
|--------|------|-------------|
| GET | `/plugin/manifest` | Plugin manifest with settings JSON Schema |
| GET | `/plugin/settings` | Current plugin settings |
| PATCH | `/plugin/settings` | Update plugin settings |

---

## Default Values & Limits

| Parameter | Value | Location |
|-----------|-------|----------|
| Document title max | 500 chars | `models/document.py` |
| Category name max | 128 chars | `models/category.py` |
| PDF max size (default) | 20 MB | admin settings (configurable) |
| Raw text cap | 500 k chars | `services/ai_client.py` |
| Documents per_page | 1–100, default 20 | `routers/documents.py` |
| AI service timeout | 60 s | `services/ai_client.py` |
| AI service max retries | 2 | `services/ai_client.py` |

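
A small sketch of how the raw text cap and the retry limit from the table could be applied. The constant and function names are hypothetical, and "max retries 2" is read here as two retries after the first attempt (three attempts in total), which is an assumption. The 60 s timeout would be passed to whatever HTTP client makes the call and is not modelled.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RAW_TEXT_CAP = 500_000  # "first 500 k chars"
AI_MAX_RETRIES = 2      # assumed: retries after the initial attempt


def cap_raw_text(text: str) -> str:
    """Keep only the first RAW_TEXT_CAP characters of extracted text."""
    return text[:RAW_TEXT_CAP]


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = AI_MAX_RETRIES,
    backoff_seconds: float = 0.0,
) -> T:
    """Call fn, retrying on any exception up to max_retries extra times."""
    attempts = max_retries + 1
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(backoff_seconds)
    raise RuntimeError("max_retries must be >= 0")
```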