Business-Management/features/doc-service/CLAUDE.md
curo1305 cfec3bb906 feat: Phase 4+5 — admin storage UI, backend proxy, CLAUDE.md enforcement
- backend/app/routers/storage_config.py: 5 admin-only endpoints proxying
  storage-service config + migration API (GET/PATCH/POST/DELETE)
- backend/app/main.py: register storage_config router
- frontend/src/api/client.ts: StorageStatus, MigrationStatus,
  StorageBackendConfig interfaces + 5 API functions
- frontend/src/pages/StorageAdminPage.tsx: full admin UI — backend health
  dot, driver selector (local/S3/WebDAV), conditional credential fields,
  Test & Migrate button, live 2s-poll migration progress bar, Cancel
- frontend/src/App.tsx: /admin/storage route (AdminRoute guard)
- CLAUDE.md: storage enforcement rule, updated Docker tables (6 services,
  3 volumes), §20 in merge checklist
- backend/CLAUDE.md, frontend/CLAUDE.md, doc-service/CLAUDE.md,
  ai-service/CLAUDE.md: updated to reflect storage-service integration
- tests/ALL_TESTS.md + tests/storage-service_tests.md: §20 (20 tests)
- backend/STATUS.md, frontend/STATUS.md: updated with new endpoints/routes
- changelog/2026-04-20_storage-service.md: full change log

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-20 16:13:05 +02:00


doc-service — Claude context

PDF extraction microservice, port 8001 (internal). Shares the same PostgreSQL instance as the backend. Receives proxied requests from backend:8000, which injects x-user-id and x-user-groups headers — doc-service trusts these headers directly. Calls ai-service:8010 for document classification. All file/blob storage goes through storage-service:8020 — no files are written directly to the filesystem. See root CLAUDE.md for architecture, Docker, and project-wide workflows.
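
As a rough illustration of the header trust model described above, the sketch below reads the two injected headers from a plain mapping. The function names mirror those listed for deps.py, but the bodies are hypothetical, and the comma-separated encoding of x-user-groups is an assumption that this file does not confirm.

```python
from typing import Mapping


def get_user_id(headers: Mapping[str, str]) -> str:
    """Return the user id injected by the backend proxy (keys assumed lowercase)."""
    user_id = headers.get("x-user-id", "").strip()
    if not user_id:
        # Without the proxy header there is no identity to trust.
        raise PermissionError("missing x-user-id header")
    return user_id


def get_user_groups(headers: Mapping[str, str]) -> list[str]:
    """Return group ids from x-user-groups (ASSUMPTION: comma-separated list)."""
    raw = headers.get("x-user-groups", "")
    return [group.strip() for group in raw.split(",") if group.strip()]
```

Because the headers are trusted without further checks, a sketch like this is only safe while doc-service stays reachable solely through the backend proxy.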


Commands

All commands run inside Docker — never on the host.

docker compose exec doc-service alembic revision --autogenerate -m "describe change"
docker compose exec doc-service alembic upgrade head
docker compose exec doc-service alembic downgrade -1

File & Folder Tree

features/doc-service/
├── app/
│   ├── main.py                    ← FastAPI, lifespan (file watcher start/stop)
│   ├── database.py                ← Same PostgreSQL instance as backend
│   ├── deps.py                    ← get_user_id, get_user_groups, get_user_is_admin, get_user_admin_groups (injected headers)
│   ├── models/
│   │   ├── document.py            ← Document model
│   │   ├── category.py            ← DocumentCategory model
│   │   ├── category_assignment.py ← CategoryAssignment (composite PK)
│   │   └── document_share.py      ← DocumentShare model (group-based sharing)
│   ├── schemas/
│   │   ├── document.py            ← DocumentOut, DocumentPage, DocumentStatusOut, etc.
│   │   ├── category.py            ← CategoryOut, CategoryCreate, CategoryUpdate
│   │   └── share.py               ← DocumentShareOut, DocumentShareCreate, SharedDocumentOut
│   ├── routers/
│   │   ├── documents.py           ← Full CRUD + file serving + reprocess + suggestions + sharing
│   │   ├── categories.py          ← Category CRUD (includes watch-owned categories)
│   │   └── plugin.py              ← GET /plugin/manifest, GET+PATCH /plugin/settings
│   └── services/
│       ├── storage.py             ← Storage client: save_upload/download_file/delete_file → storage-service:8020 documents bucket
│       ├── ai_client.py           ← classify_document() → ai-service:8010/chat
│       ├── config_reader.py       ← Config load/save via storage-service config bucket (doc_service_config.json)
│       └── file_watcher.py        ← watchdog-based PDF watcher + startup scan + ingestion
├── alembic/versions/              ← Migration chain
│   ├── 0003_add_watch_columns.py  ← source, watch_path, suggested_folder, suggested_filename
│   ├── 0004_add_document_shares.py ← document_shares table (group-based sharing)
│   └── 0008_rename_file_path_to_storage_key.py ← file_path → storage_key; strips /data/documents/ prefix from existing rows
├── Dockerfile                     ← python:3.12-slim, non-root user 1001
└── STATUS.md
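
The data change noted for migration 0008 (stripping the /data/documents/ prefix when file_path becomes storage_key) can be sketched in plain Python. The real migration presumably does this in SQL; the function name below is hypothetical, and leaving non-matching rows untouched is an assumption.

```python
LEGACY_PREFIX = "/data/documents/"


def file_path_to_storage_key(file_path: str) -> str:
    """Convert a legacy absolute file_path into a storage-service key."""
    if file_path.startswith(LEGACY_PREFIX):
        # Drop only the leading prefix; the rest is already {user_id}/{doc_id}.pdf.
        return file_path[len(LEGACY_PREFIX):]
    # ASSUMPTION: rows without the legacy prefix are left as they are.
    return file_path
```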

Database Models

documents

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | String | PK, UUID | |
| user_id | String | indexed | not FK; trusts x-user-id header |
| filename | String | NOT NULL | |
| storage_key | String | NOT NULL | storage-service key: {user_id}/{doc_id}.pdf (documents bucket) |
| file_size | Integer | NOT NULL | bytes |
| status | String | default="pending" | pending / processing / done / failed |
| title | String(500) | nullable | AI-extracted |
| document_type | String | nullable | invoice / bill / receipt / order / expense / revenue / unknown |
| raw_text | Text | nullable | first 500 k chars |
| extracted_data | Text | nullable | JSON string |
| tags | Text | nullable | JSON array string |
| error_message | String(500) | nullable | |
| created_at | DateTime(tz) | server_default=now() | |
| processed_at | DateTime(tz) | nullable | |
| source | String(16) | default="upload" | "upload" or "watch" |
| watch_path | String | nullable | original absolute path in watch directory |
| suggested_folder | String(128) | nullable | AI-suggested category (pending user confirm) |
| suggested_filename | String(500) | nullable | AI-suggested title/rename (pending user confirm) |
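
Since tags is stored as a JSON array string in a nullable Text column, code on both sides of the model has to encode and decode it. The helpers below are a minimal sketch of that round trip; their names are hypothetical, and treating NULL as an empty list is an assumption.

```python
import json
from typing import Optional


def dump_tags(tags: list[str]) -> str:
    """Serialize tags for the Text column as a JSON array string."""
    return json.dumps(tags)


def load_tags(raw: Optional[str]) -> list[str]:
    """Parse the tags column; NULL or empty is treated as no tags (assumption)."""
    if not raw:
        return []
    value = json.loads(raw)
    if not isinstance(value, list):
        raise ValueError("tags column must hold a JSON array")
    return [str(tag) for tag in value]
```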

document_categories

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | String | PK, UUID | |
| user_id | String | indexed | owner; "watch" for system categories |
| name | String(128) | NOT NULL | PascalCase-with-dashes convention enforced on create/rename |
| scope | String(16) | NOT NULL, default="personal" | "personal" / "group" / "system" |
| group_id | String | nullable, indexed | set when scope="group" |
| created_at | DateTime(tz) | server_default=now() | |
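
The exact rule behind the PascalCase-with-dashes convention is not spelled out here. One plausible reading, sketched below, is that each dash-separated segment starts with an uppercase letter followed by letters or digits (for example Tax-Returns). The regex and function name are assumptions for illustration, not the service's actual validator.

```python
import re

# ASSUMPTION: segments like "Tax" or "Returns2024", joined by single dashes.
CATEGORY_NAME_RE = re.compile(r"[A-Z][A-Za-z0-9]*(?:-[A-Z][A-Za-z0-9]*)*")
MAX_CATEGORY_NAME_LENGTH = 128  # matches the String(128) column


def is_valid_category_name(name: str) -> bool:
    """Check length and the assumed PascalCase-with-dashes shape."""
    if len(name) > MAX_CATEGORY_NAME_LENGTH:
        return False
    return CATEGORY_NAME_RE.fullmatch(name) is not None
```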

document_category_assignments (composite PK)

| Column | Type | Constraints |
|---|---|---|
| document_id | String | PK + FK→documents.id CASCADE |
| category_id | String | PK + FK→document_categories.id CASCADE |

document_shares

| Column | Type | Constraints | Notes |
|---|---|---|---|
| id | String | PK, UUID | |
| document_id | String | indexed, NOT NULL | not FK; trusts proxy |
| group_id | String | indexed, NOT NULL | group from backend |
| shared_by_user_id | String | NOT NULL | owner who shared |
| can_delete | Boolean | NOT NULL, default=false | allows group members to delete the doc |
| created_at | DateTime(tz) | server_default=now() | |

Unique constraint: (document_id, group_id)
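
To show what the unique constraint buys, the sketch below recreates a reduced document_shares table in an in-memory SQLite database and attempts a duplicate share. SQLite is used only so the example is self-contained (the service itself runs on PostgreSQL), and the share helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE document_shares (
        id TEXT PRIMARY KEY,
        document_id TEXT NOT NULL,
        group_id TEXT NOT NULL,
        shared_by_user_id TEXT NOT NULL,
        can_delete INTEGER NOT NULL DEFAULT 0,
        UNIQUE (document_id, group_id)
    )
    """
)


def share(conn: sqlite3.Connection, share_id: str, document_id: str,
          group_id: str, user_id: str) -> bool:
    """Insert a share row; return False if the pair is already shared."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO document_shares "
                "(id, document_id, group_id, shared_by_user_id) VALUES (?, ?, ?, ?)",
                (share_id, document_id, group_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when (document_id, group_id) already exists.
        return False
```

A second share of the same document with the same group is rejected by the database itself, so the router does not need a separate existence check to stay consistent.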

Migration chain

| Rev ID | Slug |
|---|---|
| 0001 | create_doc_tables |
| 0002 | add_document_title |
| 0003 | add_watch_columns |
| 0004 | add_document_shares |
| 0005 | add_share_can_delete |
| 0006 | add_category_scope |
| 0007 | capitalize_system_category_names |
| 0008 | rename_file_path_to_storage_key |

API Endpoints (internal — reached via backend proxy)

All these endpoints are proxied from backend:8000. The backend injects x-user-id and x-user-groups before forwarding.

Documents

| Method | Path | Description |
|---|---|---|
| POST | /documents/upload | Upload PDF (202, background processing) |
| GET | /documents | Paginated list (filterable: search, status, type, category, sort) |
| GET | /documents/{id} | Document detail |
| GET | /documents/{id}/status | Processing status only |
| PATCH | /documents/{id}/type | Update document type |
| PATCH | /documents/{id}/tags | Update tags |
| PATCH | /documents/{id}/title | Update title |
| POST | /documents/{id}/reprocess | Re-run AI extraction |
| DELETE | /documents/{id} | Delete document (204) |
| GET | /documents/{id}/file | Download PDF (streaming) |
| POST | /documents/{id}/categories/{cat_id} | Assign category |
| DELETE | /documents/{id}/categories/{cat_id} | Remove category |
| POST | /documents/{id}/suggestions/folder/confirm | Confirm AI folder suggestion |
| POST | /documents/{id}/suggestions/folder/reject | Reject AI folder suggestion |
| POST | /documents/{id}/suggestions/filename/confirm | Confirm AI filename suggestion |
| POST | /documents/{id}/suggestions/filename/reject | Reject AI filename suggestion |
| GET | /documents/shared-with-me | Documents shared with current user via their groups |
| GET | /documents/{id}/shares | List groups the document is shared with (owner only) |
| POST | /documents/{id}/shares | Share with a group (owner only; group must be in user's groups) |
| DELETE | /documents/{id}/shares/{group_id} | Stop sharing with a group (owner only) |
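
The sharing rules in the table (owner only; the group must be one of the caller's groups) and the can_delete flag on document_shares can be expressed as two small predicates. This is a sketch with hypothetical function names and argument shapes, not the code in routers/documents.py.

```python
from typing import Iterable, Tuple


def can_share(owner_id: str, user_id: str, group_id: str,
              user_groups: Iterable[str]) -> bool:
    """Only the owner may share, and only with a group they belong to."""
    return user_id == owner_id and group_id in set(user_groups)


def can_delete(owner_id: str, user_id: str, user_groups: Iterable[str],
               shares: Iterable[Tuple[str, bool]]) -> bool:
    """Owner may always delete; others need a share with can_delete set.

    shares is an iterable of (group_id, can_delete) pairs for the document.
    """
    if user_id == owner_id:
        return True
    groups = set(user_groups)
    return any(group_id in groups and allowed for group_id, allowed in shares)
```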

Categories

| Method | Path | Description |
|---|---|---|
| GET | /categories | List user's categories |
| POST | /categories | Create category (triggers background AI reanalysis) |
| PATCH | /categories/{id} | Rename |
| DELETE | /categories/{id} | Delete (204) |

Plugin

| Method | Path | Description |
|---|---|---|
| GET | /plugin/manifest | Plugin manifest with settings JSON Schema |
| GET | /plugin/settings | Current plugin settings |
| PATCH | /plugin/settings | Update plugin settings |

Default Values & Limits

| Parameter | Value | Location |
|---|---|---|
| Document title max | 500 chars | models/document.py |
| Category name max | 128 chars | models/category.py |
| PDF max size (default) | 20 MB | admin settings (configurable) |
| Raw text cap | 500 k chars | services/ai_client.py |
| Documents per_page | 1100, default 20 | routers/documents.py |
| AI service timeout | 60 s | services/ai_client.py |
| AI service max retries | 2 | services/ai_client.py |
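
Two of the limits above, the raw text cap and the AI retry count, might be applied roughly as follows. This sketch assumes that 500 k means 500,000 characters and that 2 retries means up to three attempts in total. The constant and function names are hypothetical and are not taken from services/ai_client.py.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

RAW_TEXT_CAP = 500_000  # ASSUMPTION: "500 k chars" read as 500,000
AI_MAX_RETRIES = 2      # ASSUMPTION: retries after the first attempt


def cap_raw_text(text: str) -> str:
    """Keep only the first RAW_TEXT_CAP characters of extracted text."""
    return text[:RAW_TEXT_CAP]


def call_with_retries(func: Callable[[], T], max_retries: int = AI_MAX_RETRIES) -> T:
    """Call func, retrying on any exception up to max_retries extra times."""
    last_error: Exception | None = None
    for _attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    assert last_error is not None
    raise last_error
```

A production client would normally add a per-request timeout (60 s in the table) and a delay between attempts; both are left out here to keep the sketch short.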