Business-Management/TODO.md
curo1305 0d34867a69 Add PDF document service with AI extraction and per-app settings
- New `features/doc-service` FastAPI microservice: PDF upload, async
  text extraction (pdfplumber), AI classification via Anthropic/Ollama/
  LM Studio, per-user categories, file download
- Alembic migration isolated with `alembic_version_doc_service` table
- Main backend: httpx proxy routers for /api/documents/* and
  /api/documents/categories/*, admin settings API at /api/settings/*
- Runtime config in /config/doc_service_config.json (shared Docker
  volume); api_key masking on reads; atomic write with os.replace()
- Frontend: DocumentsPage, DocumentAdminSettingsPage, updated AppsPage
  launcher hub, simplified Nav (removed Settings link), new routes
- docker-compose: doc-service service, doc_data + app_config volumes,
  removed internal:true from backend-net for outbound AI API calls
- Fix pre-commit hook: probe Docker socket path so git subprocess picks
  up Docker Desktop on macOS
- Fix security_check.py: use sys.executable for bandit so venv python
  is used instead of system python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-14 05:28:11 +02:00

# TODO
## UX/UI — Penpot setup
- [ ] **Spin up Penpot LXC** — separate LXC container on the server (~24 GB RAM), Docker Compose from https://github.com/penpot/penpot; expose via subdomain behind nginx proxy manager
- [ ] **Create Penpot project** — register on the self-hosted instance, create project `destroying_sap`, create initial design file
- [ ] **Generate Penpot access token** — Profile → Access tokens; used by the `ux-designer` agent via WebFetch REST API calls
- [ ] **Decide on UI component library** — shadcn/ui (recommended: Tailwind-based, unstyled accessible primitives, white-label friendly) vs MUI vs other; decision affects both Penpot design system and frontend implementation
- [ ] **Connect ux-designer agent** — confirm Penpot API reachable, provide instance URL + token to agent at session start
## Auth / session security
- [x] **8-hour JWT expiry** — `ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 8`; no permanent login
- [x] **RS256 JWT signing** — 4096-bit RSA asymmetric keys; `iat` claim included; generate keys with `scripts/generate_jwt_keys.py`
- [ ] **No refresh tokens** — refresh token flow not implemented; if added later, must use `httpOnly` cookies and rotation
- [ ] **`httpOnly` cookie migration** — currently storing JWT in `localStorage` (XSS-exposed); migrate to `httpOnly` cookie when hardening for production
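
The expiry and `iat` items above can be sketched as follows. This is a minimal, standard-library-only illustration: `build_access_claims` and `is_expired` are hypothetical helpers, not the project's code, and the RS256 signing step (which would use the generated RSA keys and a JWT library) is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Value taken from the TODO item above: 8-hour access tokens.
ACCESS_TOKEN_EXPIRE_MINUTES = 60 * 8


def build_access_claims(user_id: str, now: Optional[datetime] = None) -> dict:
    """Return the claim set for an access token (hypothetical helper).

    Signing these claims with RS256 is out of scope for this sketch.
    """
    issued_at = now or datetime.now(timezone.utc)
    expires_at = issued_at + timedelta(minutes=ACCESS_TOKEN_EXPIRE_MINUTES)
    return {
        "sub": user_id,
        "iat": int(issued_at.timestamp()),
        "exp": int(expires_at.timestamp()),
    }


def is_expired(claims: dict, now: Optional[datetime] = None) -> bool:
    """Return True once the current time reaches the `exp` claim."""
    current = now or datetime.now(timezone.utc)
    return int(current.timestamp()) >= claims["exp"]
```

Because no refresh flow exists, a token built this way simply stops being valid after eight hours and the user has to log in again.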
## App permissions
- [ ] **Permissions registry** — admin-managed table that controls which apps each user can access. Schema: `user_app_permissions (user_id FK, app_key)`. Admin UI lets the admin grant/revoke per-app access per user. The Apps page only shows apps the current user has been granted access to.
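
A rough sketch of the registry described above, assuming a composite primary key on `(user_id, app_key)` and using in-memory SQLite as a stand-in for the project's PostgreSQL database. The `users` table layout and the `grant`, `revoke`, and `visible_apps` helpers are hypothetical names chosen for illustration.

```python
import sqlite3

# Schema from the TODO item; SQLite stands in for PostgreSQL in this sketch.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE user_app_permissions (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    app_key TEXT NOT NULL,
    PRIMARY KEY (user_id, app_key)  -- assumption: one row per user/app pair
);
"""


def grant(conn: sqlite3.Connection, user_id: int, app_key: str) -> None:
    """Grant access; granting twice is a no-op thanks to the primary key."""
    conn.execute(
        "INSERT OR IGNORE INTO user_app_permissions (user_id, app_key) VALUES (?, ?)",
        (user_id, app_key),
    )


def revoke(conn: sqlite3.Connection, user_id: int, app_key: str) -> None:
    """Revoke access to a single app for a single user."""
    conn.execute(
        "DELETE FROM user_app_permissions WHERE user_id = ? AND app_key = ?",
        (user_id, app_key),
    )


def visible_apps(conn: sqlite3.Connection, user_id: int) -> list:
    """Return the app keys the Apps page should show for this user."""
    rows = conn.execute(
        "SELECT app_key FROM user_app_permissions WHERE user_id = ? ORDER BY app_key",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The Apps page would then render only the keys returned by `visible_apps`, while the admin UI calls `grant` and `revoke`.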
## PDF Documents app (`features/doc-service`)
- [x] **doc-service container** — FastAPI microservice on `backend-net`; never exposed to host or frontend directly
- [x] **PDF upload + async extraction** — background task with pdfplumber + pluggable AI (Anthropic / Ollama / LM Studio)
- [x] **Per-app settings page** — `/apps/documents/settings/admin`; AI provider config, max file size; admin only
- [x] **Per-user categories** — create/rename/delete categories; assign multiple categories per document
- [x] **Alembic isolation** — `alembic_version_doc_service` version table; no collision with main backend migrations
- [x] **Runtime config file** — `/config/doc_service_config.json` on shared Docker volume; editable from frontend; 30s TTL cache in doc-service
- [ ] **Re-process document** — UI button to re-trigger AI extraction on an existing document (after changing AI provider/model)
- [ ] **Bulk category operations** — assign/remove a category from multiple documents at once
- [ ] **Search / filter documents** — filter by status, document type, category, date range
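
The runtime config item above (shared JSON file, 30s TTL cache, atomic writes, masked `api_key` on reads) could look roughly like this. It is a sketch under assumptions, not the service's actual code: `save_config`, `mask_api_key`, and `CachedConfig` are hypothetical names, and the fixed `********` mask is an illustrative choice.

```python
import json
import os
import tempfile
import time

CACHE_TTL_SECONDS = 30  # TTL from the TODO item above


def save_config(path: str, config: dict) -> None:
    """Write atomically: temp file in the same directory, then os.replace()."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(config, handle, indent=2)
        os.replace(tmp_path, path)  # readers see the old or the new file, never a partial one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def mask_api_key(config: dict) -> dict:
    """Return a copy with the api_key value hidden (hypothetical mask format)."""
    masked = dict(config)
    if masked.get("api_key"):
        masked["api_key"] = "********"
    return masked


class CachedConfig:
    """Re-read the config file at most once per TTL window."""

    def __init__(self, path: str, ttl: float = CACHE_TTL_SECONDS, clock=time.monotonic):
        self._path = path
        self._ttl = ttl
        self._clock = clock  # injectable so the TTL can be tested without sleeping
        self._value = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._value is None or now - self._loaded_at >= self._ttl:
            with open(self._path, encoding="utf-8") as handle:
                self._value = json.load(handle)
            self._loaded_at = now
        return self._value
```

With this shape, a settings change saved from the frontend becomes visible to the service within one TTL window, and the temp file must live on the same volume as the target for `os.replace()` to stay atomic.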
## Frontend features
- [x] **Logout button** — visible when logged in, clears token and redirects to `/login`
- [x] **Profile page** (`/profile`) — shows personal information for the logged-in user
- [x] **Edit & save profile** — form to update personal details, stored in a dedicated `profiles` table (separate from `users`, same PostgreSQL container)
## App container architecture (future)
Design decision: each installable app (billing, PDF, email, etc.) runs in its own isolated Docker/Podman container, spawned and managed by the backend via the Docker API. Key rules to implement:
- [ ] **Docker socket proxy** — backend must never mount `/var/run/docker.sock` directly; use `tecnativa/docker-socket-proxy` on an internal-only network, with only the required API endpoints whitelisted (CONTAINERS, IMAGES, NETWORKS, POST). Raw socket access = root on the host.
- [ ] **Network isolation per app** — each spawned app container gets its own Docker bridge network; app containers never talk to each other directly; only the backend can reach them
- [ ] **No privileged app containers** — all spawned containers run without `--privileged`, without extra capabilities, with resource limits (CPU, memory)
- [ ] **Image allowlist** — backend may only spawn containers from a pre-approved image list; never pull or build arbitrary images at runtime
- [ ] **Consider Podman** — evaluate rootless Podman as replacement for Docker daemon; daemonless model eliminates the socket entirely; Docker SDK compatible
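
The allowlist, isolation, and resource-limit rules above can be combined into a single validation step before anything is sent to the container API. The sketch below only builds a plain dictionary and never talks to Docker or Podman. The image names, the `app-<key>` naming scheme, and the default limits are assumptions for illustration; the key names are modelled on the keyword arguments of `containers.run` in the Docker SDK for Python.

```python
# Hypothetical pre-approved images; the real list would be admin-managed.
ALLOWED_IMAGES = frozenset({
    "registry.example/apps/billing:1.0",
    "registry.example/apps/pdf:1.0",
})


def build_container_spec(
    app_key: str,
    image: str,
    allowed_images: frozenset = ALLOWED_IMAGES,
    cpu_cores: float = 0.5,
    memory_mb: int = 256,
) -> dict:
    """Return run parameters for an app container, or raise if the image is not approved."""
    if image not in allowed_images:
        raise PermissionError(f"image not on allowlist: {image}")
    return {
        "image": image,
        "name": f"app-{app_key}",
        "network": f"app-{app_key}-net",  # one bridge network per app
        "privileged": False,              # never run with --privileged
        "cap_drop": ["ALL"],              # no extra capabilities
        "nano_cpus": int(cpu_cores * 1_000_000_000),
        "mem_limit": f"{memory_mb}m",
    }
```

Keeping the check in one pure function makes it easy to unit-test the security rules separately from whichever runtime (Docker behind a socket proxy, or rootless Podman) ends up executing the spec.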
## Infrastructure
- [x] **Docker port hardening** — only port 80 (prod) / 5173 (dev) exposed on the host via `frontend-net`; backend and db have no host port bindings and sit on `backend-net` (originally `internal: true`; the flag was removed so doc-service can make outbound AI API calls)
## Infrastructure (existing)
- [x] **Rootless containers** — run backend and frontend containers as non-root users (add `USER` directive to Dockerfiles, map UID/GID appropriately)
- [ ] **Persistent storage** — ensure database data, config files, and any uploaded assets survive container restarts and rebuilds (named volumes, bind mounts for config)
- [ ] **Docker development workflow** — document and streamline the full dev loop: hot reload, one-command startup, migration handling, seed data, and how to attach a debugger