Commit Graph

15 Commits

Author SHA1 Message Date
curo1305 95c7ed786a feat(06.2-03): backend — cloud-aware delete routing + skip_quota + remove_only param
- storage.delete_document gains skip_quota=False param; quota decrement gated on it
- DELETE /api/documents/{id} gains remove_only=bool query param
- Cloud docs (storage_backend != minio): attempt cloud backend delete_object first
  - On failure: return HTTP 200 {success: false, cloud_delete_failed: true} (not 4xx)
  - On success or remove_only: delete DB row with skip_quota=True
- Cloud creds/exception message never included in response body (T-06.2-03-02)
- Promote 3 xfail stubs to real tests (propagates, failure, remove_only)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 15:09:44 +02:00
curo1305 bf7d86184d fix(documents): normalize UUID to undashed hex in raw SQL quota UPDATE
str(uuid) returns dashed format (xxxx-xxxx-…) which mismatches SQLite's
CHAR(32) storage (undashed hex). Replace with .replace('-', '') so the
WHERE clause matches in both SQLite (tests) and PostgreSQL (production).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 18:57:02 +02:00
curo1305 b1a136b5be fix(05-12): resolve 3 critical code review findings
CR-01: add `except HTTPException: raise` before broad except in
stream_document_content — prevents 503 (reconnect prompt) from being
swallowed and replaced with misleading 502

CR-02: move pre-flight credential checks BEFORE Redis setex in
oauth_initiate — no orphan state tokens written for unconfigured providers;
also adds onedrive_tenant_id to OneDrive pre-flight condition (WR-02)

CR-03: add CLOUD_CREDS_KEY to celery-worker environment in docker-compose.yml
— worker cannot decrypt cloud credentials without this key; every cloud
document task was silently failing at runtime

WR-03: assert Redis store empty after 400 pre-flight responses in both
new tests — confirms no token leak on misconfigured-provider requests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 18:04:09 +02:00
curo1305 10175ee4b5 fix(05-12): close 3 UAT gaps — OAuth 400 preflight, 502 cloud fallback, upload hint
- oauth_initiate: pre-flight check returns 400 with env-var hint when
  GOOGLE_CLIENT_ID/SECRET or ONEDRIVE_CLIENT_ID/SECRET are not configured,
  preventing opaque MSAL/OAuth library 500 errors on misconfigured servers
- stream_document_content: broad except-clause catches non-CloudConnectionError
  exceptions and returns 502 with user-friendly message (was raw 500)
- docker-compose.yml: add volumes: - ./backend:/app to celery-worker so code
  changes are picked up by docker compose restart without a rebuild
- CloudStorageView: upload hint paragraph directs users to navigate into a
  cloud folder; no DropZone added (no folder context at overview level)
- 3 new backend tests pass; 2 existing tests patched with credential monkeypatch;
  full suite: 293 passed, 0 new failures, 1 pre-existing (test_extract_docx)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 17:55:08 +02:00
curo1305 34f012b4e8 fix(05): resolve 5 critical code review findings
CR-01: add Field(min_length=1) to UserDeleteConfirm.admin_password
CR-02: add folder ownership check in PATCH /documents/{id} — prevents IDOR
        when folder_id belongs to another user
CR-03: add min_length=1, max_length=255, and path-separator validator to
        DocumentPatch.filename — prevents empty and path-traversal filenames
CR-04: fetchDocumentContent now throws on non-ok responses instead of
        silently returning the error Response
CR-05: object URL revoke in DocumentView uses pagehide + load events with
        120s fallback instead of unreliable 60s blind timer

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 11:51:54 +02:00
curo1305 6d094d17f0 feat(05-09): PATCH /documents/{id} endpoint + cloud-aware Celery re-analyze
- Add DocumentPatch Pydantic model with filename and folder_id optional fields
- Add PATCH /api/documents/{doc_id} endpoint: ownership guard, model_fields_set
  to distinguish absent vs null folder_id, returns updated metadata dict
- Update _run() in document_tasks.py to use get_storage_backend_for_document
  for non-MinIO backends instead of hardcoded MinIO path
- CloudConnectionError caught in cloud path: returns extract_failed status
- Update test to use pure unit mocks (no PostgreSQL) for _run() cloud routing
- All 3 plan tests pass; 23 test_cloud.py tests pass
2026-05-30 11:16:01 +02:00
curo1305 d7d6382d49 feat(05-06): extend upload and content-proxy endpoints for cloud backends
- Add POST /api/documents/upload multipart endpoint with target_backend form field
- Cloud backends (google_drive, onedrive, nextcloud, webdav) use direct put_object()
- MinIO path generates presigned PUT URL (unchanged flow)
- Cloud uploads do NOT touch quota (D-11: separate backend)
- GET /api/documents/{id}/content now uses get_storage_backend_for_document
- CloudConnectionError from any cloud op raises HTTPException(503) with safe message
- target_backend validated against _CLOUD_PROVIDERS allowlist (T-05-06-01)
- Import CloudConnectionError with fallback stub for envs without google-auth deps
2026-05-29 07:45:28 +02:00
curo1305 e451b16f8f feat(phase-4): Task 1 — audit log backfill in auth.py and documents.py (D-13)
- Add write_audit_log import to auth.py and documents.py
- auth.py: login success (auth.login), login failure (auth.login_failed, no PII),
  logout (auth.logout), logout-all (auth.sign_out_all), change-password
  (auth.password_changed), TOTP enable (auth.totp_enrolled), TOTP disable
  (auth.totp_revoked), backup code used (auth.backup_code_used)
- documents.py: upload confirm (document.uploaded, size+backend only),
  document delete (document.deleted, size only — no filename/extracted_text)
- Add request: Request param to change_password, disable_totp, confirm_upload, delete_document
2026-05-25 21:48:15 +02:00
curo1305 f868a4e0c7 feat(phase-4-05): document streaming proxy GET /api/documents/{id}/content (DOC-02)
- Add _parse_range() helper: validates Range header bounds, raises 416 on invalid
- Add stream_document_content endpoint with get_regular_user dep (admin → 403)
- Access check: owner OR Share.recipient_id; neither → 404
- Bytes fetched via get_object() only — presigned_get_url() never called
- Range requests return 206 + Content-Range header
- Add pdf_open_mode column to User ORM model (migration 0004 already applied)
- Use HTTP_416_RANGE_NOT_SATISFIABLE (non-deprecated constant)
2026-05-25 18:48:32 +02:00
curo1305 33a6f9a290 feat(phase-4): Folders API (FOLD-01..05), audit helper (flush-not-commit), document sort/FTS/move
- backend/api/folders.py: POST /api/folders (create), GET /api/folders (list),
  GET /api/folders/{id} (breadcrumb), PATCH /api/folders/{id} (rename),
  DELETE /api/folders/{id} (cascade-delete + atomic quota decrement),
  PATCH /api/documents/{id}/folder (move document)
- All folder endpoints use get_regular_user (admin gets 403); 404 for IDOR
- IntegrityError caught -> 409 on duplicate folder name under same parent
- WITH RECURSIVE CTE for subtree collection with SQLite fallback (OperationalError)
- Atomic quota decrement with CASE WHEN pattern (SQLite compat)
- MinIO object deletion best-effort (per-object try/except)
- write_audit_log called after folder.created, folder.renamed, folder.deleted
- backend/api/documents.py: add sort, order, folder_id, q params to list_documents;
  add is_shared field to each document in response using Share subquery
- backend/main.py: register folders_router and document_move_router
2026-05-25 18:37:22 +02:00
curo1305 b28bb01995 feat(03-03): add get_regular_user dep; wire auth + ownership into /api/documents/*
- Add get_regular_user FastAPI dep (rejects admin with 403) to deps/auth.py
- Wire Depends(get_regular_user) into all 6 /api/documents/* handlers
- upload-url: replace null-user/... object_key with str(current_user.id)/...; set user_id=current_user.id
- confirm: remove Wave 2 doc.user_id is None guard — quota runs unconditionally; add ownership assertion (404 on cross-user)
- list: filter by user_id=current_user.id via storage.list_metadata(user_id=...)
- get/delete/classify: ownership assertion (doc.user_id != current_user.id → 404)
- storage.list_metadata: add required user_id param + Document.user_id == user_id filter
- storage.delete_document: remove if doc.user_id is not None guard; use CASE WHEN for SQLite-compat quota decrement
- Tests: update existing tests to pass auth headers; implement test_cross_user_access_404, test_admin_cannot_access_documents, test_documents_require_auth; mark test_confirm_endpoint xfail(strict=False) for SQLite UUID mismatch
2026-05-23 20:05:34 +02:00
curo1305 0d51d023ce feat(03-02): implement presigned upload flow, quota enforcement, cleanup task
- Replace POST /api/documents/upload with POST /api/documents/upload-url + /{id}/confirm
- upload-url: create pending Document row with user_id=None (Wave 2), return presigned PUT URL
- confirm: stat MinIO for authoritative size (T-03-05), atomic quota UPDATE (T-03-06, STORE-03)
- Confirm returns 413 with {used_bytes, limit_bytes, rejected_bytes} on quota exceeded (STORE-05)
- Wave 2 guard: skip quota UPDATE when doc.user_id is None (Plan 03-03 removes this)
- Add GET /api/auth/me/quota to api/auth.py (STORE-04)
- services/storage.py: remove save_upload (D-04); add GREATEST(0, used_bytes-delta) quota decrement to delete_document (STORE-06)
- tasks/document_tasks.py: add cleanup_abandoned_uploads Celery beat task (D-06)
- celery_app.py: add beat_schedule for cleanup-abandoned-uploads every 30 minutes
- tests/test_documents.py: replace legacy /upload tests with xfail; add real test logic for upload-url/confirm/get-quota
- tests/test_quota.py: implement real test logic with xfail for PostgreSQL-specific SQL
2026-05-23 14:32:12 +02:00
curo1305 1882edfff6 feat(02-02): auth API endpoints + security hardening + Python 3.9 compat
- backend/api/auth.py: register, login (TOTP+backup), refresh, logout,
  me, change-password; per-account Redis rate limit; HIBP check
- backend/main.py: Origin validation middleware, CSP headers middleware,
  CORS locked to settings.cors_origins, Redis lifespan (app.state.redis),
  admin bootstrap, auth router included, slowapi SlowAPIMiddleware
- backend/services/email.py: already created in Plan 01 (verified exists)
- Python 3.9 compat: fixed match statement in ai/__init__.py,
  str|None union syntax in openai_provider.py, api/documents.py,
  api/topics.py, api/settings.py, services/classifier.py

All 17 tests in test_auth_api.py pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 19:35:38 +02:00
curo1305 c1931fd566 feat(01-05): wire main.py lifespan+health and rewrite documents+topics to async session
- Rewrite main.py lifespan: MinIO client created at startup, docuvault bucket
  auto-created if missing, stored on app.state.minio; engine.dispose() on shutdown
- Extend /health endpoint: probes PostgreSQL (SELECT 1) and MinIO (bucket_exists)
  returning {"status": "ok"|"degraded", "checks": {"postgres": ..., "minio": ...}}
- Rewrite api/documents.py: all routes inject session: AsyncSession = Depends(get_db);
  save_upload/save_metadata/list_metadata/get_metadata/delete_document all async;
  upload handler queues extract_and_classify.delay() instead of inline classification;
  /classify endpoint retains synchronous await classifier.classify_document() for
  backward-compatible immediate response
- Rewrite api/topics.py: all routes inject session dependency; all storage calls
  are async with session parameter; Pydantic models TopicCreate/TopicUpdate/
  SuggestRequest preserved verbatim
2026-05-22 09:47:00 +02:00
curo1305 7a34807fa0 chore: initial commit — existing single-user document scanner codebase
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:53:28 +02:00