---
status: investigating
trigger: "Documents stored on cloud backend cannot be opened, re-analyzed, or edited"
created: 2026-05-30T00:00:00Z
updated: 2026-05-30T00:00:00Z
symptoms_prefilled: true
goal: find_root_cause_only
---

## Current Focus

hypothesis: "CONFIRMED — three independent root causes across open, re-analyze, and edit flows"
test: "Full read of documents.py, document_tasks.py, DocumentPreviewModal.vue, client.js"
expecting: "Three separate bugs identified with specific mechanisms"
next_action: "return root cause findings"

## Symptoms

expected: "Opening, re-analyzing, and editing a document stored on a cloud backend should work correctly via the backend proxy"
actual: "User cannot open, re-analyze, or edit any file stored on a cloud backend"
errors: "None specifically reported, but likely HTTP errors or missing endpoints"
reproduction: "Test 13 in Phase 5 UAT — after uploading a document to a connected Nextcloud/WebDAV backend, all document operations (open, re-analyze, edit) fail"
started: "Discovered during UAT of Phase 5 (cloud storage backends)"

## Eliminated

- hypothesis: "GET /api/documents/{id}/content endpoint missing cloud branch"
  evidence: "The endpoint calls get_storage_backend_for_document() which correctly dispatches to NextcloudBackend/WebDAVBackend based on doc.storage_backend — the backend proxy path is implemented"
  timestamp: 2026-05-30T00:00:00Z

## Evidence

- timestamp: 2026-05-30T00:00:00Z
  checked: "DocumentPreviewModal.vue — how it opens documents"
  found: "Uses raw iframe :src pointing to /api/documents/{id}/content — this is a browser navigation, NOT a fetch() call, so the Authorization: Bearer header is never sent"
  implication: "The backend /content endpoint uses get_regular_user dep which requires a JWT Bearer token. An iframe or window.open() GET has no Authorization header → 401 Unauthorized → document cannot be opened"

- timestamp: 2026-05-30T00:00:00Z
  checked: "backend/tasks/document_tasks.py _run() function — the re-analyze (extract_and_classify) Celery task"
  found: "Line 64: backend = get_storage_backend() — this always returns MinIOBackend regardless of doc.storage_backend. For cloud documents, get_storage_backend_for_document() must be called but the Celery task has no User or Session context to look up CloudConnection credentials"
  implication: "Re-analysis of a cloud-stored document fails: the task calls MinIO get_object() with a WebDAV path (e.g. 'docuvault/user-id/doc-id.pdf') which does not exist in MinIO → MinIO retrieval error → extract_and_classify returns status='extract_failed'"

- timestamp: 2026-05-30T00:00:00Z
  checked: "backend/api/documents.py — full route list via @router decorator scan"
  found: "Only these routes exist: POST /upload-url, POST /upload, POST /{id}/confirm, GET /, GET /{id}, DELETE /{id}, POST /{id}/classify, GET /{id}/content. There is NO PATCH or PUT endpoint for editing document metadata (filename, folder, etc.), for cloud documents or any others."
  implication: "The 'edit' failure may refer to the classify endpoint (re-analyze) or to a missing document-rename/metadata-update endpoint. The classify endpoint itself works correctly for cloud docs (it uses cached extracted_text, not re-fetching from storage), but re-extraction does not."

- timestamp: 2026-05-30T00:00:00Z
  checked: "DocumentView.vue — how openPdf() works and how it uses the content URL"
  found: "openPdf() either calls window.open(api.getDocumentContentUrl(doc.value.id), '_blank') or shows DocumentPreviewModal. Both result in unauthenticated browser requests with no Bearer token."
  implication: "Both open paths (new tab and in-app preview) hit the /content endpoint without auth → 401 for all documents, not just cloud ones. For cloud documents the endpoint must additionally decrypt the CloudConnection credentials; nothing gathered here shows that step failing (see Eliminated), but it cannot be exercised until the request is authenticated."

- timestamp: 2026-05-30T00:00:00Z
  checked: "client.js getDocumentContentUrl — returns a plain URL string, never does a credentialed fetch"
  found: "Function returns '/api/documents/{id}/content' as a plain string for use in iframe src or window.open(). No fetch() with Authorization header."
  implication: "The content endpoint is auth-protected (get_regular_user dep) but the frontend reaches it through unauthenticated browser navigation. The resulting 401 is the error the user hits when opening any document; cloud documents additionally fail re-analysis in the Celery worker, which fetches from the wrong storage backend."
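
The open failure described in the evidence above can be illustrated with a minimal, stdlib-only sketch. It is not the project's code: `require_bearer`, `get_content`, and `status_for` are hypothetical stand-ins for the `get_regular_user` dependency and the `/content` route, and the token is only checked for presence, not validated as a JWT.

```python
from typing import Mapping


class Unauthorized(Exception):
    """Stand-in for an HTTP 401 response."""


def require_bearer(headers: Mapping[str, str]) -> str:
    """Hypothetical stand-in for get_regular_user: return the token or raise.

    Only the presence and scheme of the header are checked here; a real
    dependency would also verify the JWT signature and expiry.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise Unauthorized("missing or malformed Authorization header")
    return token


def get_content(headers: Mapping[str, str]) -> int:
    """Hypothetical stand-in for GET /api/documents/{id}/content."""
    require_bearer(headers)
    return 200


def status_for(headers: Mapping[str, str]) -> int:
    """Map the outcome of the protected route to an HTTP status code."""
    try:
        return get_content(headers)
    except Unauthorized:
        return 401


# iframe :src and window.open() are browser navigations: no Authorization header.
navigation_status = status_for({})

# A fetch()-style request that attaches the token explicitly.
fetch_status = status_for({"Authorization": "Bearer example-token"})

print(navigation_status, fetch_status)
```

Under these assumptions the navigation-style request is rejected before any storage backend is consulted, which is consistent with the 401 affecting MinIO and cloud documents alike.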

## Resolution

root_cause: |
  Three independent root causes:
  1. OPEN (401 auth): The /api/documents/{id}/content endpoint requires a JWT Bearer token (get_regular_user dep), but DocumentPreviewModal and DocumentView both access it via iframe src or window.open(), which are browser navigations that send no Authorization header. This affects every document, not only cloud-stored ones; cloud documents are additionally affected by cause 2.
  2. RE-ANALYZE (wrong backend): The extract_and_classify Celery task hardcodes get_storage_backend() (always MinIO) at line 64 of document_tasks.py. For cloud-stored documents it should call get_storage_backend_for_document(), but the Celery task has no User ORM instance and no CloudConnection lookup mechanism. The task reads doc.storage_backend but does nothing with it — it always fetches from MinIO, which 404s on a WebDAV path.
  3. EDIT (endpoint missing): documents.py exposes no PATCH or PUT endpoint for updating document metadata (filename, folder, etc.). The user's "edit" likely refers either to metadata editing, which has no endpoint for any document, or to the re-analyze/re-extract operation, which fails for cloud documents because of cause 2.
fix:
verification:
files_changed: []
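
Root cause 2 can be sketched in isolation as follows. This is an illustration under stated assumptions, not the contents of document_tasks.py: `InMemoryBackend`, `Doc`, `object_key`, `fetch_always_minio`, `fetch_by_document`, and the `"extracted"` status are hypothetical, and the CloudConnection credential lookup that the real task would need is deliberately not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class ObjectNotFound(Exception):
    """Stand-in for a storage retrieval error."""


class InMemoryBackend:
    """Hypothetical stand-in for MinIOBackend / WebDAVBackend."""

    def __init__(self, name: str, objects: Dict[str, bytes]) -> None:
        self.name = name
        self._objects = dict(objects)

    def get_object(self, key: str) -> bytes:
        try:
            return self._objects[key]
        except KeyError:
            raise ObjectNotFound(f"{self.name}: no object at {key!r}") from None


@dataclass
class Doc:
    id: str
    storage_backend: str  # e.g. "minio" or "webdav"
    object_key: str       # hypothetical field holding the storage path


MINIO = InMemoryBackend("minio", {})
WEBDAV = InMemoryBackend("webdav", {"docuvault/user-id/doc-id.pdf": b"%PDF-1.7"})
BACKENDS = {"minio": MINIO, "webdav": WEBDAV}


def fetch_always_minio(doc: Doc) -> bytes:
    """Mirrors the reported behavior: the backend choice ignores the document."""
    return MINIO.get_object(doc.object_key)


def fetch_by_document(doc: Doc) -> bytes:
    """Dispatches on doc.storage_backend (credential lookup not modeled)."""
    return BACKENDS[doc.storage_backend].get_object(doc.object_key)


def extract(doc: Doc, fetch: Callable[[Doc], bytes]) -> str:
    """Return a status string; 'extracted' is a hypothetical success value."""
    try:
        fetch(doc)
    except ObjectNotFound:
        return "extract_failed"
    return "extracted"


cloud_doc = Doc("doc-id", "webdav", "docuvault/user-id/doc-id.pdf")
status_current = extract(cloud_doc, fetch_always_minio)
status_dispatched = extract(cloud_doc, fetch_by_document)
print(status_current, status_dispatched)
```

The sketch only shows why a WebDAV path looked up in MinIO ends in `extract_failed`; how the worker would obtain the user's CloudConnection credentials remains the open design question noted in the evidence.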