---
status: investigating
trigger: "Documents stored on cloud backend cannot be opened, re-analyzed, or edited"
created: 2026-05-30T00:00:00Z
updated: 2026-05-30T00:00:00Z
symptoms_prefilled: true
goal: find_root_cause_only
---

## Current Focus

hypothesis: "CONFIRMED: three independent root causes across open, re-analyze, and edit flows"
test: "Full read of documents.py, document_tasks.py, DocumentPreviewModal.vue, client.js"
expecting: "Three separate bugs identified with specific mechanisms"
next_action: "return root cause findings"

## Symptoms

expected: "Opening, re-analyzing, and editing a document stored on a cloud backend should work correctly via the backend proxy"
actual: "User cannot open, re-analyze, or edit any file stored on a cloud backend"
errors: "None specifically reported, but likely HTTP errors or missing endpoints"
reproduction: "Test 13 in Phase 5 UAT: after uploading a document to a connected Nextcloud/WebDAV backend, all document operations (open, re-analyze, edit) fail"
started: "Discovered during UAT of Phase 5 (cloud storage backends)"

## Eliminated

- hypothesis: "GET /api/documents/{id}/content endpoint missing cloud branch"
  evidence: "The endpoint calls get_storage_backend_for_document() which correctly dispatches to NextcloudBackend/WebDAVBackend based on doc.storage_backend; the backend proxy path is implemented"
  timestamp: 2026-05-30T00:00:00Z

## Evidence

- timestamp: 2026-05-30T00:00:00Z
  checked: "DocumentPreviewModal.vue: how it opens documents"
  found: "Uses raw iframe :src pointing to /api/documents/{id}/content. This is a browser navigation, NOT a fetch() call, so the Authorization: Bearer header is never sent"
  implication: "The backend /content endpoint uses get_regular_user dep which requires a JWT Bearer token. An iframe or window.open() GET has no Authorization header → 401 Unauthorized → document cannot be opened"

- timestamp: 2026-05-30T00:00:00Z
  checked: "backend/tasks/document_tasks.py _run() function: the re-analyze (extract_and_classify) Celery task"
  found: "Line 64: backend = get_storage_backend(). This always returns MinIOBackend regardless of doc.storage_backend. For cloud documents, get_storage_backend_for_document() must be called but the Celery task has no User or Session context to look up CloudConnection credentials"
  implication: "Re-analysis of a cloud-stored document fails: the task calls MinIO get_object() with a WebDAV path (e.g. 'docuvault/user-id/doc-id.pdf') which does not exist in MinIO → MinIO retrieval error → extract_and_classify returns status='extract_failed'"

- timestamp: 2026-05-30T00:00:00Z
  checked: "backend/api/documents.py: full route list via @router decorator scan"
  found: "Only these routes exist: POST /upload-url, POST /upload, POST /{id}/confirm, GET /, GET /{id}, DELETE /{id}, POST /{id}/classify, GET /{id}/content. There is NO PATCH or PUT endpoint for editing document metadata (filename, folder, etc.) on cloud documents."
  implication: "The 'edit' failure may refer to the classify endpoint (re-analyze) or to a missing document-rename/metadata-update endpoint. The classify endpoint itself works correctly for cloud docs (it uses cached extracted_text, not re-fetching from storage), but re-extraction does not."

- timestamp: 2026-05-30T00:00:00Z
  checked: "DocumentView.vue: how openPdf() works and how it uses the content URL"
  found: "openPdf() either calls window.open(api.getDocumentContentUrl(doc.value.id), '_blank') or shows DocumentPreviewModal. Both result in unauthenticated browser requests with no Bearer token."
  implication: "Both open paths (new tab and in-app preview) hit the /content endpoint without auth → 401 for all documents, not just cloud ones. However cloud documents additionally require credentials decryption, so they would fail even if the auth issue were solved."

- timestamp: 2026-05-30T00:00:00Z
  checked: "client.js getDocumentContentUrl: returns a plain URL string, never does a credentialed fetch"
  found: "Function returns '/api/documents/{id}/content' as a plain string for use in iframe src or window.open(). No fetch() with Authorization header."
  implication: "The content endpoint is auth-protected (get_regular_user dep) but the frontend uses unauthenticated browser navigation to reach it; the 401 response is the actual error the user sees for any document, but for cloud documents there is an additional issue in the Celery worker"

## Resolution

root_cause: |
  Three independent root causes:

  1. OPEN (401 auth): The /api/documents/{id}/content endpoint requires a JWT
     Bearer token (get_regular_user dep), but DocumentPreviewModal and
     DocumentView both access it via iframe src or window.open(), which are
     browser navigations that send no Authorization header. All documents fail
     to open, but cloud documents are additionally impacted.

  2. RE-ANALYZE (wrong backend): The extract_and_classify Celery task hardcodes
     get_storage_backend() (always MinIO) at line 64 of document_tasks.py. For
     cloud-stored documents it should call get_storage_backend_for_document(),
     but the Celery task has no User ORM instance and no CloudConnection lookup
     mechanism. The task reads doc.storage_backend but does nothing with it; it
     always fetches from MinIO, which 404s on a WebDAV path.

  3. EDIT (endpoint missing): There is no PATCH endpoint for updating document
     metadata (filename/title). The user's "edit" likely refers to the
     re-analyze/re-extract operation or to metadata editing, neither of which
     works for cloud docs.
fix:
verification:
files_changed: []
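
The mechanism behind root cause 2 can be reproduced in isolation. The sketch below is not the project's code: `FakeBackend`, `Doc`, `extract_current`, `extract_dispatched`, the `"minio"`/`"webdav"` values, and the `"extracted"` status are hypothetical stand-ins, and only `get_storage_backend()`, `get_storage_backend_for_document()`, `get_object()`, and `status='extract_failed'` are taken from the findings above. It also leaves out the CloudConnection credential lookup, which the real Celery task would still need.

```python
from dataclasses import dataclass


class ObjectNotFound(Exception):
    """Raised by the stand-in backends when a key is absent."""


class FakeBackend:
    # Hypothetical in-memory stand-in for MinIOBackend / WebDAVBackend.
    def __init__(self, objects):
        self._objects = dict(objects)

    def get_object(self, key):
        try:
            return self._objects[key]
        except KeyError:
            raise ObjectNotFound(key) from None


@dataclass
class Doc:
    id: str
    storage_backend: str  # assumed values: "minio" or "webdav"
    object_key: str


# Two separate stores: the cloud path exists only on the WebDAV side.
MINIO = FakeBackend({"local/doc-1.pdf": b"local bytes"})
WEBDAV = FakeBackend({"docuvault/user-id/doc-id.pdf": b"cloud bytes"})


def get_storage_backend():
    # Mirrors the reported behaviour: always MinIO, whatever the document says.
    return MINIO


def get_storage_backend_for_document(doc):
    # Assumed dispatch on doc.storage_backend; credentials are omitted here.
    return {"minio": MINIO, "webdav": WEBDAV}[doc.storage_backend]


def _extract(doc, backend):
    try:
        backend.get_object(doc.object_key)
    except ObjectNotFound:
        return "extract_failed"
    return "extracted"  # hypothetical success status


def extract_current(doc):
    # What the task does today according to the evidence.
    return _extract(doc, get_storage_backend())


def extract_dispatched(doc):
    # What the task would do if it honoured doc.storage_backend.
    return _extract(doc, get_storage_backend_for_document(doc))


if __name__ == "__main__":
    cloud = Doc("doc-id", "webdav", "docuvault/user-id/doc-id.pdf")
    print(extract_current(cloud), extract_dispatched(cloud))
```

For a cloud document the hardcoded path looks up the WebDAV key in the MinIO store and fails, while the dispatched path finds it. MinIO-stored documents behave the same either way, which is consistent with the failure only showing up for cloud backends.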