curo1305 add654444e docs(05-04): complete WebDAVBackend + NextcloudBackend plan — SUMMARY and STATE
- 05-04-SUMMARY.md: 2 tasks (31 tests, 4 files), 8 min, 1 auto-fixed deviation (factory dispatch)
- STATE.md: plan advanced to 4/8, session log updated, 3 new key decisions recorded
2026-05-28 21:15:12 +02:00


phase: 05-cloud-storage-backends
plan: 04
subsystem: api
tags: webdav, nextcloud, webdavclient3, ssrf, asyncio, storage-backend, cloud-storage
requires:
  • phase: 05-cloud-storage-backends, plan: 02, provides: cloud_utils.py: validate_cloud_url (SSRF guard), encrypt/decrypt_credentials (HKDF+Fernet), storage factory
provides:
  • backend/storage/webdav_backend.py: WebDAVBackend, the generic WebDAV StorageBackend (all 7 methods)
  • backend/storage/nextcloud_backend.py: NextcloudBackend, a Nextcloud-specific extension with list_folder
  • backend/storage/__init__.py: nextcloud dispatch correctly routes to NextcloudBackend
affects: 05-05, 05-06, 05-07, 05-08
tech-stack:
  added:
    • webdavclient3 3.14.7: synchronous WebDAV client; all calls wrapped in asyncio.to_thread()
    • lxml 6.1.1: dependency of webdavclient3 (auto-installed)
  patterns:
    • SSRF double guard: validate_cloud_url() in __init__ (construct-time) AND before every asyncio.to_thread() call (request-time)
    • asyncio.to_thread() for all sync webdavclient3 calls; mirrors the MinIOBackend pattern
    • urllib.parse.quote() on path segments for WebDAV/Nextcloud compatibility (RESEARCH.md Pitfall 2)
    • WebDAVBackend subclassing: NextcloudBackend inherits all 7 ABC methods, only overrides health_check
    • list_folder returns [{id, name, is_dir, size}], consumed by the cloud folder listing API endpoint
key-files:
  created:
    • backend/storage/webdav_backend.py
    • backend/storage/nextcloud_backend.py
    • backend/tests/test_webdav_backend.py
  modified:
    • backend/storage/__init__.py
key-decisions:
  • validate_cloud_url called in __init__ AND before every asyncio.to_thread(): defence-in-depth against DNS rebinding (D-17 / T-05-04-02)
  • webdavclient3 upload_to/download_from method names confirmed by runtime inspection; they match RESEARCH.md ASSUMPTION A1
  • nextcloud and webdav providers dispatch to different classes: NextcloudBackend vs WebDAVBackend
  • list_folder calls validate_cloud_url before each client.info() call in the item loop (every outbound request)
patterns-established:
  • WebDAVBackend._make_path: 'docuvault/{encoded_user_id}/{encoded_doc_id}{ext}' (object_key = WebDAV path)
  • presigned methods raise NotImplementedError on all cloud backends (D-14)
  • delete_object catches all exceptions silently: no-op semantics per StorageBackend ABC contract
  • NextcloudBackend.health_check uses client.check('') vs WebDAVBackend.health_check client.check('/')
requirements-completed: CLOUD-01, CLOUD-07
duration: 8min
completed: 2026-05-28

Phase 5 Plan 04: WebDAVBackend and NextcloudBackend Summary

Generic WebDAV and Nextcloud StorageBackend implementations with an SSRF double-guard (construct-time + per-request), asyncio.to_thread() wrapping for all sync webdavclient3 calls, and NextcloudBackend list_folder for the lazy-load folder tree.

Performance

  • Duration: 8 min
  • Started: 2026-05-28T19:05:47Z
  • Completed: 2026-05-28T19:13:35Z
  • Tasks: 2 (+ 1 RED phase commit)
  • Files modified: 4

Accomplishments

  • Created webdav_backend.py with WebDAVBackend implementing all 7 StorageBackend abstract methods; validate_cloud_url() called in __init__ (SSRF at construct time) and before every asyncio.to_thread() call (D-17 defence-in-depth / T-05-04-01, T-05-04-02)
  • Created nextcloud_backend.py with NextcloudBackend(WebDAVBackend) inheriting all 7 methods; added the async list_folder() method, which returns [{id, name, is_dir, size}] dicts for the lazy-load cloud folder tree API; overrode health_check to use client.check("") for the Nextcloud root
  • Confirmed webdavclient3 actual method names by runtime inspection (upload_to, download_from — RESEARCH.md ASSUMPTION A1 was correct)
  • Created 31 TDD tests covering: subclassing invariants, all 7 methods async, SSRF guard for multiple private IP ranges, NotImplementedError for presigned methods, _make_path path construction and percent-encoding, NextcloudBackend subclass, list_folder presence, inherited SSRF guard
  • Fixed storage factory __init__.py to dispatch nextcloud provider to NextcloudBackend and webdav to WebDAVBackend (both with identical constructor signatures)
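
The double guard and thread wrapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in webdav_backend.py: FakeSyncClient and SketchBackend are hypothetical names, and the validate_cloud_url here is a trivial stand-in whose behaviour is assumed for the example.

```python
import asyncio
import io


def validate_cloud_url(url: str) -> None:
    """Stand-in for the real SSRF guard in cloud_utils.py (behaviour assumed)."""
    if not url.startswith("https://"):
        raise ValueError(f"rejected URL: {url}")


class FakeSyncClient:
    """Hypothetical stand-in for the synchronous webdavclient3 client."""

    def __init__(self):
        self.files = {}

    def upload_to(self, buff, remote_path):
        self.files[remote_path] = buff.read()


class SketchBackend:
    def __init__(self, base_url: str, client: FakeSyncClient) -> None:
        validate_cloud_url(base_url)  # construct-time guard
        self._base_url = base_url
        self._client = client

    async def put_object(self, remote_path: str, data: bytes) -> str:
        validate_cloud_url(self._base_url)  # request-time guard
        # Run the blocking call in a worker thread so the event loop stays free.
        await asyncio.to_thread(self._client.upload_to, io.BytesIO(data), remote_path)
        return remote_path


client = FakeSyncClient()
backend = SketchBackend("https://cloud.example.com/dav", client)
stored_key = asyncio.run(backend.put_object("docuvault/u1/d1.pdf", b"hello"))
```

Repeating the guard inside put_object costs one extra check per call, which is the price of not trusting a URL that was valid only at construction time.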

Task Commits

  1. RED phase tests: c406ab1 (test)
  2. Task 1: WebDAVBackend: 311dfa1 (feat)
  3. Task 2: NextcloudBackend: 1b9573f (feat)
  4. Storage factory fix: a9ea33d (feat)

Files Created/Modified

  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/webdav_backend.py — WebDAVBackend: all 7 async methods, SSRF guard, asyncio.to_thread() wrapping, _make_path with URL encoding
  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/nextcloud_backend.py — NextcloudBackend: inherits WebDAVBackend, adds list_folder(), overrides health_check()
  • /Users/nik/Documents/Progamming/document_scanner/backend/tests/test_webdav_backend.py — 31 tests (TDD: RED → GREEN)
  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/__init__.py — Fixed nextcloud/webdav provider dispatch in get_storage_backend_for_document()
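
As an illustration of the _make_path percent-encoding, a path builder along these lines would produce the 'docuvault/{encoded_user_id}/{encoded_doc_id}{ext}' layout. The function name make_path and the choice of safe="" are assumptions for this sketch; the real method may encode differently.

```python
from urllib.parse import quote


def make_path(user_id: str, doc_id: str, ext: str = "") -> str:
    # Assumption: safe="" also encodes "/" so an ID cannot add path segments.
    encoded_user_id = quote(user_id, safe="")
    encoded_doc_id = quote(doc_id, safe="")
    return f"docuvault/{encoded_user_id}/{encoded_doc_id}{ext}"


example_path = make_path("user 1", "a/b", ".pdf")
# example_path == "docuvault/user%201/a%2Fb.pdf"
```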

Decisions Made

  • validate_cloud_url() is called inside list_folder() before client.list() AND before each client.info() in the item loop — every outbound HTTP request is guarded, not just the first in the loop.
  • webdavclient3 method names upload_to(buf, remote_path) and download_from(buf, remote_path) were confirmed by runtime dir(Client) inspection before use. RESEARCH.md ASSUMPTION A1 was accurate.
  • nextcloud and webdav now dispatch to distinct classes so list_folder is available on Nextcloud connections but not on generic WebDAV connections (which don't have a standardised folder listing path convention).
  • client.mkdir(parent_dir, recursive=True) called in put_object before upload — idempotent; webdavclient3 mkdir is a no-op if directory already exists.
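
The per-request guard in list_folder could look roughly like the sketch below. FakeClient is a hypothetical stand-in for the WebDAV client, the guard only counts its invocations, and the trailing-slash test for directories and the use of the full path as id are assumptions rather than confirmed behaviour of the real method.

```python
import asyncio

guard_calls = 0


def validate_cloud_url(url: str) -> None:
    """Stand-in for the real guard; here it only counts invocations."""
    global guard_calls
    guard_calls += 1


class FakeClient:
    """Hypothetical stand-in for the synchronous WebDAV client."""

    def list(self, remote_path):
        return ["scans/", "a.pdf"]

    def info(self, remote_path):
        return {"size": "120"}


async def list_folder(client, base_url: str, folder: str) -> list:
    validate_cloud_url(base_url)  # guard before the listing request
    names = await asyncio.to_thread(client.list, folder)
    items = []
    for name in names:
        path = f"{folder.rstrip('/')}/{name}"
        validate_cloud_url(base_url)  # guard again before each info request
        info = await asyncio.to_thread(client.info, path)
        is_dir = name.endswith("/")  # assumption: directory names end with "/"
        size = 0 if is_dir else int(info.get("size") or 0)
        items.append({"id": path, "name": name.rstrip("/"), "is_dir": is_dir, "size": size})
    return items


items = asyncio.run(list_folder(FakeClient(), "https://cloud.example.com/dav", "docs"))
```

With two entries in the folder, the guard runs three times: once for the listing and once per item.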

Deviations from Plan

Auto-fixed Issues

1. [Rule 2 - Missing Critical] Storage factory dispatching nextcloud to WebDAVBackend instead of NextcloudBackend

  • Found during: Post-Task 2 review of storage/__init__.py
  • Issue: The Plan 02 factory combined nextcloud and webdav into a single dispatch arm both returning WebDAVBackend. This meant Nextcloud connections would not have list_folder, which is the key capability that distinguishes NextcloudBackend from WebDAVBackend and is required for the cloud folder tree API.
  • Fix: Split the dispatch: nextcloud → NextcloudBackend, webdav → WebDAVBackend; both use identical constructor signatures so the fix is a one-line change per arm.
  • Files modified: backend/storage/__init__.py
  • Verification: Full test suite passes (262 passed); factory module imports correctly.
  • Committed in: a9ea33d

Total deviations: 1 auto-fixed (Rule 2: missing critical dispatch differentiation)

Impact on plan: No scope creep. The fix restores intended behaviour already implied by the plan's decision to use two distinct classes.
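
The split dispatch can be pictured with a small sketch. The class bodies, the constructor parameters, and the pick_backend helper are hypothetical; the real factory is get_storage_backend_for_document() and may use if/elif arms rather than a lookup table.

```python
class WebDAVBackend:
    # Hypothetical constructor parameters, shared by both classes.
    def __init__(self, base_url: str, username: str, password: str) -> None:
        self.base_url = base_url
        self.username = username
        self.password = password


class NextcloudBackend(WebDAVBackend):
    async def list_folder(self, folder: str) -> list:
        return []  # placeholder body for the sketch


# One entry per provider; before the fix both names led to WebDAVBackend.
_BACKENDS = {"webdav": WebDAVBackend, "nextcloud": NextcloudBackend}


def pick_backend(provider: str, **credentials) -> WebDAVBackend:
    try:
        backend_cls = _BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return backend_cls(**credentials)


creds = {"base_url": "https://cloud.example.com/dav", "username": "u", "password": "p"}
nextcloud = pick_backend("nextcloud", **creds)
webdav = pick_backend("webdav", **creds)
```

Because both classes accept the same arguments, only the class chosen per provider changes, which is why the fix stays small.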

Issues Encountered

webdavclient3 was not installed locally, so it was installed via pip3 install webdavclient3 (expected for a new dependency added to requirements.txt in Plan 05-01). This is consistent with the pattern established in the Plan 05-02 SUMMARY.

Known Stubs

None. Both backends implement all 7 StorageBackend methods without stubs or placeholder returns. presigned_get_url and generate_presigned_put_url raise NotImplementedError by design (D-14).

Threat Surface Scan

No new network endpoints introduced. Both backends are internal SDK wrappers:

  • All outbound WebDAV HTTP calls flow through webdavclient3 SDK, not new FastAPI routes
  • SSRF guard (validate_cloud_url) is called at construct-time and before every outbound call — T-05-04-01 and T-05-04-02 mitigated
  • No new trust boundaries created

No threat flags raised.
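
For readers unfamiliar with this kind of guard, the sketch below shows one common way to reject private, loopback, and link-local targets. It is not the validate_cloud_url implementation from cloud_utils.py; the name check_url_not_private and the exact set of rejected ranges are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def check_url_not_private(url: str) -> None:
    """Hypothetical guard: raise ValueError if the host resolves to an internal address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    # Resolve at call time; repeating the check before each request narrows
    # the window in which a rebinding DNS answer could slip through.
    infos = socket.getaddrinfo(parts.hostname, parts.port or 443, type=socket.SOCK_STREAM)
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved:
            raise ValueError(f"{parts.hostname} resolves to a blocked address: {addr}")


blocked = []
for candidate in ("http://10.0.0.1/dav", "https://192.168.1.5/", "http://127.0.0.1/", "http://169.254.169.254/"):
    try:
        check_url_not_private(candidate)
    except ValueError:
        blocked.append(candidate)
```

A check like this still leaves a short gap between resolution and the actual connection, so it reduces rather than eliminates rebinding risk.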

Next Phase Readiness

  • Plans 05-03 (Google Drive) and 05-05 (OneDrive) can import from storage.webdav_backend and storage.nextcloud_backend immediately
  • get_storage_backend_for_document() now correctly dispatches all 5 providers (minio, google_drive stub, onedrive stub, nextcloud, webdav)
  • The 31 new tests are green; the 43 xfail stubs in test_cloud.py remain xfail (correctly — they test API endpoints not yet built)
  • Full suite: 262 passed / 43 xfailed / 1 pre-existing failure (test_extract_docx — python-docx not installed locally)

Self-Check: PASSED

Files verified present:

  • backend/storage/webdav_backend.py: FOUND (class WebDAVBackend, all 7 async methods)
  • backend/storage/nextcloud_backend.py: FOUND (class NextcloudBackend, list_folder)
  • backend/tests/test_webdav_backend.py: FOUND (31 tests, all passing)
  • backend/storage/__init__.py: FOUND (updated nextcloud/webdav dispatch)

Commits verified:

  • c406ab1: test(05-04) — RED tests — FOUND
  • 311dfa1: feat(05-04) — WebDAVBackend — FOUND
  • 1b9573f: feat(05-04) — NextcloudBackend — FOUND
  • a9ea33d: feat(05-04) — factory fix — FOUND

Phase: 05-cloud-storage-backends
Completed: 2026-05-28