2026-05-28 19:43:12 +02:00

---
phase: 05-cloud-storage-backends
plan: 04
type: execute
wave: 3
depends_on:
  - 05-02
files_modified:
  - backend/storage/nextcloud_backend.py
  - backend/storage/webdav_backend.py
autonomous: true
requirements:
  - CLOUD-01
  - CLOUD-07
must_haves:
  truths:
    - "NextcloudBackend implements all 7 StorageBackend abstract methods"
    - "WebDAVBackend implements all 7 StorageBackend abstract methods"
    - "validate_cloud_url() called inside WebDAVBackend and NextcloudBackend before every outbound WebDAV request"
    - "All sync webdavclient3 calls wrapped in asyncio.to_thread()"
    - "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both WebDAV backends"
    - "health_check uses lightweight PROPFIND or check() call to validate connectivity without storing unverified credentials"
  artifacts:
    - path: backend/storage/nextcloud_backend.py
      provides: Nextcloud WebDAV StorageBackend
      contains: class NextcloudBackend
    - path: backend/storage/webdav_backend.py
      provides: Generic WebDAV StorageBackend
      contains: class WebDAVBackend
  key_links:
    - from: backend/storage/nextcloud_backend.py
      to: backend/storage/cloud_utils.py
      via: validate_cloud_url called before every outbound request
      pattern: validate_cloud_url
    - from: backend/storage/webdav_backend.py
      to: backend/storage/cloud_utils.py
      via: validate_cloud_url called before every outbound request
      pattern: validate_cloud_url
---
Implement NextcloudBackend and WebDAVBackend — the two credential-based (non-OAuth) cloud StorageBackend concrete classes.

Purpose: These backends handle Nextcloud and generic WebDAV servers using HTTP Basic Auth. SSRF prevention via validate_cloud_url() is mandatory before every outbound request. All sync webdavclient3 calls are wrapped in asyncio.to_thread() per the MinIOBackend pattern.

Output: nextcloud_backend.py and webdav_backend.py, each implementing all 7 StorageBackend methods.

<execution_context> @/Users/nik/.claude/get-shit-done/workflows/execute-plan.md @/Users/nik/.claude/get-shit-done/templates/summary.md </execution_context>

@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md

From backend/storage/base.py:

class StorageBackend(ABC):
  async def put_object(user_id, document_id, file_bytes, extension, content_type) -> str
  async def get_object(object_key: str) -> bytes
  async def delete_object(object_key: str) -> None
  async def presigned_get_url(object_key: str, expires_minutes: int = 60) -> str
  async def health_check() -> bool
  async def generate_presigned_put_url(object_key: str, expires_minutes: int = 15) -> str
  async def stat_object(object_key: str) -> int

webdavclient3 Client options: {"webdav_hostname": server_url, "webdav_login": username, "webdav_password": password}
All webdavclient3 calls are synchronous and MUST be wrapped in asyncio.to_thread().
Method names to verify:
- client.upload_to(buf, remote_path), client.download_from(buf, remote_path)
- client.list(remote_dir)
- client.info(remote_path): returns a dict with a "size" key
- client.check(remote_path): returns bool; used for health_check
- client.clean(remote_path): deletes the remote resource
ASSUMPTION A1: verify the upload_to/download_from method names against the installed package during implementation.
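The wrapping rule can be tried out without a real server. The sketch below uses a hypothetical `FakeSyncClient` (a stand-in written for this example, not part of webdavclient3) whose `download_from` blocks like a synchronous network call. It only illustrates the `asyncio.to_thread()` pattern; the real client's method names remain subject to assumption A1.

```python
import asyncio
import io
import time


class FakeSyncClient:
    """Hypothetical stand-in for a blocking WebDAV client (not webdavclient3)."""

    def download_from(self, buf: io.BytesIO, remote_path: str) -> None:
        time.sleep(0.05)  # simulate a blocking network round trip
        buf.write(f"contents of {remote_path}".encode())


async def get_object(client: FakeSyncClient, object_key: str) -> bytes:
    buf = io.BytesIO()
    # The blocking call runs in a worker thread; the event loop stays free.
    await asyncio.to_thread(client.download_from, buf, object_key)
    return buf.getvalue()


async def fetch_many(keys: list[str]) -> list[bytes]:
    client = FakeSyncClient()
    # Each download runs in its own worker thread, so the calls overlap.
    return await asyncio.gather(*(get_object(client, key) for key in keys))
```

Calling `asyncio.run(fetch_many(["a", "b"]))` returns the two payloads in request order, because `asyncio.gather` preserves the order of its arguments.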

validate_cloud_url(url: str) -> None: raises ValueError if the URL targets a private/internal address.
Must be called (1) at connect time and (2) before every outbound WebDAV request.
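The body of validate_cloud_url lives in backend/storage/cloud_utils.py and is not reproduced in this plan. As a rough illustration of the contract only, a check of this kind might look like the sketch below. The name `validate_cloud_url_sketch` and every detail of its logic are assumptions made for this example, not the project's implementation.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_cloud_url_sketch(url: str) -> None:
    """Hypothetical SSRF check; the real validate_cloud_url may differ."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    try:
        # Resolve every address the host maps to; IP literals need no DNS lookup.
        infos = socket.getaddrinfo(parts.hostname, None, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host {parts.hostname!r}") from exc
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        # is_global is False for loopback, private, and link-local ranges.
        if not addr.is_global:
            raise ValueError(f"{url!r} resolves to non-public address {addr}")
```

A check like this resolves the hostname separately from the HTTP client, so it narrows but does not close the DNS rebinding window; the threat model treats that case separately under T-05-04-02.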

Use urllib.parse.quote() on path segments for Nextcloud compatibility with non-ASCII filenames

object_key = WebDAV path: "docuvault/{user_id}/{document_id}{extension}"
CloudConnection credentials dict: {"server_url": str, "username": str, "password": str}
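To make the encoding rule concrete, here is a small standalone sketch of how the key could be assembled. `make_object_key` is a hypothetical helper name used only for this illustration; the sample identifiers are made up.

```python
import urllib.parse


def make_object_key(user_id: object, document_id: object, extension: str) -> str:
    # safe="" also encodes "/", so an identifier can never add a path segment.
    encoded_uid = urllib.parse.quote(str(user_id), safe="")
    encoded_did = urllib.parse.quote(str(document_id), safe="")
    return f"docuvault/{encoded_uid}/{encoded_did}{extension}"


# UUID-style identifiers contain only unreserved characters and pass through unchanged.
plain_key = make_object_key("12345678-1234-5678-1234-567812345678", "doc", ".pdf")

# Non-ASCII characters and slashes are percent-encoded (UTF-8).
encoded_key = make_object_key("a/b", "ü", ".pdf")
```

Here `plain_key` is `docuvault/12345678-1234-5678-1234-567812345678/doc.pdf` and `encoded_key` is `docuvault/a%2Fb/%C3%BC.pdf`.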

Task 1: Implement WebDAVBackend

Files: backend/storage/webdav_backend.py

Read first:
- backend/storage/base.py: all 7 method signatures
- backend/storage/minio_backend.py: asyncio.to_thread() wrapping pattern
- backend/storage/cloud_utils.py: validate_cloud_url signature
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Pattern 5 (webdavclient3), Pitfall 2 (path encoding), A1 (assumed method names)

Behavior:
- WebDAVBackend.__init__(self, server_url: str, username: str, password: str) creates webdavclient3 Client
- validate_cloud_url(server_url) called in __init__ before constructing the client (SSRF guard at construct time)
- put_object: constructs object_key = f"docuvault/{user_id}/{document_id}{extension}"; percent-encodes path segments; uploads via asyncio.to_thread; returns object_key
- get_object: downloads to BytesIO via asyncio.to_thread; returns bytes
- delete_object: deletes via asyncio.to_thread; catches FileNotFoundError / WebDavException for missing file (no-op)
- presigned_get_url: raises NotImplementedError
- generate_presigned_put_url: raises NotImplementedError
- stat_object: calls asyncio.to_thread for client.info(object_key); returns int(info.get("size", 0))
- health_check: calls asyncio.to_thread for client.check("/"); returns True/False
- SSRF validation called before every asyncio.to_thread call: validate_cloud_url(self._server_url)
- Uses urllib.parse.quote on non-docuvault path segments (Pitfall 2)

Create backend/storage/webdav_backend.py with:
Module docstring explaining WebDAV backend, SSRF validation requirement per D-17, and Pitfall 2 (path encoding).

from __future__ import annotations

import asyncio
import io
import urllib.parse

from webdav3.client import Client
from webdav3.exceptions import RemoteResourceNotFound

from storage.base import StorageBackend
from storage.cloud_utils import validate_cloud_url


class WebDAVBackend(StorageBackend):

  def __init__(self, server_url: str, username: str, password: str) -> None:
    validate_cloud_url(server_url)  # SSRF guard at construct time
    self._server_url = server_url
    options = {
      "webdav_hostname": server_url,
      "webdav_login": username,
      "webdav_password": password,
    }
    self._client = Client(options)

  def _make_path(self, user_id: str, document_id: str, extension: str) -> str:
    # Percent-encode each segment for Nextcloud/WebDAV compatibility (Pitfall 2)
    encoded_uid = urllib.parse.quote(str(user_id), safe="")
    encoded_did = urllib.parse.quote(str(document_id), safe="")
    return f"docuvault/{encoded_uid}/{encoded_did}{extension}"

  def _upload_sync(self, file_bytes: bytes, object_key: str) -> None:
    # Runs in a worker thread: create missing parent directories, then upload.
    parent = ""
    for segment in object_key.split("/")[:-1]:
      parent = f"{parent}/{segment}" if parent else segment
      if not self._client.check(parent):
        self._client.mkdir(parent)
    # upload_to is [ASSUMED] (A1); verify against the installed webdavclient3.
    self._client.upload_to(io.BytesIO(file_bytes), object_key)

  async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
    validate_cloud_url(self._server_url)  # re-validate before every request (D-17)
    object_key = self._make_path(user_id, document_id, extension)
    await asyncio.to_thread(self._upload_sync, file_bytes, object_key)
    return object_key

  async def get_object(self, object_key: str) -> bytes:
    validate_cloud_url(self._server_url)
    buf = io.BytesIO()
    # download_from is [ASSUMED] (A1); verify against the installed webdavclient3.
    await asyncio.to_thread(self._client.download_from, buf, object_key)
    return buf.getvalue()

  async def delete_object(self, object_key: str) -> None:
    validate_cloud_url(self._server_url)
    try:
      await asyncio.to_thread(self._client.clean, object_key)
    except (FileNotFoundError, RemoteResourceNotFound):
      # RemoteResourceNotFound is the WebDavException subclass for a missing
      # resource; other errors (auth, connectivity) still propagate.
      pass  # No-op if the file is already gone

  async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
    raise NotImplementedError("WebDAV backend does not support presigned URLs")

  async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
    raise NotImplementedError("WebDAV backend does not support presigned put URLs")

  async def stat_object(self, object_key: str) -> int:
    validate_cloud_url(self._server_url)
    info = await asyncio.to_thread(self._client.info, object_key)
    return int(info.get("size", 0))

  async def health_check(self) -> bool:
    try:
      validate_cloud_url(self._server_url)
      result = await asyncio.to_thread(self._client.check, "/")
      return bool(result)
    except Exception:
      return False

IMPORTANT: During implementation, verify the webdavclient3 method names by running:
python -c "from webdav3.client import Client; print([m for m in dir(Client) if not m.startswith('_')])"
and use the correct method names. The RESEARCH.md marks upload_to/download_from as [ASSUMED].
Correct method names if different (e.g., may be upload_sync, download_sync, or upload/download).
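The same check can be scripted so that it reports only the names that are missing. This is a sketch: `missing_methods`, `ASSUMED_CLIENT_METHODS`, and `check_installed_client` are hypothetical names introduced here, and the list simply mirrors the method names this plan assumes.

```python
def missing_methods(cls: type, names: list[str]) -> list[str]:
    """Return the names that are not callable attributes of cls."""
    return [name for name in names if not callable(getattr(cls, name, None))]


# Method names this plan assumes on webdav3.client.Client (assumption A1).
ASSUMED_CLIENT_METHODS = [
    "upload_to", "download_from", "list", "info", "check", "clean", "mkdir",
]


def check_installed_client() -> list[str]:
    # Imported lazily so this module loads even where webdavclient3 is absent.
    from webdav3.client import Client
    return missing_methods(Client, ASSUMED_CLIENT_METHODS)
```

Running `check_installed_client()` in the backend's virtualenv returns an empty list when every assumed name exists, and otherwise lists the names that need correcting.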
Verify:

cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.webdav_backend import WebDAVBackend
import inspect
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
  assert inspect.iscoroutinefunction(getattr(WebDAVBackend, method)), f'{method} not async'
# SSRF guard: connecting to localhost should raise ValueError
try:
  WebDAVBackend('http://localhost/dav', 'user', 'pass')
  print('FAIL: should raise ValueError for localhost')
except ValueError:
  print('OK: SSRF blocked in __init__')
print('All methods async: OK')
import asyncio
backend = WebDAVBackend.__new__(WebDAVBackend)
backend._server_url = 'https://example.com/dav'  # bypass __init__ for method check
async def check():
  try:
    await backend.presigned_get_url('k')
  except NotImplementedError:
    print('presigned_get_url NotImplementedError: OK')
  try:
    await backend.generate_presigned_put_url('k')
  except NotImplementedError:
    print('generate_presigned_put_url NotImplementedError: OK')
asyncio.run(check())
"

Acceptance criteria:
- backend/storage/webdav_backend.py exists with class WebDAVBackend
- All 7 methods are async coroutines
- WebDAVBackend("http://127.0.0.1/dav", "u", "p") raises ValueError (SSRF guard in __init__)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- validate_cloud_url imported and called in __init__ and before every asyncio.to_thread call
- `pytest -v --tb=short` exits 0

Done: WebDAVBackend created; SSRF validation in __init__ and before each request; all 7 methods async; pytest passes

Task 2: Implement NextcloudBackend

Files: backend/storage/nextcloud_backend.py

Read first:
- backend/storage/webdav_backend.py: WebDAVBackend implementation (NextcloudBackend extends it)
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Open Question 2 (Nextcloud folder listing path convention), Pitfall 2 (path encoding)
- backend/storage/cloud_utils.py: validate_cloud_url

Behavior:
- NextcloudBackend subclasses WebDAVBackend: inherits all 7 methods; only overrides what differs
- NextcloudBackend stores the username for folder listing path construction (Nextcloud WebDAV path: /remote.php/dav/files/{username}/)
- SSRF validation inherited from WebDAVBackend parent class
- list_folder(folder_path: str) -> list[dict] method added for cloud folder listing via PROPFIND (used by API)
- list_folder returns list of dicts with keys: id (str path), name (str), is_dir (bool), size (int)
- get_object and put_object inherited from WebDAVBackend
- health_check overrides parent to use PROPFIND on the Nextcloud root path

Create backend/storage/nextcloud_backend.py with:
Module docstring explaining Nextcloud extends WebDAVBackend; Nextcloud WebDAV base path convention.

from __future__ import annotations
import asyncio, urllib.parse
from storage.webdav_backend import WebDAVBackend
from storage.cloud_utils import validate_cloud_url

class NextcloudBackend(WebDAVBackend):
  """Nextcloud storage backend — extends WebDAVBackend with Nextcloud-specific path handling.

  The server_url should be the full WebDAV root:
  https://nc.example.com/remote.php/dav/files/{username}/
  """

  def __init__(self, server_url: str, username: str, password: str) -> None:
    super().__init__(server_url, username, password)
    self._username = username

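  async def list_folder(self, folder_path: str = "") -> list[dict]:
    """List folder contents at folder_path relative to WebDAV root.

    Returns a list of dicts: [{"id": str, "name": str, "is_dir": bool, "size": int}, ...]
    Used by GET /api/cloud/folders/nextcloud/{folder_id} endpoint.
    """
    validate_cloud_url(self._server_url)

    def _list_sync() -> list[dict]:
      # client.list() and client.info() are sync, so the whole listing runs in one worker thread.
      # [ASSUMED] client.list() returns entry names and marks directories with a trailing "/";
      # verify against the installed webdavclient3 version.
      entries = []
      for name in self._client.list(folder_path):
        path = f"{folder_path.strip('/')}/{name}".lstrip("/")
        is_dir = name.endswith("/")
        size = 0 if is_dir else int(self._client.info(path).get("size") or 0)
        entries.append({"id": path, "name": name.rstrip("/"), "is_dir": is_dir, "size": size})
      return entries

    return await asyncio.to_thread(_list_sync)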

  async def health_check(self) -> bool:
    try:
      validate_cloud_url(self._server_url)
      # Use client.check("") or client.list("") to verify connectivity to root
      result = await asyncio.to_thread(self._client.check, "")
      return bool(result)
    except Exception:
      return False

NextcloudBackend inherits put_object, get_object, delete_object, presigned_get_url,
generate_presigned_put_url, and stat_object from WebDAVBackend.

The list_folder method is extra (not in ABC) and used exclusively by the cloud folder
listing API endpoint.
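Because both WebDAV backends raise NotImplementedError from the inherited presigned methods, whatever code calls them needs another way to serve the file. One possible shape for that fallback is sketched below. `NoPresignBackend` and `resolve_download` are hypothetical names for this illustration, and the plan does not say how the API layer actually handles this case.

```python
import asyncio


class NoPresignBackend:
    """Hypothetical stand-in that behaves like the WebDAV backends for downloads."""

    async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned URLs")

    async def get_object(self, object_key: str) -> bytes:
        return b"file bytes for " + object_key.encode()


async def resolve_download(backend, object_key: str) -> dict:
    """Prefer a presigned URL; fall back to proxying the bytes through the API."""
    try:
        url = await backend.presigned_get_url(object_key)
        return {"mode": "redirect", "url": url}
    except NotImplementedError:
        body = await backend.get_object(object_key)
        return {"mode": "proxy", "body": body}
```

With the stand-in above, `asyncio.run(resolve_download(NoPresignBackend(), "docuvault/u/d.pdf"))` takes the proxy branch and returns the bytes from `get_object`.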
Verify:

cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.nextcloud_backend import NextcloudBackend
from storage.webdav_backend import WebDAVBackend
import inspect
# Verify subclass
assert issubclass(NextcloudBackend, WebDAVBackend), 'NextcloudBackend must subclass WebDAVBackend'
# Verify all 7 methods async
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
  assert inspect.iscoroutinefunction(getattr(NextcloudBackend, method)), f'{method} not async'
# Verify list_folder added
assert hasattr(NextcloudBackend, 'list_folder'), 'list_folder missing'
assert inspect.iscoroutinefunction(NextcloudBackend.list_folder), 'list_folder not async'
print('NextcloudBackend is WebDAVBackend subclass: OK')
print('All 7 StorageBackend methods async: OK')
print('list_folder method present and async: OK')
# SSRF guard inherited
try:
  NextcloudBackend('http://10.0.0.1/dav', 'user', 'pass')
  print('FAIL: SSRF should be blocked')
except ValueError:
  print('SSRF guard inherited: OK')
"

Acceptance criteria:
- backend/storage/nextcloud_backend.py exists with class NextcloudBackend
- issubclass(NextcloudBackend, WebDAVBackend) is True
- All 7 StorageBackend methods are async (inherited or overridden)
- list_folder async method added beyond the ABC contract
- SSRF guard inherited from WebDAVBackend.__init__: NextcloudBackend("http://10.0.0.1/dav", ...) raises ValueError
- `pytest -v --tb=short` exits 0

Done: NextcloudBackend created as WebDAVBackend subclass; list_folder added; SSRF guard inherited; pytest passes

<threat_model>

Trust Boundaries

| Boundary | Description |
|----------|-------------|
| user-supplied server_url → WebDAV client | Server URL must be validated for SSRF before Client construction and before each request |
| webdavclient3 sync calls → event loop | All sync SDK calls must be in asyncio.to_thread() to prevent event loop blocking |
| WebDAV credentials → encrypted storage | Credentials flow from encrypted DB via factory into backend constructor; never logged |

STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-04-01 | Tampering | WebDAVBackend: SSRF via server_url | mitigate | validate_cloud_url(server_url) in __init__ AND before every asyncio.to_thread call; D-17 requires both points |
| T-05-04-02 | Tampering | DNS rebinding on WebDAV requests | mitigate | validate_cloud_url called before each request (not only at connect-time); documented defense-in-depth via network egress firewall (RESEARCH.md Pitfall 5) |
| T-05-04-03 | Information Disclosure | WebDAV path includes user_id/document_id | accept | object_key = "docuvault/{user_id}/{document_id}{ext}" contains no human filename; acceptable for single-user WebDAV servers |
| T-05-04-04 | Denial of Service | Nextcloud list_folder fetching info per item | accept | TTLCache (Plan 02) prevents repeated list_folder calls within 60s; per-item info call is provider overhead only |
| T-05-04-05 | Tampering | webdavclient3 path traversal via object_key | mitigate | put_object constructs object_key from user_id and document_id (both UUID values); get_object/delete_object receive object_key from DB (not from user input directly), so no raw user path is injected |
</threat_model>
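The mitigation for T-05-04-05 relies on object keys always having the shape that put_object produces. If an explicit guard were wanted on top of that, it could look roughly like the sketch below. `is_safe_object_key` is a hypothetical helper that this plan does not require, and it assumes both identifiers are UUIDs, as the threat register states.

```python
import re
import uuid

# docuvault/<user_id>/<document_id><optional .ext>, with no further slashes.
_KEY_PATTERN = re.compile(r"docuvault/([^/]+)/([^/.]+)(\.[A-Za-z0-9]+)?")


def is_safe_object_key(object_key: str) -> bool:
    match = _KEY_PATTERN.fullmatch(object_key)
    if match is None:
        return False
    try:
        # Both identifiers must parse as UUIDs, which rules out "." and "..".
        uuid.UUID(match.group(1))
        uuid.UUID(match.group(2))
    except ValueError:
        return False
    return True
```

A key with extra path segments fails the pattern, and a key such as `docuvault/../secret.pdf` matches the pattern but fails UUID parsing, so both are rejected.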
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10

<success_criteria>

- WebDAVBackend: all 7 methods async; validate_cloud_url in __init__ and before each request; presigned methods raise NotImplementedError
- NextcloudBackend: subclass of WebDAVBackend; list_folder method added; SSRF guard inherited
- pytest -v exits 0 with 0 failures; test_cloud.py still all xfailed
</success_criteria>
Create `.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md` when done