---
phase: 05-cloud-storage-backends
plan: 04
type: execute
wave: 3
depends_on:
- "05-02"
files_modified:
- backend/storage/nextcloud_backend.py
- backend/storage/webdav_backend.py
autonomous: true
requirements:
- CLOUD-01
- CLOUD-07
must_haves:
  truths:
    - "NextcloudBackend implements all 7 StorageBackend abstract methods"
    - "WebDAVBackend implements all 7 StorageBackend abstract methods"
    - "validate_cloud_url() called inside WebDAVBackend and NextcloudBackend before every outbound WebDAV request"
    - "All sync webdavclient3 calls wrapped in asyncio.to_thread()"
    - "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both WebDAV backends"
    - "health_check uses lightweight PROPFIND or check() call to validate connectivity without storing unverified credentials"
  artifacts:
    - path: "backend/storage/nextcloud_backend.py"
      provides: "Nextcloud WebDAV StorageBackend"
      contains: "class NextcloudBackend"
    - path: "backend/storage/webdav_backend.py"
      provides: "Generic WebDAV StorageBackend"
      contains: "class WebDAVBackend"
  key_links:
    - from: "backend/storage/nextcloud_backend.py"
      to: "backend/storage/cloud_utils.py"
      via: "validate_cloud_url called before every outbound request"
      pattern: "validate_cloud_url"
    - from: "backend/storage/webdav_backend.py"
      to: "backend/storage/cloud_utils.py"
      via: "validate_cloud_url called before every outbound request"
      pattern: "validate_cloud_url"
---
<objective>
Implement NextcloudBackend and WebDAVBackend — the two credential-based (non-OAuth) cloud StorageBackend concrete classes.
Purpose: These backends handle Nextcloud and generic WebDAV servers using HTTP Basic Auth. SSRF prevention via validate_cloud_url() is mandatory before every outbound request. All sync webdavclient3 calls are wrapped in asyncio.to_thread() per the MinIOBackend pattern.
Output: nextcloud_backend.py and webdav_backend.py, each implementing all 7 StorageBackend methods.
</objective>
<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md
</context>
<interfaces>
<!-- From backend/storage/base.py -->
From backend/storage/base.py:
class StorageBackend(ABC):
    async def put_object(user_id, document_id, file_bytes, extension, content_type) -> str
    async def get_object(object_key: str) -> bytes
    async def delete_object(object_key: str) -> None
    async def presigned_get_url(object_key: str, expires_minutes: int = 60) -> str
    async def health_check() -> bool
    async def generate_presigned_put_url(object_key: str, expires_minutes: int = 15) -> str
    async def stat_object(object_key: str) -> int
<!-- From RESEARCH.md Pattern 5 — webdavclient3 -->
webdavclient3 Client options: {"webdav_hostname": server_url, "webdav_login": username, "webdav_password": password}
All webdavclient3 calls are synchronous — MUST wrap in asyncio.to_thread()
Method names to verify: client.upload_to(buf, remote_path), client.download_from(buf, remote_path)
client.list(remote_dir), client.info(remote_path) returns dict with "size" key
client.check(remote_path) returns bool — used for health_check
client.clean(remote_path) — delete
ASSUMPTION A1: verify upload_to/download_from method names against installed package during implementation
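
The wrapping rule above can be sketched without a real server. This is a minimal illustration only: `FakeSyncClient` is a hypothetical in-memory stand-in, not webdavclient3, and it borrows the `upload_to`/`download_from` names that A1 still marks as assumed.

```python
import asyncio
import io
import threading


class FakeSyncClient:
    """Hypothetical blocking client used only for this sketch (not webdavclient3)."""

    def __init__(self):
        self._store = {}
        self.calling_thread = None

    def upload_to(self, buf, remote_path):
        # Record which thread ran the blocking call, then store the bytes.
        self.calling_thread = threading.current_thread().name
        self._store[remote_path] = buf.getvalue()

    def download_from(self, buf, remote_path):
        buf.write(self._store[remote_path])


async def roundtrip(client, key, data):
    # Each blocking call is handed to a worker thread so the event loop stays free.
    await asyncio.to_thread(client.upload_to, io.BytesIO(data), key)
    out = io.BytesIO()
    await asyncio.to_thread(client.download_from, out, key)
    return out.getvalue()
```

Calling `asyncio.run(roundtrip(FakeSyncClient(), "docuvault/u/d.pdf", b"pdf"))` returns the stored bytes, and `calling_thread` shows that the upload ran off the main thread.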
<!-- From RESEARCH.md Pattern 6 — SSRF prevention -->
validate_cloud_url(url: str) -> None — raises ValueError if URL targets private/internal address
Must be called: (1) at connect-time, (2) before every outbound WebDAV request
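
For orientation, one possible shape of such a check is sketched below. This is not the project's `validate_cloud_url` from `cloud_utils.py`: the function name `validate_url_sketch` and the "reject anything that is not a global address" policy are assumptions made for illustration.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_url_sketch(url: str) -> None:
    """Raise ValueError if the URL's host resolves to a non-global address (assumed policy)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported or malformed URL: {url!r}")
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host: {parts.hostname!r}") from exc
    for info in infos:
        # info[4][0] is the resolved IP address string for this candidate.
        addr = ipaddress.ip_address(info[4][0])
        if not addr.is_global:
            raise ValueError(f"{parts.hostname!r} resolves to non-public address {addr}")
```

Because the hostname is resolved again on every call, repeating the check before each request narrows (but does not close) the DNS rebinding window, which is why the plan also asks for the connect-time and per-request call sites.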
<!-- From RESEARCH.md Pitfall 2 — Nextcloud path encoding -->
Use urllib.parse.quote() on path segments for Nextcloud compatibility with non-ASCII filenames
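
A small sketch of per-segment encoding follows. The helper name `encode_path` is hypothetical. Whether webdavclient3 applies its own percent-encoding to paths should be checked during implementation, since encoding twice would turn `%` into `%25`.

```python
from urllib.parse import quote


def encode_path(*segments: str) -> str:
    """Percent-encode each segment separately, then join with '/'."""
    # safe="" also encodes '/' inside a segment, so a segment cannot add path levels.
    return "/".join(quote(segment, safe="") for segment in segments)


print(encode_path("docuvault", "Übersicht 2024", "a/b.pdf"))
# docuvault/%C3%9Cbersicht%202024/a%2Fb.pdf
```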
<!-- Object key scheme for WebDAV -->
object_key = WebDAV path: "docuvault/{user_id}/{document_id}{extension}"
CloudConnection credentials dict: {"server_url": str, "username": str, "password": str}
</interfaces>
<tasks>
<task type="auto" tdd="true">
<name>Task 1: Implement WebDAVBackend</name>
<files>backend/storage/webdav_backend.py</files>
<read_first>
- backend/storage/base.py — all 7 method signatures
- backend/storage/minio_backend.py — asyncio.to_thread() wrapping pattern
- backend/storage/cloud_utils.py — validate_cloud_url signature
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Pattern 5 (webdavclient3), Pitfall 2 (path encoding), A1 (assumed method names)
</read_first>
<behavior>
- WebDAVBackend.__init__(self, server_url: str, username: str, password: str) creates webdavclient3 Client
- validate_cloud_url(server_url) called in __init__ before constructing the client (SSRF guard at construct time)
- put_object: constructs object_key = f"docuvault/{user_id}/{document_id}{extension}"; percent-encodes path segments; uploads via asyncio.to_thread; returns object_key
- get_object: downloads to BytesIO via asyncio.to_thread; returns bytes
- delete_object: deletes via asyncio.to_thread; catches FileNotFoundError / WebDavException for missing file (no-op)
- presigned_get_url: raises NotImplementedError
- generate_presigned_put_url: raises NotImplementedError
- stat_object: calls asyncio.to_thread for client.info(object_key); returns int(info.get("size", 0))
- health_check: calls asyncio.to_thread for client.check("/"); returns True/False
- SSRF validation called before every asyncio.to_thread call: validate_cloud_url(self._server_url)
- Uses urllib.parse.quote on non-docuvault path segments (Pitfall 2)
</behavior>
<action>
Create backend/storage/webdav_backend.py with:
Module docstring explaining WebDAV backend, SSRF validation requirement per D-17, and Pitfall 2 (path encoding).
from __future__ import annotations

import asyncio
import io
import urllib.parse

from webdav3.client import Client
from webdav3.exceptions import WebDavException

from storage.base import StorageBackend
from storage.cloud_utils import validate_cloud_url


class WebDAVBackend(StorageBackend):
    def __init__(self, server_url: str, username: str, password: str) -> None:
        validate_cloud_url(server_url)  # SSRF guard at construct time
        self._server_url = server_url
        options = {
            "webdav_hostname": server_url,
            "webdav_login": username,
            "webdav_password": password,
        }
        self._client = Client(options)

    def _make_path(self, user_id: str, document_id: str, extension: str) -> str:
        # Percent-encode path segments for Nextcloud/WebDAV compatibility (Pitfall 2)
        encoded_uid = urllib.parse.quote(str(user_id), safe="")
        encoded_did = urllib.parse.quote(str(document_id), safe="")
        return f"docuvault/{encoded_uid}/{encoded_did}{extension}"

    async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
        validate_cloud_url(self._server_url)  # re-validate before every request (D-17)
        object_key = self._make_path(user_id, document_id, extension)
        buf = io.BytesIO(file_bytes)
        # Ensure the parent directories exist, one level at a time, then upload.
        # mkdir/upload_to are [ASSUMED] names (A1): verify against the installed package.
        parent_dir = object_key.rsplit("/", 1)[0]
        await asyncio.to_thread(self._client.mkdir, "docuvault")
        await asyncio.to_thread(self._client.mkdir, parent_dir)
        await asyncio.to_thread(self._client.upload_to, buf, object_key)
        return object_key

    async def get_object(self, object_key: str) -> bytes:
        validate_cloud_url(self._server_url)
        buf = io.BytesIO()
        await asyncio.to_thread(self._client.download_from, buf, object_key)
        return buf.getvalue()

    async def delete_object(self, object_key: str) -> None:
        validate_cloud_url(self._server_url)
        try:
            await asyncio.to_thread(self._client.clean, object_key)
        except (FileNotFoundError, WebDavException):
            pass  # No-op if file not found

    async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned URLs")

    async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned put URLs")

    async def stat_object(self, object_key: str) -> int:
        validate_cloud_url(self._server_url)
        info = await asyncio.to_thread(self._client.info, object_key)
        return int(info.get("size") or 0)  # size may be missing or empty

    async def health_check(self) -> bool:
        try:
            validate_cloud_url(self._server_url)
            result = await asyncio.to_thread(self._client.check, "/")
            return bool(result)
        except Exception:
            return False

IMPORTANT: During implementation, verify the webdavclient3 method names by running:
python -c "from webdav3.client import Client; print([m for m in dir(Client) if not m.startswith('_')])"
and use the correct method names. The RESEARCH.md marks upload_to/download_from as [ASSUMED].
Correct method names if different (e.g., may be upload_sync, download_sync, or upload/download).
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.webdav_backend import WebDAVBackend
import inspect
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
    assert inspect.iscoroutinefunction(getattr(WebDAVBackend, method)), f'{method} not async'
# SSRF guard: connecting to localhost should raise ValueError
try:
    WebDAVBackend('http://localhost/dav', 'user', 'pass')
    raise SystemExit('FAIL: should raise ValueError for localhost')
except ValueError:
    print('OK: SSRF blocked in __init__')
print('All methods async: OK')
import asyncio
backend = WebDAVBackend.__new__(WebDAVBackend)
backend._server_url = 'https://example.com/dav'  # bypass __init__ for method check
async def check():
    try:
        await backend.presigned_get_url('k')
    except NotImplementedError:
        print('presigned_get_url NotImplementedError: OK')
    else:
        raise SystemExit('FAIL: presigned_get_url should raise NotImplementedError')
    try:
        await backend.generate_presigned_put_url('k')
    except NotImplementedError:
        print('generate_presigned_put_url NotImplementedError: OK')
    else:
        raise SystemExit('FAIL: generate_presigned_put_url should raise NotImplementedError')
asyncio.run(check())
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/webdav_backend.py exists with class WebDAVBackend
- All 7 methods are async coroutines
- WebDAVBackend("http://127.0.0.1/dav", "u", "p") raises ValueError (SSRF guard in __init__)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- validate_cloud_url imported and called in __init__ and before every asyncio.to_thread call
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>WebDAVBackend created; SSRF validation in __init__ and before each request; all 7 methods async; pytest passes</done>
</task>
<task type="auto" tdd="true">
<name>Task 2: Implement NextcloudBackend</name>
<files>backend/storage/nextcloud_backend.py</files>
<read_first>
- backend/storage/webdav_backend.py — WebDAVBackend implementation (NextcloudBackend extends it)
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md — Open Question 2 (Nextcloud folder listing path convention), Pitfall 2 (path encoding)
- backend/storage/cloud_utils.py — validate_cloud_url
</read_first>
<behavior>
- NextcloudBackend subclasses WebDAVBackend — inherits all 7 methods; only overrides what differs
- NextcloudBackend stores the username for folder listing path construction (Nextcloud WebDAV path: /remote.php/dav/files/{username}/)
- SSRF validation inherited from WebDAVBackend parent class
- list_folder(folder_path: str) -> list[dict] method added for cloud folder listing via PROPFIND (used by API)
- list_folder returns list of dicts with keys: id (str path), name (str), is_dir (bool), size (int)
- get_object and put_object inherited from WebDAVBackend
- health_check overrides parent to use PROPFIND on the Nextcloud root path
</behavior>
<action>
Create backend/storage/nextcloud_backend.py with:
Module docstring explaining Nextcloud extends WebDAVBackend; Nextcloud WebDAV base path convention.
from __future__ import annotations

import asyncio

from storage.cloud_utils import validate_cloud_url
from storage.webdav_backend import WebDAVBackend


class NextcloudBackend(WebDAVBackend):
    """Nextcloud storage backend: extends WebDAVBackend with Nextcloud-specific path handling.

    The server_url should be the full WebDAV root:
    https://nc.example.com/remote.php/dav/files/{username}/
    """

    def __init__(self, server_url: str, username: str, password: str) -> None:
        super().__init__(server_url, username, password)
        self._username = username

    async def list_folder(self, folder_path: str = "") -> list[dict]:
        """List folder contents at folder_path relative to WebDAV root.

        Returns a list of dicts: [{"id": str, "name": str, "is_dir": bool, "size": int}, ...]
        Used by GET /api/cloud/folders/nextcloud/{folder_id} endpoint.
        """
        validate_cloud_url(self._server_url)
        base = folder_path.strip("/")
        # client.list() returns entry names; a trailing "/" is assumed to mark
        # directories. Verify against the installed webdavclient3 (see A1).
        names = await asyncio.to_thread(self._client.list, base)
        entries: list[dict] = []
        for name in names:
            is_dir = name.endswith("/")
            path = f"{base}/{name}".lstrip("/")
            size = 0
            if not is_dir:
                # Per-item info call for the file size (see T-05-04-04)
                validate_cloud_url(self._server_url)
                info = await asyncio.to_thread(self._client.info, path)
                size = int(info.get("size") or 0)
            entries.append({"id": path, "name": name.rstrip("/"), "is_dir": is_dir, "size": size})
        return entries

    async def health_check(self) -> bool:
        try:
            validate_cloud_url(self._server_url)
            # check("") issues a lightweight request against the WebDAV root
            result = await asyncio.to_thread(self._client.check, "")
            return bool(result)
        except Exception:
            return False

NextcloudBackend inherits put_object, get_object, delete_object, presigned_get_url,
generate_presigned_put_url, and stat_object from WebDAVBackend.
The list_folder method is extra (not in ABC) and used exclusively by the cloud folder
listing API endpoint.
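
Since the class docstring expects the full WebDAV root as `server_url`, a caller has to build that URL from the Nextcloud base URL and the username. A minimal sketch follows. The helper `nextcloud_webdav_root` is hypothetical and not part of this plan's files.

```python
from urllib.parse import quote


def nextcloud_webdav_root(base_url: str, username: str) -> str:
    """Build https://host/remote.php/dav/files/{username}/ from a base URL (illustrative helper)."""
    # Encode the username so spaces or '@' cannot alter the path structure.
    return f"{base_url.rstrip('/')}/remote.php/dav/files/{quote(username, safe='')}/"


print(nextcloud_webdav_root("https://nc.example.com/", "alice"))
# https://nc.example.com/remote.php/dav/files/alice/
```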
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.nextcloud_backend import NextcloudBackend
from storage.webdav_backend import WebDAVBackend
import inspect
# Verify subclass
assert issubclass(NextcloudBackend, WebDAVBackend), 'NextcloudBackend must subclass WebDAVBackend'
# Verify all 7 methods async
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
    assert inspect.iscoroutinefunction(getattr(NextcloudBackend, method)), f'{method} not async'
# Verify list_folder added
assert hasattr(NextcloudBackend, 'list_folder'), 'list_folder missing'
assert inspect.iscoroutinefunction(NextcloudBackend.list_folder), 'list_folder not async'
print('NextcloudBackend is WebDAVBackend subclass: OK')
print('All 7 StorageBackend methods async: OK')
print('list_folder method present and async: OK')
# SSRF guard inherited
try:
    NextcloudBackend('http://10.0.0.1/dav', 'user', 'pass')
    raise SystemExit('FAIL: SSRF should be blocked')
except ValueError:
    print('SSRF guard inherited: OK')
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/nextcloud_backend.py exists with class NextcloudBackend
- issubclass(NextcloudBackend, WebDAVBackend) is True
- All 7 StorageBackend methods are async (inherited or overridden)
- list_folder async method added beyond the ABC contract
- SSRF guard inherited from WebDAVBackend.__init__: NextcloudBackend("http://10.0.0.1/dav", ...) raises ValueError
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>NextcloudBackend created as WebDAVBackend subclass; list_folder added; SSRF guard inherited; pytest passes</done>
</task>
</tasks>
<threat_model>
## Trust Boundaries
| Boundary | Description |
|----------|-------------|
| user-supplied server_url → WebDAV client | Server URL must be validated for SSRF before Client construction and before each request |
| webdavclient3 sync calls → event loop | All sync SDK calls must be in asyncio.to_thread() to prevent event loop blocking |
| WebDAV credentials → encrypted storage | Credentials flow from encrypted DB via factory into backend constructor — never logged |
## STRIDE Threat Register
| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-04-01 | Tampering | WebDAVBackend — SSRF via server_url | mitigate | validate_cloud_url(server_url) in __init__ AND before every asyncio.to_thread call; D-17 requires both points |
| T-05-04-02 | Tampering | DNS rebinding on WebDAV requests | mitigate | validate_cloud_url called before each request (not only at connect-time); documented defense-in-depth via network egress firewall (RESEARCH.md Pitfall 5) |
| T-05-04-03 | Information Disclosure | WebDAV path includes user_id/document_id | accept | object_key = "docuvault/{user_id}/{document_id}{ext}" — no human filename; acceptable for single-user WebDAV servers |
| T-05-04-04 | Denial of Service | Nextcloud list_folder fetching info per item | accept | TTLCache (Plan 02) prevents repeated list_folder calls within 60s; per-item info call is provider overhead only |
| T-05-04-05 | Tampering | webdavclient3 path traversal via object_key | mitigate | put_object constructs object_key from user_id and document_id (both UUID values); get_object/delete_object receive object_key from DB (not from user input directly) — no raw user path injection |
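
As an optional extra layer for T-05-04-05, an object key could be checked against the expected shape before it reaches the client. The sketch below is hypothetical and not part of this plan's required behavior. It assumes lowercase UUID strings for both IDs and an extension that starts with a dot.

```python
import re

# Lowercase canonical UUID form, as produced by str(uuid.UUID(...)).
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
_OBJECT_KEY_RE = re.compile("docuvault/" + _UUID + "/" + _UUID + r"\.[A-Za-z0-9]{1,10}")


def is_safe_object_key(object_key: str) -> bool:
    """Return True only for keys shaped like docuvault/{uuid}/{uuid}.{ext}."""
    # fullmatch rejects extra segments such as '..' or a leading '/'.
    return _OBJECT_KEY_RE.fullmatch(object_key) is not None
```

A key such as `docuvault/../etc/passwd` fails the check because neither segment is a UUID.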
</threat_model>
<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>
<success_criteria>
- WebDAVBackend: all 7 methods async; validate_cloud_url in __init__ and before each request; presigned methods raise NotImplementedError
- NextcloudBackend: subclass of WebDAVBackend; list_folder method added; SSRF guard inherited
- pytest -v exits 0, 0 failures; test_cloud.py still all xfailed
</success_criteria>
<output>
Create `.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md` when done
</output>