baa5bed7e2
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
364 lines
18 KiB
Markdown
---
phase: 05-cloud-storage-backends
plan: 04
type: execute
wave: 3
depends_on:
  - "05-02"
files_modified:
  - backend/storage/nextcloud_backend.py
  - backend/storage/webdav_backend.py
autonomous: true
requirements:
  - CLOUD-01
  - CLOUD-07

must_haves:
  truths:
    - "NextcloudBackend implements all 7 StorageBackend abstract methods"
    - "WebDAVBackend implements all 7 StorageBackend abstract methods"
    - "validate_cloud_url() called inside WebDAVBackend and NextcloudBackend before every outbound WebDAV request"
    - "All sync webdavclient3 calls wrapped in asyncio.to_thread()"
    - "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both WebDAV backends"
    - "health_check uses lightweight PROPFIND or check() call to validate connectivity without storing unverified credentials"
  artifacts:
    - path: "backend/storage/nextcloud_backend.py"
      provides: "Nextcloud WebDAV StorageBackend"
      contains: "class NextcloudBackend"
    - path: "backend/storage/webdav_backend.py"
      provides: "Generic WebDAV StorageBackend"
      contains: "class WebDAVBackend"
  key_links:
    - from: "backend/storage/nextcloud_backend.py"
      to: "backend/storage/cloud_utils.py"
      via: "validate_cloud_url called before every outbound request"
      pattern: "validate_cloud_url"
    - from: "backend/storage/webdav_backend.py"
      to: "backend/storage/cloud_utils.py"
      via: "validate_cloud_url called before every outbound request"
      pattern: "validate_cloud_url"
---

<objective>
Implement NextcloudBackend and WebDAVBackend, the two credential-based (non-OAuth) cloud StorageBackend concrete classes.

Purpose: These backends handle Nextcloud and generic WebDAV servers using HTTP Basic Auth. SSRF prevention via validate_cloud_url() is mandatory before every outbound request. All sync webdavclient3 calls are wrapped in asyncio.to_thread() per the MinIOBackend pattern.
Output: nextcloud_backend.py and webdav_backend.py, each implementing all 7 StorageBackend methods.
</objective>

<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md
</context>

<interfaces>
<!-- From backend/storage/base.py -->
From backend/storage/base.py:
class StorageBackend(ABC):
    async def put_object(user_id, document_id, file_bytes, extension, content_type) -> str
    async def get_object(object_key: str) -> bytes
    async def delete_object(object_key: str) -> None
    async def presigned_get_url(object_key: str, expires_minutes: int = 60) -> str
    async def health_check() -> bool
    async def generate_presigned_put_url(object_key: str, expires_minutes: int = 15) -> str
    async def stat_object(object_key: str) -> int

<!-- From RESEARCH.md Pattern 5: webdavclient3 -->
webdavclient3 Client options: {"webdav_hostname": server_url, "webdav_login": username, "webdav_password": password}
All webdavclient3 calls are synchronous: MUST wrap in asyncio.to_thread()
Method names to verify: client.upload_to(buf, remote_path), client.download_from(buf, remote_path)
client.list(remote_dir), client.info(remote_path) returns dict with "size" key
client.check(remote_path) returns bool, used for health_check
client.clean(remote_path): delete
ASSUMPTION A1: verify upload_to/download_from method names against installed package during implementation

<!-- From RESEARCH.md Pattern 6: SSRF prevention -->
validate_cloud_url(url: str) -> None: raises ValueError if URL targets private/internal address
Must be called: (1) at connect-time, (2) before every outbound WebDAV request

<!-- From RESEARCH.md Pitfall 2: Nextcloud path encoding -->
Use urllib.parse.quote() on path segments for Nextcloud compatibility with non-ASCII filenames

<!-- Object key scheme for WebDAV -->
object_key = WebDAV path: "docuvault/{user_id}/{document_id}{extension}"
CloudConnection credentials dict: {"server_url": str, "username": str, "password": str}
</interfaces>
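
The asyncio.to_thread() wrapping named in Pattern 5 can be sketched as follows. This is a minimal illustration only: `FakeSyncClient` and `roundtrip` are hypothetical stand-ins (not webdavclient3), and the `upload_to`/`download_from` names simply mirror the still-unverified names from ASSUMPTION A1.

```python
import asyncio
import io


class FakeSyncClient:
    """Hypothetical blocking client used only for this sketch (not webdavclient3)."""

    def __init__(self) -> None:
        self._files = {}

    def upload_to(self, buf: io.BytesIO, remote_path: str) -> None:
        # Store the whole buffer under the remote path.
        self._files[remote_path] = buf.getvalue()

    def download_from(self, buf: io.BytesIO, remote_path: str) -> None:
        # Write the stored bytes into the caller's buffer.
        buf.write(self._files[remote_path])


async def roundtrip(client: FakeSyncClient, key: str, data: bytes) -> bytes:
    # Each blocking call runs in a worker thread so the event loop stays free.
    await asyncio.to_thread(client.upload_to, io.BytesIO(data), key)
    out = io.BytesIO()
    await asyncio.to_thread(client.download_from, out, key)
    return out.getvalue()


result = asyncio.run(roundtrip(FakeSyncClient(), "docuvault/u1/d1.pdf", b"hello"))
```

Passing the bound method and its arguments to asyncio.to_thread(), rather than calling the method directly, is what keeps the synchronous I/O off the event loop.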

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Implement WebDAVBackend</name>
<files>backend/storage/webdav_backend.py</files>
<read_first>
- backend/storage/base.py: all 7 method signatures
- backend/storage/minio_backend.py: asyncio.to_thread() wrapping pattern
- backend/storage/cloud_utils.py: validate_cloud_url signature
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Pattern 5 (webdavclient3), Pitfall 2 (path encoding), A1 (assumed method names)
</read_first>
<behavior>
- WebDAVBackend.__init__(self, server_url: str, username: str, password: str) creates webdavclient3 Client
- validate_cloud_url(server_url) called in __init__ before constructing the client (SSRF guard at construct time)
- put_object: constructs object_key = f"docuvault/{user_id}/{document_id}{extension}"; percent-encodes path segments; uploads via asyncio.to_thread; returns object_key
- get_object: downloads to BytesIO via asyncio.to_thread; returns bytes
- delete_object: deletes via asyncio.to_thread; catches FileNotFoundError / WebDavException for missing file (no-op)
- presigned_get_url: raises NotImplementedError
- generate_presigned_put_url: raises NotImplementedError
- stat_object: calls asyncio.to_thread for client.info(object_key); returns int(info.get("size", 0))
- health_check: calls asyncio.to_thread for client.check("/"); returns True/False
- SSRF validation called before every asyncio.to_thread call: validate_cloud_url(self._server_url)
- Uses urllib.parse.quote on non-docuvault path segments (Pitfall 2)
</behavior>
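
The percent-encoding behavior (Pitfall 2) can be illustrated with a small standalone sketch. `make_path` here is a hypothetical free-function version of the planned `_make_path` helper, shown only to make the effect of `safe=""` concrete.

```python
from urllib.parse import quote


def make_path(user_id: str, document_id: str, extension: str) -> str:
    # safe="" also encodes "/", so an ID can never introduce extra path segments.
    encoded_uid = quote(str(user_id), safe="")
    encoded_did = quote(str(document_id), safe="")
    # The fixed "docuvault" prefix and the extension are left unencoded.
    return f"docuvault/{encoded_uid}/{encoded_did}{extension}"


example = make_path("user 1", "a/b", ".pdf")  # "docuvault/user%201/a%2Fb.pdf"
```

With UUID inputs the encoding is a no-op; it only matters if IDs ever contain spaces, slashes, or non-ASCII characters.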
<action>
Create backend/storage/webdav_backend.py with:

Module docstring explaining WebDAV backend, SSRF validation requirement per D-17, and Pitfall 2 (path encoding).

from __future__ import annotations
import asyncio
import io
import urllib.parse
from webdav3.client import Client
from webdav3.exceptions import WebDavException
from storage.base import StorageBackend
from storage.cloud_utils import validate_cloud_url

class WebDAVBackend(StorageBackend):

    def __init__(self, server_url: str, username: str, password: str) -> None:
        validate_cloud_url(server_url)  # SSRF guard at construct time
        self._server_url = server_url
        options = {
            "webdav_hostname": server_url,
            "webdav_login": username,
            "webdav_password": password,
        }
        self._client = Client(options)

    def _make_path(self, user_id: str, document_id: str, extension: str) -> str:
        # Construct path with percent-encoding for Nextcloud/WebDAV compatibility (Pitfall 2)
        encoded_uid = urllib.parse.quote(str(user_id), safe="")
        encoded_did = urllib.parse.quote(str(document_id), safe="")
        return f"docuvault/{encoded_uid}/{encoded_did}{extension}"

    async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
        validate_cloud_url(self._server_url)  # re-validate before every request (D-17)
        object_key = self._make_path(user_id, document_id, extension)
        buf = io.BytesIO(file_bytes)
        # Ensure parent directory exists: client.mkdir("docuvault/{user_id}/") wrapped in asyncio.to_thread
        # Then: await asyncio.to_thread(self._client.upload_to, buf, object_key)
        # If the upload_to method name is incorrect, verify against the webdavclient3 docs and use the correct name
        return object_key

    async def get_object(self, object_key: str) -> bytes:
        validate_cloud_url(self._server_url)
        buf = io.BytesIO()
        await asyncio.to_thread(self._client.download_from, buf, object_key)
        return buf.getvalue()

    async def delete_object(self, object_key: str) -> None:
        validate_cloud_url(self._server_url)
        try:
            await asyncio.to_thread(self._client.clean, object_key)
        except (FileNotFoundError, WebDavException):
            pass  # No-op if file not found (matches the delete_object behavior above)

    async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned URLs")

    async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned put URLs")

    async def stat_object(self, object_key: str) -> int:
        validate_cloud_url(self._server_url)
        info = await asyncio.to_thread(self._client.info, object_key)
        return int(info.get("size") or 0)  # treat a missing or empty size as 0

    async def health_check(self) -> bool:
        try:
            validate_cloud_url(self._server_url)
            result = await asyncio.to_thread(self._client.check, "/")
            return bool(result)
        except Exception:
            return False

IMPORTANT: During implementation, verify the webdavclient3 method names by running:
python -c "from webdav3.client import Client; print([m for m in dir(Client) if not m.startswith('_')])"
and use the correct method names. The RESEARCH.md marks upload_to/download_from as [ASSUMED].
If the names differ, use the correct ones (e.g., they may be upload_sync, download_sync, or upload/download).
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.webdav_backend import WebDAVBackend
import inspect
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
    assert inspect.iscoroutinefunction(getattr(WebDAVBackend, method)), f'{method} not async'
# SSRF guard: connecting to localhost should raise ValueError
try:
    WebDAVBackend('http://localhost/dav', 'user', 'pass')
    raise SystemExit('FAIL: should raise ValueError for localhost')
except ValueError:
    print('OK: SSRF blocked in __init__')
print('All methods async: OK')
import asyncio
backend = WebDAVBackend.__new__(WebDAVBackend)
backend._server_url = 'https://example.com/dav'  # bypass __init__ for method check
async def check():
    try: await backend.presigned_get_url('k')
    except NotImplementedError: print('presigned_get_url NotImplementedError: OK')
    else: raise SystemExit('FAIL: presigned_get_url should raise NotImplementedError')
    try: await backend.generate_presigned_put_url('k')
    except NotImplementedError: print('generate_presigned_put_url NotImplementedError: OK')
    else: raise SystemExit('FAIL: generate_presigned_put_url should raise NotImplementedError')
asyncio.run(check())
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/webdav_backend.py exists with class WebDAVBackend
- All 7 methods are async coroutines
- WebDAVBackend("http://127.0.0.1/dav", "u", "p") raises ValueError (SSRF guard in __init__)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- validate_cloud_url imported and called in __init__ and before every asyncio.to_thread call
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>WebDAVBackend created; SSRF validation in __init__ and before each request; all 7 methods async; pytest passes</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Implement NextcloudBackend</name>
<files>backend/storage/nextcloud_backend.py</files>
<read_first>
- backend/storage/webdav_backend.py: WebDAVBackend implementation (NextcloudBackend extends it)
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Open Question 2 (Nextcloud folder listing path convention), Pitfall 2 (path encoding)
- backend/storage/cloud_utils.py: validate_cloud_url
</read_first>
<behavior>
- NextcloudBackend subclasses WebDAVBackend: inherits all 7 methods; only overrides what differs
- NextcloudBackend stores the username for folder listing path construction (Nextcloud WebDAV path: /remote.php/dav/files/{username}/)
- SSRF validation inherited from WebDAVBackend parent class
- list_folder(folder_path: str) -> list[dict] method added for cloud folder listing via PROPFIND (used by API)
- list_folder returns list of dicts with keys: id (str path), name (str), is_dir (bool), size (int)
- get_object and put_object inherited from WebDAVBackend
- health_check overrides parent to use PROPFIND on the Nextcloud root path
</behavior>
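
The list_folder return shape described above could be produced along these lines. This is a rough sketch under stated assumptions: `FakeListingClient` is a hypothetical stand-in for the blocking client, and treating names with a trailing "/" as directories is an assumption to verify against the installed webdavclient3 during implementation.

```python
import asyncio


class FakeListingClient:
    """Hypothetical blocking client; directory names end with "/" by assumption."""

    def list(self, remote_dir):
        return ["reports/", "scan.pdf"]

    def info(self, remote_path):
        # Sizes arrive as strings here to show why int() conversion is needed.
        return {"size": "2048"} if remote_path.endswith("scan.pdf") else {}


async def list_folder(client: FakeListingClient, folder_path: str = "") -> list[dict]:
    names = await asyncio.to_thread(client.list, folder_path)
    entries = []
    for name in names:
        # Build a path relative to the WebDAV root, without a leading slash.
        path = f"{folder_path.rstrip('/')}/{name}".lstrip("/")
        is_dir = name.endswith("/")
        size = 0
        if not is_dir:
            # One extra blocking call per file, also moved off the event loop.
            info = await asyncio.to_thread(client.info, path)
            size = int(info.get("size") or 0)
        entries.append({"id": path, "name": name.rstrip("/"), "is_dir": is_dir, "size": size})
    return entries


entries = asyncio.run(list_folder(FakeListingClient()))
```

The per-item info call is the overhead accepted under T-05-04-04; skipping it for directories keeps the number of round trips down.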
<action>
Create backend/storage/nextcloud_backend.py with:

Module docstring explaining Nextcloud extends WebDAVBackend; Nextcloud WebDAV base path convention.

from __future__ import annotations
import asyncio
import urllib.parse
from storage.webdav_backend import WebDAVBackend
from storage.cloud_utils import validate_cloud_url

class NextcloudBackend(WebDAVBackend):
    """Nextcloud storage backend: extends WebDAVBackend with Nextcloud-specific path handling.

    The server_url should be the full WebDAV root:
    https://nc.example.com/remote.php/dav/files/{username}/
    """

    def __init__(self, server_url: str, username: str, password: str) -> None:
        super().__init__(server_url, username, password)
        self._username = username

    async def list_folder(self, folder_path: str = "") -> list[dict]:
        """List folder contents at folder_path relative to WebDAV root.

        Returns a list of dicts: [{"id": str, "name": str, "is_dir": bool, "size": int}, ...]
        Used by GET /api/cloud/folders/nextcloud/{folder_id} endpoint.
        """
        validate_cloud_url(self._server_url)
        # List the folder using client.list() which returns a list of file names
        # For each item, call client.info() to get size and type
        # Wrap each client call in asyncio.to_thread
        # Return structured list

    async def health_check(self) -> bool:
        try:
            validate_cloud_url(self._server_url)
            # Use client.check("") or client.list("") to verify connectivity to root
            result = await asyncio.to_thread(self._client.check, "")
            return bool(result)
        except Exception:
            return False

NextcloudBackend inherits put_object, get_object, delete_object, presigned_get_url,
generate_presigned_put_url, and stat_object from WebDAVBackend.

The list_folder method is extra (not in ABC) and used exclusively by the cloud folder
listing API endpoint.
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
from storage.nextcloud_backend import NextcloudBackend
from storage.webdav_backend import WebDAVBackend
import inspect
# Verify subclass
assert issubclass(NextcloudBackend, WebDAVBackend), 'NextcloudBackend must subclass WebDAVBackend'
# Verify all 7 methods async
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
    assert inspect.iscoroutinefunction(getattr(NextcloudBackend, method)), f'{method} not async'
# Verify list_folder added
assert hasattr(NextcloudBackend, 'list_folder'), 'list_folder missing'
assert inspect.iscoroutinefunction(NextcloudBackend.list_folder), 'list_folder not async'
print('NextcloudBackend is WebDAVBackend subclass: OK')
print('All 7 StorageBackend methods async: OK')
print('list_folder method present and async: OK')
# SSRF guard inherited
try:
    NextcloudBackend('http://10.0.0.1/dav', 'user', 'pass')
    raise SystemExit('FAIL: SSRF should be blocked')
except ValueError:
    print('SSRF guard inherited: OK')
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/nextcloud_backend.py exists with class NextcloudBackend
- issubclass(NextcloudBackend, WebDAVBackend) is True
- All 7 StorageBackend methods are async (inherited or overridden)
- list_folder async method added beyond the ABC contract
- SSRF guard inherited from WebDAVBackend.__init__: NextcloudBackend("http://10.0.0.1/dav", ...) raises ValueError
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>NextcloudBackend created as WebDAVBackend subclass; list_folder added; SSRF guard inherited; pytest passes</done>
</task>

</tasks>

<threat_model>
## Trust Boundaries

| Boundary | Description |
|----------|-------------|
| user-supplied server_url → WebDAV client | Server URL must be validated for SSRF before Client construction and before each request |
| webdavclient3 sync calls → event loop | All sync SDK calls must be in asyncio.to_thread() to prevent event loop blocking |
| WebDAV credentials → encrypted storage | Credentials flow from encrypted DB via factory into backend constructor; never logged |

## STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-04-01 | Tampering | WebDAVBackend: SSRF via server_url | mitigate | validate_cloud_url(server_url) in __init__ AND before every asyncio.to_thread call; D-17 requires both points |
| T-05-04-02 | Tampering | DNS rebinding on WebDAV requests | mitigate | validate_cloud_url called before each request (not only at connect-time); documented defense-in-depth via network egress firewall (RESEARCH.md Pitfall 5) |
| T-05-04-03 | Information Disclosure | WebDAV path includes user_id/document_id | accept | object_key = "docuvault/{user_id}/{document_id}{ext}" contains no human filename; acceptable for single-user WebDAV servers |
| T-05-04-04 | Denial of Service | Nextcloud list_folder fetching info per item | accept | TTLCache (Plan 02) prevents repeated list_folder calls within 60s; per-item info call is provider overhead only |
| T-05-04-05 | Tampering | webdavclient3 path traversal via object_key | mitigate | put_object constructs object_key from user_id and document_id (both UUID values); get_object/delete_object receive object_key from DB (not from user input directly), so there is no raw user path injection |
</threat_model>
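
For orientation on T-05-04-01 and T-05-04-02, a resolve-then-check SSRF guard might look like the sketch below. It is not the actual validate_cloud_url from cloud_utils.py (that is delivered by Plan 02 and may differ); `validate_cloud_url_sketch` is a hypothetical name, and the set of rejected address classes is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def validate_cloud_url_sketch(url: str) -> None:
    """Raise ValueError if url resolves to a non-public address (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host: {parts.hostname!r}") from exc
    for info in infos:
        # info[4][0] is the resolved address; drop any IPv6 zone suffix.
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            raise ValueError(f"URL targets a non-public address: {ip}")
```

Because the hostname is resolved again on every call, re-running a check like this before each request narrows the DNS rebinding window but does not close it, which is why the egress firewall remains listed as defense-in-depth.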

<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>

<success_criteria>
- WebDAVBackend: all 7 methods async; validate_cloud_url in __init__ and before each request; presigned methods raise NotImplementedError
- NextcloudBackend: subclass of WebDAVBackend; list_folder method added; SSRF guard inherited
- pytest -v exits 0, 0 failures; test_cloud.py still all xfailed
</success_criteria>

<output>
Create `.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md` when done
</output>