docs(05): create phase 5 plan — cloud storage backends (8 plans, 7 waves)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -0,0 +1,363 @@
---
phase: 05-cloud-storage-backends
plan: 04
type: execute
wave: 3
depends_on:
  - "05-02"
files_modified:
  - backend/storage/nextcloud_backend.py
  - backend/storage/webdav_backend.py
autonomous: true
requirements:
  - CLOUD-01
  - CLOUD-07

must_haves:
  truths:
    - "NextcloudBackend implements all 7 StorageBackend abstract methods"
    - "WebDAVBackend implements all 7 StorageBackend abstract methods"
    - "validate_cloud_url() called inside WebDAVBackend and NextcloudBackend before every outbound WebDAV request"
    - "All sync webdavclient3 calls wrapped in asyncio.to_thread()"
    - "generate_presigned_put_url and presigned_get_url raise NotImplementedError on both WebDAV backends"
    - "health_check uses lightweight PROPFIND or check() call to validate connectivity without storing unverified credentials"
  artifacts:
    - path: "backend/storage/nextcloud_backend.py"
      provides: "Nextcloud WebDAV StorageBackend"
      contains: "class NextcloudBackend"
    - path: "backend/storage/webdav_backend.py"
      provides: "Generic WebDAV StorageBackend"
      contains: "class WebDAVBackend"
  key_links:
    - from: "backend/storage/nextcloud_backend.py"
      to: "backend/storage/cloud_utils.py"
      via: "validate_cloud_url called before every outbound request"
      pattern: "validate_cloud_url"
    - from: "backend/storage/webdav_backend.py"
      to: "backend/storage/cloud_utils.py"
      via: "validate_cloud_url called before every outbound request"
      pattern: "validate_cloud_url"
---

<objective>
Implement NextcloudBackend and WebDAVBackend, the two credential-based (non-OAuth) cloud StorageBackend concrete classes.

Purpose: These backends handle Nextcloud and generic WebDAV servers using HTTP Basic Auth. SSRF prevention via validate_cloud_url() is mandatory before every outbound request. All sync webdavclient3 calls are wrapped in asyncio.to_thread(), following the MinIOBackend pattern.
Output: nextcloud_backend.py and webdav_backend.py, each implementing all 7 StorageBackend methods.
</objective>

<execution_context>
@/Users/nik/.claude/get-shit-done/workflows/execute-plan.md
@/Users/nik/.claude/get-shit-done/templates/summary.md
</execution_context>

<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/phases/05-cloud-storage-backends/05-CONTEXT.md
@.planning/phases/05-cloud-storage-backends/05-RESEARCH.md
@.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md
</context>

<interfaces>
<!-- From backend/storage/base.py -->
From backend/storage/base.py:
class StorageBackend(ABC):
    async def put_object(user_id, document_id, file_bytes, extension, content_type) -> str
    async def get_object(object_key: str) -> bytes
    async def delete_object(object_key: str) -> None
    async def presigned_get_url(object_key: str, expires_minutes: int = 60) -> str
    async def health_check() -> bool
    async def generate_presigned_put_url(object_key: str, expires_minutes: int = 15) -> str
    async def stat_object(object_key: str) -> int

<!-- From RESEARCH.md Pattern 5: webdavclient3 -->
webdavclient3 Client options: {"webdav_hostname": server_url, "webdav_login": username, "webdav_password": password}
All webdavclient3 calls are synchronous and MUST be wrapped in asyncio.to_thread()
Method names to verify: client.upload_to(buf, remote_path), client.download_from(buf, remote_path)
client.list(remote_dir); client.info(remote_path) returns a dict with a "size" key (taken from getcontentlength, so expect a string, or None when the server omits it)
client.check(remote_path) returns bool; used for health_check
client.clean(remote_path): delete
ASSUMPTION A1: verify upload_to/download_from method names against the installed package during implementation

<!-- From RESEARCH.md Pattern 6: SSRF prevention -->
validate_cloud_url(url: str) -> None: raises ValueError if the URL targets a private/internal address
Must be called: (1) at connect time, (2) before every outbound WebDAV request

<!-- From RESEARCH.md Pitfall 2: Nextcloud path encoding -->
Use urllib.parse.quote() on path segments for Nextcloud compatibility with non-ASCII filenames

<!-- Object key scheme for WebDAV -->
object_key = WebDAV path: "docuvault/{user_id}/{document_id}{extension}"
CloudConnection credentials dict: {"server_url": str, "username": str, "password": str}
</interfaces>
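
The object-key scheme and the Pitfall 2 note can be illustrated with a standalone sketch. `make_object_key` and `is_valid_object_key` are hypothetical helpers, not existing functions in backend/storage, and the pattern assumes user_id and document_id are lowercase UUID strings (as T-05-04-05 states). A shape check like this could also give get_object/delete_object a cheap guard against traversal-style keys. The last two lines show why each segment should be encoded exactly once: quoting an already-encoded value turns "%" into "%25".

```python
import re
import urllib.parse
import uuid

# Hypothetical pattern: "docuvault/<uuid>/<uuid>" plus an optional short extension
_UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
_KEY_RE = re.compile(rf"docuvault/{_UUID}/{_UUID}(\.[A-Za-z0-9]{{1,10}})?")


def make_object_key(user_id: uuid.UUID, document_id: uuid.UUID, extension: str) -> str:
    """Build the WebDAV path, percent-encoding each dynamic segment exactly once."""
    uid = urllib.parse.quote(str(user_id), safe="")
    did = urllib.parse.quote(str(document_id), safe="")
    return f"docuvault/{uid}/{did}{extension}"


def is_valid_object_key(key: str) -> bool:
    """Accept only keys of the expected shape (rejects "..", extra segments, etc.)."""
    return _KEY_RE.fullmatch(key) is not None


user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
document_id = uuid.UUID("87654321-4321-8765-4321-876543218765")
key = make_object_key(user_id, document_id, ".pdf")
print(key)  # docuvault/12345678-1234-5678-1234-567812345678/87654321-4321-8765-4321-876543218765.pdf
print(is_valid_object_key(key), is_valid_object_key("docuvault/../etc/passwd"))  # True False

# UUIDs pass through quote() unchanged; non-ASCII names do not, and quoting twice double-encodes
print(urllib.parse.quote("März.pdf", safe=""))                               # M%C3%A4rz.pdf
print(urllib.parse.quote(urllib.parse.quote("März.pdf", safe=""), safe=""))  # M%25C3%25A4rz.pdf
```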

<tasks>

<task type="auto" tdd="true">
<name>Task 1: Implement WebDAVBackend</name>
<files>backend/storage/webdav_backend.py</files>
<read_first>
- backend/storage/base.py: all 7 method signatures
- backend/storage/minio_backend.py: asyncio.to_thread() wrapping pattern
- backend/storage/cloud_utils.py: validate_cloud_url signature
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Pattern 5 (webdavclient3), Pitfall 2 (path encoding), A1 (assumed method names)
</read_first>
<behavior>
- WebDAVBackend.__init__(self, server_url: str, username: str, password: str) creates a webdavclient3 Client
- validate_cloud_url(server_url) is called in __init__ before the client is constructed (SSRF guard at construction time)
- put_object: builds object_key = f"docuvault/{user_id}/{document_id}{extension}" with percent-encoded segments; creates missing parent collections one level at a time (WebDAV MKCOL is not recursive); uploads via asyncio.to_thread; returns object_key
- get_object: downloads into a BytesIO via asyncio.to_thread; returns bytes
- delete_object: deletes via asyncio.to_thread; treats a missing file as a no-op by catching only the library's not-found exception (assumed to be webdav3.exceptions.RemoteResourceNotFound; verify alongside A1), so auth and network errors still propagate
- presigned_get_url: raises NotImplementedError
- generate_presigned_put_url: raises NotImplementedError
- stat_object: calls client.info(object_key) via asyncio.to_thread; returns int(info.get("size") or 0), since "size" may be present but None
- health_check: calls client.check("/") via asyncio.to_thread; returns True on success and False on any error
- SSRF validation: validate_cloud_url(self._server_url) runs at the start of every request-issuing method, before its first asyncio.to_thread call
- urllib.parse.quote is applied to the user_id and document_id segments, not to the fixed "docuvault" prefix (Pitfall 2); each segment is quoted exactly once, and it is worth checking whether webdavclient3 already percent-encodes paths before adding a second layer
</behavior>
<action>
Create backend/storage/webdav_backend.py with:

Module docstring explaining the WebDAV backend, the SSRF validation requirement per D-17, and Pitfall 2 (path encoding).

from __future__ import annotations

import asyncio
import io
import urllib.parse

from webdav3.client import Client
from webdav3.exceptions import RemoteResourceNotFound  # verify name alongside A1

from storage.base import StorageBackend
from storage.cloud_utils import validate_cloud_url


class WebDAVBackend(StorageBackend):

    def __init__(self, server_url: str, username: str, password: str) -> None:
        validate_cloud_url(server_url)  # SSRF guard at construct time
        self._server_url = server_url
        options = {
            "webdav_hostname": server_url,
            "webdav_login": username,
            "webdav_password": password,
        }
        self._client = Client(options)

    def _make_path(self, user_id: str, document_id: str, extension: str) -> str:
        # Percent-encode each dynamic segment once for Nextcloud/WebDAV compatibility (Pitfall 2)
        encoded_uid = urllib.parse.quote(str(user_id), safe="")
        encoded_did = urllib.parse.quote(str(document_id), safe="")
        return f"docuvault/{encoded_uid}/{encoded_did}{extension}"

    async def put_object(self, user_id, document_id, file_bytes, extension, content_type) -> str:
        validate_cloud_url(self._server_url)  # re-validate before every request (D-17)
        object_key = self._make_path(user_id, document_id, extension)
        # MKCOL is not recursive: create "docuvault/" first, then "docuvault/{user_id}/".
        # webdavclient3 is assumed to treat "already exists" (405) as success; verify (A1).
        parent_dir = object_key.rsplit("/", 1)[0] + "/"
        for directory in ("docuvault/", parent_dir):
            await asyncio.to_thread(self._client.mkdir, directory)
        # content_type is unused: upload_to(buff, remote_path) as listed in A1 takes no content type.
        # If upload_to is named differently in the installed package, use the correct name (A1).
        await asyncio.to_thread(self._client.upload_to, io.BytesIO(file_bytes), object_key)
        return object_key

    async def get_object(self, object_key: str) -> bytes:
        validate_cloud_url(self._server_url)
        buf = io.BytesIO()
        await asyncio.to_thread(self._client.download_from, buf, object_key)
        return buf.getvalue()

    async def delete_object(self, object_key: str) -> None:
        validate_cloud_url(self._server_url)
        try:
            await asyncio.to_thread(self._client.clean, object_key)
        except RemoteResourceNotFound:
            pass  # No-op if the file is already gone; other errors propagate

    async def presigned_get_url(self, object_key: str, expires_minutes: int = 60) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned URLs")

    async def generate_presigned_put_url(self, object_key: str, expires_minutes: int = 15) -> str:
        raise NotImplementedError("WebDAV backend does not support presigned PUT URLs")

    async def stat_object(self, object_key: str) -> int:
        validate_cloud_url(self._server_url)
        info = await asyncio.to_thread(self._client.info, object_key)
        # "size" comes from getcontentlength: a string, or None if the server omits it
        return int(info.get("size") or 0)

    async def health_check(self) -> bool:
        try:
            validate_cloud_url(self._server_url)
            result = await asyncio.to_thread(self._client.check, "/")
            return bool(result)
        except Exception:
            return False

IMPORTANT: During implementation, verify the webdavclient3 method names by running:
python -c "from webdav3.client import Client; print([m for m in dir(Client) if not m.startswith('_')])"
and use the correct method names. RESEARCH.md marks upload_to/download_from as [ASSUMED].
If the names differ (e.g., upload_sync, download_sync, or upload/download), use the installed ones; note that upload_sync/download_sync take a local file path rather than a buffer.
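
The wrapping pattern and the put_object directory handling can be exercised without a server. This is a sketch under assumptions, not the backend itself: `FakeSyncClient` and `RemoteNotFound` are hypothetical stand-ins that reuse the A1 method names (mkdir, upload_to, download_from, clean) and mimic WebDAV MKCOL, which creates a single collection and fails when an intermediate collection is missing. Every blocking call goes through asyncio.to_thread, and delete swallows only the not-found case.

```python
import asyncio
import io


class RemoteNotFound(Exception):
    """Hypothetical stand-in for the library's "remote resource not found" error."""


class FakeSyncClient:
    """Hypothetical in-memory stand-in for a blocking WebDAV client."""

    def __init__(self) -> None:
        self.dirs: set[str] = {""}  # "" is the WebDAV root
        self.files: dict[str, bytes] = {}

    def mkdir(self, remote_path: str) -> bool:
        path = remote_path.rstrip("/")
        parent = path.rpartition("/")[0]
        if parent not in self.dirs:
            # Mirrors MKCOL: a missing intermediate collection is an error (HTTP 409)
            raise RemoteNotFound(parent)
        self.dirs.add(path)  # re-creating an existing directory counts as success
        return True

    def upload_to(self, buff: io.BytesIO, remote_path: str) -> None:
        parent = remote_path.rpartition("/")[0]
        if parent not in self.dirs:
            raise RemoteNotFound(parent)
        self.files[remote_path] = buff.read()

    def download_from(self, buff: io.BytesIO, remote_path: str) -> None:
        if remote_path not in self.files:
            raise RemoteNotFound(remote_path)
        buff.write(self.files[remote_path])

    def clean(self, remote_path: str) -> None:
        if self.files.pop(remote_path, None) is None:
            raise RemoteNotFound(remote_path)


async def put(client: FakeSyncClient, object_key: str, data: bytes) -> str:
    # Create each ancestor collection in order: "docuvault/", then "docuvault/u1/"
    segments = object_key.split("/")[:-1]
    for depth in range(1, len(segments) + 1):
        await asyncio.to_thread(client.mkdir, "/".join(segments[:depth]) + "/")
    # The blocking upload runs in a worker thread, keeping the event loop free
    await asyncio.to_thread(client.upload_to, io.BytesIO(data), object_key)
    return object_key


async def get(client: FakeSyncClient, object_key: str) -> bytes:
    buf = io.BytesIO()
    await asyncio.to_thread(client.download_from, buf, object_key)
    return buf.getvalue()


async def delete(client: FakeSyncClient, object_key: str) -> None:
    try:
        await asyncio.to_thread(client.clean, object_key)
    except RemoteNotFound:
        pass  # already gone: no-op; any other exception still propagates


async def main() -> tuple[str, bytes]:
    client = FakeSyncClient()
    key = await put(client, "docuvault/u1/d1.pdf", b"%PDF-1.7 demo")
    data = await get(client, key)
    await delete(client, key)
    await delete(client, key)  # deleting a missing file is a no-op
    return key, data


key, data = asyncio.run(main())
print(key, data)  # docuvault/u1/d1.pdf b'%PDF-1.7 demo'
```

Because the async functions only depend on the method names, swapping in a real webdav3 Client (and its not-found exception) should leave them unchanged, which keeps the tdd="true" tests independent of a live server.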
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
import asyncio
import inspect
from storage.webdav_backend import WebDAVBackend
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
    assert inspect.iscoroutinefunction(getattr(WebDAVBackend, method)), f'{method} not async'
print('All methods async: OK')
# SSRF guard: connecting to localhost should raise ValueError
try:
    WebDAVBackend('http://localhost/dav', 'user', 'pass')
except ValueError:
    print('OK: SSRF blocked in __init__')
else:
    raise AssertionError('FAIL: should raise ValueError for localhost')
# Bypass __init__ (and its SSRF check) to exercise the presigned methods without a client
backend = WebDAVBackend.__new__(WebDAVBackend)
backend._server_url = 'https://example.com/dav'
async def check():
    for name in ('presigned_get_url', 'generate_presigned_put_url'):
        try:
            await getattr(backend, name)('k')
        except NotImplementedError:
            print(f'{name} NotImplementedError: OK')
        else:
            raise AssertionError(f'FAIL: {name} should raise NotImplementedError')
asyncio.run(check())
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/webdav_backend.py exists with class WebDAVBackend
- All 7 methods are async coroutines
- WebDAVBackend("http://127.0.0.1/dav", "u", "p") raises ValueError (SSRF guard in __init__)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError
- validate_cloud_url is imported, called in __init__, and called at the start of every request-issuing method before its first asyncio.to_thread call
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>WebDAVBackend created; SSRF validation in __init__ and before each request; all 7 methods async; pytest passes</done>
</task>

<task type="auto" tdd="true">
<name>Task 2: Implement NextcloudBackend</name>
<files>backend/storage/nextcloud_backend.py</files>
<read_first>
- backend/storage/webdav_backend.py: WebDAVBackend implementation (NextcloudBackend extends it)
- .planning/phases/05-cloud-storage-backends/05-RESEARCH.md: Open Question 2 (Nextcloud folder listing path convention), Pitfall 2 (path encoding)
- backend/storage/cloud_utils.py: validate_cloud_url
</read_first>
<behavior>
- NextcloudBackend subclasses WebDAVBackend: it inherits all 7 methods and overrides only what differs
- NextcloudBackend stores the username for folder-listing path construction (Nextcloud WebDAV path: /remote.php/dav/files/{username}/)
- SSRF validation is inherited from the WebDAVBackend parent class
- list_folder(folder_path: str) -> list[dict] is added for cloud folder listing via PROPFIND (used by the API)
- list_folder returns a list of dicts with keys: id (str path), name (str), is_dir (bool), size (int)
- put_object, get_object, delete_object, stat_object and both presigned methods are inherited from WebDAVBackend
- health_check overrides the parent to check connectivity against the Nextcloud WebDAV root (the user's files root given as server_url)
</behavior>
<action>
Create backend/storage/nextcloud_backend.py with:

Module docstring explaining that Nextcloud extends WebDAVBackend, and the Nextcloud WebDAV base path convention.

from __future__ import annotations

import asyncio

from storage.webdav_backend import WebDAVBackend
from storage.cloud_utils import validate_cloud_url


class NextcloudBackend(WebDAVBackend):
    """Nextcloud storage backend: extends WebDAVBackend with Nextcloud-specific path handling.

    The server_url should be the full WebDAV root:
        https://nc.example.com/remote.php/dav/files/{username}/
    """

    def __init__(self, server_url: str, username: str, password: str) -> None:
        super().__init__(server_url, username, password)
        self._username = username

    async def list_folder(self, folder_path: str = "") -> list[dict]:
        """List folder contents at folder_path, relative to the WebDAV root.

        Returns a list of dicts: [{"id": str, "name": str, "is_dir": bool, "size": int}, ...]
        Used by the GET /api/cloud/folders/nextcloud/{folder_id} endpoint.
        """
        validate_cloud_url(self._server_url)
        # client.list() is assumed to return child names only, with a trailing "/"
        # on directories; verify against the installed package (A1).
        names = await asyncio.to_thread(self._client.list, folder_path or "/")
        entries: list[dict] = []
        for name in names:
            is_dir = name.endswith("/")
            clean_name = name.rstrip("/")
            path = f"{folder_path.strip('/')}/{clean_name}".lstrip("/")
            size = 0
            if not is_dir:
                # One extra request per file for its size (see T-05-04-04)
                info = await asyncio.to_thread(self._client.info, path)
                size = int(info.get("size") or 0)
            entries.append({"id": path, "name": clean_name, "is_dir": is_dir, "size": size})
        return entries

    async def health_check(self) -> bool:
        try:
            validate_cloud_url(self._server_url)
            # server_url already points at the user's files root, so check the root itself
            result = await asyncio.to_thread(self._client.check, "")
            return bool(result)
        except Exception:
            return False

NextcloudBackend inherits put_object, get_object, delete_object, presigned_get_url,
generate_presigned_put_url, and stat_object from WebDAVBackend.

The list_folder method is extra (not in the ABC) and is used exclusively by the cloud folder
listing API endpoint.
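
Since the result shaping in list_folder does not depend on the network, it can be sketched against a stub. `FakeListingClient` is hypothetical and encodes two assumptions to confirm under A1: list() returns child names only (not the folder itself) with a trailing "/" on directories, and info() reports "size" as a string or None. This version skips info() for directories, one way to trim the per-item requests noted in T-05-04-04; if the installed webdavclient3 accepts list(..., get_info=True), a single listing request may be able to replace the per-file info() calls, which is worth checking.

```python
import asyncio


class FakeListingClient:
    """Hypothetical stand-in: list() returns child names, directories end with "/"."""

    def __init__(self, tree: dict[str, list[str]], sizes: dict[str, str]) -> None:
        self.tree = tree
        self.sizes = sizes

    def list(self, remote_path: str) -> list:
        return self.tree[remote_path.strip("/")]

    def info(self, remote_path: str) -> dict:
        # getcontentlength arrives as text and may be missing (None)
        return {"size": self.sizes.get(remote_path)}


async def list_folder(client: FakeListingClient, folder_path: str = "") -> list:
    names = await asyncio.to_thread(client.list, folder_path or "/")
    entries = []
    for name in names:
        is_dir = name.endswith("/")
        clean_name = name.rstrip("/")
        path = f"{folder_path.strip('/')}/{clean_name}".lstrip("/")
        size = 0
        if not is_dir:
            info = await asyncio.to_thread(client.info, path)
            size = int(info.get("size") or 0)  # None or "" -> 0
        entries.append({"id": path, "name": clean_name, "is_dir": is_dir, "size": size})
    return entries


client = FakeListingClient(
    tree={"Scans": ["2024/", "invoice.pdf", "notes.txt"]},
    sizes={"Scans/invoice.pdf": "2048"},
)
entries = asyncio.run(list_folder(client, "Scans"))
for entry in entries:
    print(entry)
# {'id': 'Scans/2024', 'name': '2024', 'is_dir': True, 'size': 0}
# {'id': 'Scans/invoice.pdf', 'name': 'invoice.pdf', 'is_dir': False, 'size': 2048}
# {'id': 'Scans/notes.txt', 'name': 'notes.txt', 'is_dir': False, 'size': 0}
```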
</action>
<verify>
<automated>cd /Users/nik/Documents/Progamming/document_scanner/backend && python -c "
import inspect
from storage.nextcloud_backend import NextcloudBackend
from storage.webdav_backend import WebDAVBackend
# Verify subclass
assert issubclass(NextcloudBackend, WebDAVBackend), 'NextcloudBackend must subclass WebDAVBackend'
# Verify all 7 methods async
for method in ['put_object','get_object','delete_object','presigned_get_url','health_check','generate_presigned_put_url','stat_object']:
    assert inspect.iscoroutinefunction(getattr(NextcloudBackend, method)), f'{method} not async'
# Verify list_folder added
assert hasattr(NextcloudBackend, 'list_folder'), 'list_folder missing'
assert inspect.iscoroutinefunction(NextcloudBackend.list_folder), 'list_folder not async'
print('NextcloudBackend is WebDAVBackend subclass: OK')
print('All 7 StorageBackend methods async: OK')
print('list_folder method present and async: OK')
# SSRF guard inherited
try:
    NextcloudBackend('http://10.0.0.1/dav', 'user', 'pass')
except ValueError:
    print('SSRF guard inherited: OK')
else:
    raise AssertionError('FAIL: SSRF should be blocked')
"</automated>
</verify>
<acceptance_criteria>
- backend/storage/nextcloud_backend.py exists with class NextcloudBackend
- issubclass(NextcloudBackend, WebDAVBackend) is True
- All 7 StorageBackend methods are async (inherited or overridden)
- list_folder async method added beyond the ABC contract
- SSRF guard inherited from WebDAVBackend.__init__: NextcloudBackend("http://10.0.0.1/dav", ...) raises ValueError
- `pytest -v --tb=short` exits 0
</acceptance_criteria>
<done>NextcloudBackend created as WebDAVBackend subclass; list_folder added; SSRF guard inherited; pytest passes</done>
</task>

</tasks>

<threat_model>
## Trust Boundaries

| Boundary | Description |
|----------|-------------|
| user-supplied server_url → WebDAV client | Server URL must be validated for SSRF before Client construction and before each request |
| webdavclient3 sync calls → event loop | All sync SDK calls must run in asyncio.to_thread() so they do not block the event loop |
| WebDAV credentials → encrypted storage | Credentials flow from the encrypted DB via the factory into the backend constructor and are never logged |

## STRIDE Threat Register

| Threat ID | Category | Component | Disposition | Mitigation Plan |
|-----------|----------|-----------|-------------|-----------------|
| T-05-04-01 | Tampering | WebDAVBackend: SSRF via server_url | mitigate | validate_cloud_url(server_url) in __init__ AND before every asyncio.to_thread call; D-17 requires both points |
| T-05-04-02 | Tampering | DNS rebinding on WebDAV requests | mitigate | validate_cloud_url called before each request (not only at connect time); network egress firewall documented as defense-in-depth (RESEARCH.md Pitfall 5) |
| T-05-04-03 | Information Disclosure | WebDAV path includes user_id/document_id | accept | object_key = "docuvault/{user_id}/{document_id}{ext}", with no human filename; acceptable for single-user WebDAV servers |
| T-05-04-04 | Denial of Service | Nextcloud list_folder fetching info per item | accept | TTLCache (Plan 02) prevents repeated list_folder calls within 60s; the per-item info call is provider-side overhead only |
| T-05-04-05 | Tampering | webdavclient3 path traversal via object_key | mitigate | put_object builds object_key from user_id and document_id (both UUIDs); get_object/delete_object receive object_key from the DB, not directly from user input, so no raw user path is injected |
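
For reference, the resolve-then-check technique behind T-05-04-01/02 can be sketched with the standard library alone. `check_public_url` is a hypothetical stand-in, not the project's validate_cloud_url, whose actual rules live in backend/storage/cloud_utils.py and may differ. Because the HTTP client resolves the hostname again when it connects, a check like this narrows but does not close the DNS-rebinding window, which is why the egress firewall remains the backstop.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def check_public_url(url: str) -> None:
    """Hypothetical SSRF guard: raise ValueError unless every resolved address is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {parts.hostname!r}") from exc
    for *_, sockaddr in infos:
        # Strip an IPv6 zone suffix such as "%eth0" before parsing
        addr = ipaddress.ip_address(sockaddr[0].split("%")[0])
        # is_global is False for loopback, RFC 1918, link-local (e.g. 169.254.169.254), etc.
        if not addr.is_global:
            raise ValueError(f"{url!r} resolves to non-public address {addr}")


for candidate in ("http://127.0.0.1/dav", "http://10.0.0.1/dav", "https://93.184.216.34/dav"):
    try:
        check_public_url(candidate)
        print("allowed:", candidate)
    except ValueError as exc:
        print("blocked:", exc)
```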

</threat_model>

<verification>
cd /Users/nik/Documents/Progamming/document_scanner/backend && python -m pytest tests/test_cloud.py -v && python -m pytest -v --tb=short 2>&1 | tail -10
</verification>

<success_criteria>
- WebDAVBackend: all 7 methods async; validate_cloud_url in __init__ and before each request; presigned methods raise NotImplementedError
- NextcloudBackend: subclass of WebDAVBackend; list_folder method added; SSRF guard inherited
- pytest -v exits 0 with 0 failures; test_cloud.py tests remain xfailed
</success_criteria>

<output>
Create `.planning/phases/05-cloud-storage-backends/05-04-SUMMARY.md` when done
</output>