| phase | plan | subsystem | tags | requires | provides | affects | tech-stack | key-files | key-decisions | patterns-established | requirements-completed | duration | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 05-cloud-storage-backends | 02 | api | | | | | | | | | | 18min | 2026-05-28 |
Phase 5 Plan 02: Shared Cloud Utilities Layer Summary
SSRF-safe URL validator (RFC-1918/loopback/link-local/localhost/IPv6 blocked via DNS resolution), HKDF-SHA256+Fernet credential encryption with per-user key derivation, TTLCache(1000, 60s) folder listing cache, and async storage backend factory for per-document backend dispatch
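The per-user key derivation mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: the real module uses the cryptography package, whereas this sketch implements HKDF-SHA256 (RFC 5869) with the standard library only so it runs without extra dependencies. The names hkdf_sha256 and derive_user_key, the info label, and the use of user_id as derivation context are assumptions.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, salt: bytes = b"", length: int = 32) -> bytes:
    """RFC 5869 HKDF with SHA-256: extract a PRK, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract step: an empty salt is treated as hash_len zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()
    # Expand step: chain HMAC blocks until enough output is produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_user_key(master_secret: bytes, user_id: int) -> bytes:
    """Derive a 32-byte key per user and encode it in Fernet's key format."""
    # Hypothetical context label; the user id makes each user's key distinct.
    info = f"cloud-credentials:{user_id}".encode()
    return base64.urlsafe_b64encode(hkdf_sha256(master_secret, info))


key_for_user_1 = derive_user_key(b"example-master-secret", 1)
key_for_user_2 = derive_user_key(b"example-master-secret", 2)
```

Because the derivation here is a stateless function, the single-use restriction of a library HKDF object does not arise; with the cryptography package, the equivalent is to construct a new HKDF object on every call.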
Performance
- Duration: 18 min
- Started: 2026-05-28T19:10:00Z
- Completed: 2026-05-28T19:28:00Z
- Tasks: 2
- Files modified: 4
Accomplishments
- Created cloud_utils.py with validate_cloud_url() (DNS-resolved SSRF prevention blocking RFC-1918, loopback, link-local, localhost, and IPv6 private ranges), _derive_fernet_key() (fresh HKDF instance per call to avoid AlreadyFinalized), and encrypt_credentials() and decrypt_credentials() (Fernet round-trip over JSON-serialised dict)
- Created cloud_cache.py with module-level TTLCache(maxsize=1000, ttl=60) singleton, thread-safe lock, get_cloud_folders_cached() async function (fetch-outside-lock pattern), and invalidate_provider_cache() sync helper
- Extended storage/__init__.py with get_storage_backend_for_document() async factory: returns MinIOBackend for minio docs, queries CloudConnection scoped to user.id, decrypts credentials, lazy-imports cloud backend classes to avoid circular imports; raises HTTPException(503) if connection missing or inactive
- Created tests/test_cloud_utils.py with 27 green tests using TDD (RED → GREEN), covering all SSRF cases, HKDF round-trip invariants, TTLCache configuration, async cache behaviour, and factory importability
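The fetch-outside-lock pattern named for get_cloud_folders_cached() might look roughly like the sketch below. To stay runnable with the standard library alone, it swaps cachetools.TTLCache for a plain dict with expiry timestamps (so the 1000-entry size limit is not enforced here). The function name get_folders_cached, the cache key shape (user_id, provider, path), and the fetch callback signature are assumptions for illustration.

```python
import asyncio
import threading
import time
from typing import Awaitable, Callable, Dict, List, Tuple

TTL_SECONDS = 60  # mirrors the 60 s TTL from the summary

CacheKey = Tuple[int, str, str]  # assumed key: (user_id, provider, path)
_cache: Dict[CacheKey, Tuple[float, List[str]]] = {}
_lock = threading.Lock()


async def get_folders_cached(
    user_id: int,
    provider: str,
    path: str,
    fetch: Callable[[str], Awaitable[List[str]]],
) -> List[str]:
    key = (user_id, provider, path)
    # 1. Short critical section: only look up the cache.
    with _lock:
        hit = _cache.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]
    # 2. Lock is released here, so the slow network call never blocks
    #    other callers that only need a cache lookup.
    folders = await fetch(path)
    # 3. Re-acquire the lock briefly to store the result.
    with _lock:
        _cache[key] = (time.monotonic() + TTL_SECONDS, folders)
    return folders


def invalidate_provider_cache(user_id: int, provider: str) -> None:
    """Drop every cached listing for one user's provider."""
    with _lock:
        for key in [k for k in _cache if k[:2] == (user_id, provider)]:
            del _cache[key]
```

The trade-off of this pattern is that two concurrent misses for the same key may both call fetch; the later result simply overwrites the earlier one, which is usually acceptable for a short-lived listing cache.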
Task Commits
- RED phase tests - 7fdffdd (test)
- Task 1: cloud_utils.py, SSRF validation and HKDF credential encryption - 976d2ca (feat)
- Task 2: cloud_cache.py and storage factory extension - fb80379 (feat)
Files Created/Modified
- /Users/nik/Documents/Progamming/document_scanner/backend/storage/cloud_utils.py: validate_cloud_url, _derive_fernet_key, encrypt_credentials, decrypt_credentials
- /Users/nik/Documents/Progamming/document_scanner/backend/services/cloud_cache.py: TTLCache singleton, get_cloud_folders_cached, invalidate_provider_cache
- /Users/nik/Documents/Progamming/document_scanner/backend/storage/__init__.py: Added get_storage_backend_for_document() async factory alongside existing get_storage_backend()
- /Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_utils.py: 27 green tests (TDD)
Decisions Made
- An explicit hostname == "localhost" string block is added BEFORE DNS resolution. Python's getaddrinfo("localhost", None) behaviour varies by OS (macOS resolves to ::1 or 127.0.0.1; Docker containers sometimes fail), so the string check is more reliable and O(1).
- test_allows_public_https was written to use 8.8.8.8 (a raw public IP) instead of cloud.example.com. The cloud.example.com domain does not resolve in offline/sandbox CI environments, causing a spurious test failure unrelated to the SSRF logic being tested.
- Added "# type: ignore[import]" comments to the lazy imports inside get_storage_backend_for_document() because the cloud backend modules (google_drive_backend.py, onedrive_backend.py, webdav_backend.py) do not exist yet; they are created by Plans 05-03 through 05-05.
- An IPv4/IPv6 family mismatch in the "addr in net" membership test is caught via "except TypeError: continue" rather than by pre-filtering networks. This is simpler and avoids maintaining two separate network lists.
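The validation decisions above (string check before DNS, a single mixed IPv4/IPv6 blocklist, and the defensive TypeError guard) can be sketched as follows. This is an illustrative reconstruction, not the contents of cloud_utils.py: the exact blocklist, the accepted schemes, and the error messages are assumptions.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumed blocklist covering the ranges named in the summary; IPv4 and IPv6
# networks live in one list rather than two.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
        "127.0.0.0/8",     # IPv4 loopback
        "169.254.0.0/16",  # IPv4 link-local
        "::1/128",         # IPv6 loopback
        "fe80::/10",       # IPv6 link-local
        "fc00::/7",        # IPv6 unique local
    )
]


def validate_cloud_url(url: str) -> str:
    """Return url unchanged if it is acceptable, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):  # assumed scheme policy
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    hostname = parsed.hostname
    if not hostname:
        raise ValueError("URL has no hostname")
    # String check first: does not depend on how the OS resolves the name.
    if hostname.lower() == "localhost":
        raise ValueError("localhost is not allowed")
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {hostname!r}") from exc
    for info in infos:
        # Strip any IPv6 zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        for net in BLOCKED_NETWORKS:
            try:
                blocked = addr in net
            except TypeError:
                # Defensive, as in the summary: skip a network of the
                # other IP family if the membership test raises.
                continue
            if blocked:
                raise ValueError(f"{hostname!r} resolves to blocked address {addr}")
    return url
```

A raw public IP such as https://8.8.8.8/remote.php/dav passes without any DNS lookup, which is why it suits an offline test environment, while https://192.168.1.1/dav and https://localhost/x raise ValueError.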
Deviations from Plan
Auto-fixed Issues
1. [Rule 1 - Bug] Fixed SSRF allow test using unresolvable domain
- Found during: Task 1 test execution (GREEN phase)
- Issue: test_allows_public_https used cloud.example.com which does not resolve in the local (offline) test environment, causing a spurious ValueError from socket.gaierror, not a real SSRF failure
- Fix: Replaced with https://8.8.8.8/remote.php/dav (raw public IP, no DNS required)
- Files modified: backend/tests/test_cloud_utils.py
- Verification: Test now passes; implementation is correct and not changed
- Committed in: 976d2ca (part of Task 1 commit)
Total deviations: 1 auto-fixed (Rule 1 - test bug in network-isolated environment)
Impact on plan: No scope creep. Fix was a test correctness issue, not an implementation change.
Issues Encountered
cryptography and cachetools were not installed in the local Python 3.9.6 environment (they were added to requirements.txt in Plan 05-01 but not installed locally). Installed both via pip3 install cryptography cachetools to enable local test execution. This is consistent with the Plan 05-01 SUMMARY note about running tests locally vs. inside Docker.
Known Stubs
None introduced by this plan. The # type: ignore[import] comments on the lazy cloud backend imports in storage/__init__.py are expected — those modules are created by Plans 05-03 through 05-05 and will be resolved as those plans complete.
Threat Surface Scan
No new network endpoints introduced. All security surfaces are internal utilities:
- validate_cloud_url() is a validation function (no outbound HTTP calls; it only performs a DNS lookup)
- encrypt_credentials() / decrypt_credentials() are pure crypto functions
- get_storage_backend_for_document() is a factory (no new HTTP endpoints)
No threat flags raised.
Next Phase Readiness
- All shared utilities are in place. Plans 05-03 through 05-05 can import from storage.cloud_utils and services.cloud_cache immediately.
- get_storage_backend_for_document() will work for minio documents now; cloud backends are activated as each backend plan completes.
- The 27 new tests in test_cloud_utils.py are green; the 19 xfail stubs in test_cloud.py remain xfail (correctly, since they test API endpoints not yet built).
- Full suite: 199 passed / 43 xfailed / 1 pre-existing failure (test_extract_docx: python-docx not installed locally, documented in Plan 05-01).
Self-Check: PASSED
Files verified present:
- backend/storage/cloud_utils.py: FOUND
- backend/services/cloud_cache.py: FOUND
- backend/storage/__init__.py: FOUND (contains get_storage_backend_for_document)
- backend/tests/test_cloud_utils.py: FOUND (27 tests, all passing)
Commits verified:
- 7fdffdd (test(05-02): add failing RED tests): FOUND
- 976d2ca (feat(05-02): implement cloud_utils.py): FOUND
- fb80379 (feat(05-02): implement cloud_cache.py and extend storage factory): FOUND
Phase: 05-cloud-storage-backends
Completed: 2026-05-28