kite/.planning/phases/05-cloud-storage-backends/05-02-SUMMARY.md
curo1305 3b84626da9 docs(05-02): complete shared cloud utilities plan
- 05-02-SUMMARY.md: full plan summary with TDD gate compliance, deviation docs, threat surface scan
- STATE.md: advanced to plan 26/32 (81%), updated session log, added 4 key decisions
- ROADMAP.md: marked 05-02 complete (2/8 Phase 5 plans done)
2026-05-28 21:04:03 +02:00

phase: 05-cloud-storage-backends
plan: 02
subsystem: api
tags: cryptography, hkdf, fernet, ssrf, ipaddress, cachetools, ttlcache, cloud-storage, storage-factory

requires:
  • phase: 05-cloud-storage-backends, plan: 01, provides: Wave 0 xfail stubs in test_cloud.py, cloud Settings fields (cloud_creds_key), cachetools pin in requirements.txt

provides:
  • backend/storage/cloud_utils.py: validate_cloud_url (SSRF), encrypt_credentials, decrypt_credentials, _derive_fernet_key (HKDF)
  • backend/services/cloud_cache.py: TTLCache(maxsize=1000, ttl=60) singleton, get_cloud_folders_cached (async), invalidate_provider_cache
  • backend/storage/__init__.py: get_storage_backend_for_document() async factory
  • backend/tests/test_cloud_utils.py: 27 green tests covering SSRF, HKDF round-trip, cache, factory

affects: 05-03, 05-04, 05-05, 05-06, 05-07, 05-08

tech-stack (added):
  • cryptography (HKDF-SHA256; Fernet, which uses AES-128-CBC with HMAC-SHA256), installed locally for testing
  • cachetools (TTLCache), installed locally for testing

tech-stack (patterns):
  • Fresh HKDF instance per call (avoids the AlreadyFinalized pitfall; see RESEARCH.md Pitfall 3)
  • DNS-resolved SSRF check: socket.getaddrinfo before ipaddress.ip_network membership test
  • Explicit localhost string block before DNS resolution (OS-agnostic edge case)
  • Fetch-outside-lock async cache pattern: acquire lock to check, release, await fetch_fn, acquire lock to write
  • Lazy import inside get_storage_backend_for_document to avoid circular imports at module load time

key-files (created):
  • backend/storage/cloud_utils.py
  • backend/services/cloud_cache.py
  • backend/tests/test_cloud_utils.py

key-files (modified):
  • backend/storage/__init__.py

key-decisions:
  • Explicit 'localhost' string block added before DNS resolution: Python 3.9 getaddrinfo resolves localhost to 127.0.0.1 on macOS, but behaviour varies by OS; the string check is O(1) and OS-agnostic
  • validate_cloud_url test uses 8.8.8.8 (raw public IP) instead of cloud.example.com, because cloud.example.com does not resolve in offline CI environments
  • type: ignore[import] on lazy cloud backend imports, because the modules do not exist yet (Plans 05-03..05-05 create them)
  • IPv4/IPv6 family mismatch in ip_network check handled via try/except TypeError to avoid cross-family errors

patterns-established:
  • HKDF key derivation: fresh HKDF(...) instance inside _derive_fernet_key() every call, never cached
  • SSRF validation: scheme check → hostname presence → localhost string → raw IP parse → DNS resolve → blocked network membership
  • Cloud factory extension: get_storage_backend_for_document() alongside (not replacing) get_storage_backend()
  • TTLCache thread safety: threading.Lock wraps all _folder_cache reads/writes; fetch_fn awaited outside lock

requirements-completed: CLOUD-02, CLOUD-07
duration: 18min
completed: 2026-05-28

Phase 5 Plan 02: Shared Cloud Utilities Layer Summary

This plan delivers an SSRF-safe URL validator (RFC 1918, loopback, link-local, localhost, and private IPv6 targets are blocked after DNS resolution), HKDF-SHA256 + Fernet credential encryption with per-user key derivation, a TTLCache(maxsize=1000, ttl=60) folder-listing cache, and an async storage backend factory for per-document backend dispatch.

Performance

  • Duration: 18 min
  • Started: 2026-05-28T19:10:00Z
  • Completed: 2026-05-28T19:28:00Z
  • Tasks: 2
  • Files modified: 4

Accomplishments

  • Created cloud_utils.py with validate_cloud_url() (DNS-resolved SSRF prevention blocking RFC-1918, loopback, link-local, localhost, and IPv6 private ranges), _derive_fernet_key() (fresh HKDF instance per call to avoid AlreadyFinalized), encrypt_credentials() and decrypt_credentials() (Fernet round-trip over JSON-serialised dict)
  • Created cloud_cache.py with module-level TTLCache(maxsize=1000, ttl=60) singleton, thread-safe lock, get_cloud_folders_cached() async function (fetch-outside-lock pattern), and invalidate_provider_cache() sync helper
  • Extended storage/__init__.py with get_storage_backend_for_document() async factory: returns MinIOBackend for minio docs, queries CloudConnection scoped to user.id, decrypts credentials, lazy-imports cloud backend classes to avoid circular imports; raises HTTPException(503) if connection missing or inactive
  • Created tests/test_cloud_utils.py with 27 green tests using TDD (RED → GREEN), covering all SSRF cases, HKDF round-trip invariants, TTLCache configuration, async cache behaviour, and factory importability
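
The per-user key derivation behind _derive_fernet_key() can be illustrated with a small sketch. The plan uses the cryptography package's HKDF class, whose instances are single-use (a second derive() call raises AlreadyFinalized), hence the fresh instance per call. The sketch below is a stand-in, not the project's code: it implements RFC 5869 HKDF-SHA256 with only hashlib and hmac so that it runs without third-party packages. The function names, the "cloud-creds:" info label, and the choice of the user ID as the HKDF info parameter are assumptions made for illustration.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """RFC 5869 HKDF with SHA-256: extract a PRK, then expand it to `length` bytes."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract step: an empty salt defaults to hash_len zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()
    # Expand step: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fernet_key_sketch(master_key: bytes, user_id: str) -> bytes:
    """Derive a per-user key in Fernet's format: 32 raw bytes, URL-safe base64 encoded.

    Hypothetical helper: binding the key to the user via the info parameter
    is an assumption, not something the plan summary specifies.
    """
    raw = hkdf_sha256(master_key, info=f"cloud-creds:{user_id}".encode())
    return base64.urlsafe_b64encode(raw)
```

Because the function holds no state between calls, there is nothing to reuse by accident, which is the same property the fresh-instance-per-call rule enforces when the cryptography HKDF class is used.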

Task Commits

  1. RED phase tests - 7fdffdd (test)
  2. Task 1: cloud_utils.py — SSRF validation and HKDF credential encryption - 976d2ca (feat)
  3. Task 2: cloud_cache.py and storage factory extension - fb80379 (feat)

Files Created/Modified

  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/cloud_utils.py — validate_cloud_url, _derive_fernet_key, encrypt_credentials, decrypt_credentials
  • /Users/nik/Documents/Progamming/document_scanner/backend/services/cloud_cache.py — TTLCache singleton, get_cloud_folders_cached, invalidate_provider_cache
  • /Users/nik/Documents/Progamming/document_scanner/backend/storage/__init__.py — Added get_storage_backend_for_document() async factory alongside existing get_storage_backend()
  • /Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_utils.py — 27 green tests (TDD)

Decisions Made

  • Explicit hostname == "localhost" string block is added BEFORE DNS resolution. Python's getaddrinfo("localhost", None) behaviour varies by OS (macOS resolves to ::1 or 127.0.0.1; Docker containers sometimes fail), so the string check is more reliable and O(1).
  • test_allows_public_https was written to use 8.8.8.8 (a raw public IP) instead of cloud.example.com. The cloud.example.com domain does not resolve in offline/sandbox CI environments, causing a spurious test failure unrelated to the SSRF logic being tested.
  • # type: ignore[import] comments added to the lazy imports inside get_storage_backend_for_document() because the cloud backend modules (google_drive_backend.py, onedrive_backend.py, webdav_backend.py) do not exist yet — they are created by Plans 05-03 through 05-05.
  • IPv4/IPv6 family mismatch in addr in net is caught via except TypeError: continue rather than pre-filtering networks. This is simpler and avoids maintaining two separate network lists.
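
The validation order described in these decisions can be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of cloud_utils.py: the function name validate_cloud_url_sketch, the exact block list, the http/https scheme allow-list, and returning the URL on success are all hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumed block list covering the categories the summary names:
# RFC 1918, loopback, link-local, and IPv6 private ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
        "127.0.0.0/8", "169.254.0.0/16",
        "::1/128", "fc00::/7", "fe80::/10",
    )
]


def _is_blocked(addr) -> bool:
    for net in BLOCKED_NETWORKS:
        try:
            if addr in net:
                return True
        except TypeError:
            # Defensive guard for IPv4/IPv6 family mismatches, mirroring the decision above.
            continue
    return False


def validate_cloud_url_sketch(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):  # assumed allow-list
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no hostname")
    if host.lower() == "localhost":  # O(1) string check before any DNS lookup
        raise ValueError("localhost is not allowed")
    try:
        candidates = [ipaddress.ip_address(host)]  # raw IP literal: no DNS needed
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror as exc:
            raise ValueError(f"cannot resolve host: {host}") from exc
        # Drop any IPv6 scope ID ("%eth0") before parsing the resolved address.
        candidates = [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]
    for addr in candidates:
        if _is_blocked(addr):
            raise ValueError(f"{host} resolves to a blocked address: {addr}")
    return url
```

Checking every resolved address, rather than only the first, means a hostname with one public and one private record is still rejected.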

Deviations from Plan

Auto-fixed Issues

1. [Rule 1 - Bug] Fixed SSRF allow test using unresolvable domain

  • Found during: Task 1 test execution (GREEN phase)
  • Issue: test_allows_public_https used cloud.example.com which does not resolve in the local (offline) test environment, causing a spurious ValueError from socket.gaierror — not a real SSRF failure
  • Fix: Replaced with https://8.8.8.8/remote.php/dav (raw public IP, no DNS required)
  • Files modified: backend/tests/test_cloud_utils.py
  • Verification: Test now passes; the implementation was already correct and is unchanged
  • Committed in: 976d2ca (part of Task 1 commit)

Total deviations: 1 auto-fixed (Rule 1 - test bug in network-isolated environment)
Impact on plan: No scope creep. Fix was a test correctness issue, not an implementation change.

Issues Encountered

cryptography and cachetools were not installed in the local Python 3.9.6 environment (they were added to requirements.txt in Plan 05-01 but not installed locally). Installed both via pip3 install cryptography cachetools to enable local test execution. This is consistent with the Plan 05-01 SUMMARY note about running tests locally vs. inside Docker.

Known Stubs

None introduced by this plan. The # type: ignore[import] comments on the lazy cloud backend imports in storage/__init__.py are expected — those modules are created by Plans 05-03 through 05-05 and will be resolved as those plans complete.

Threat Surface Scan

No new network endpoints introduced. All security surfaces are internal utilities:

  • validate_cloud_url() is a validation-only function: it performs DNS lookups via socket.getaddrinfo but sends no requests to the target URL
  • encrypt_credentials() / decrypt_credentials() are pure crypto functions
  • get_storage_backend_for_document() is a factory (no new HTTP endpoints)

No threat flags raised.

Next Phase Readiness

  • All shared utilities are in place. Plans 05-03 through 05-05 can import from storage.cloud_utils and services.cloud_cache immediately.
  • get_storage_backend_for_document() will work for minio documents now; cloud backends are activated as each backend plan completes.
  • The 27 new tests in test_cloud_utils.py are green; the 19 xfail stubs in test_cloud.py remain xfail (correctly — they test API endpoints not yet built).
  • Full suite: 199 passed / 43 xfailed / 1 pre-existing failure (test_extract_docx — python-docx not installed locally, documented in Plan 05-01).
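
For plans that will call into services.cloud_cache, the fetch-outside-lock pattern can be sketched as below. This is an illustration only: a plain dict with expiry timestamps stands in for cachetools.TTLCache so the example needs no third-party packages, and the function names, the (user_id, provider) cache key, and the fetch_fn signature are assumptions rather than the module's actual interface.

```python
import threading
import time
from typing import Any, Awaitable, Callable, Dict, Tuple

_TTL_SECONDS = 60.0
# Maps (user_id, provider) -> (expiry timestamp, cached folder listing).
_cache: Dict[Tuple[int, str], Tuple[float, Any]] = {}
_lock = threading.Lock()


async def get_folders_cached_sketch(
    user_id: int,
    provider: str,
    fetch_fn: Callable[[], Awaitable[Any]],
) -> Any:
    key = (user_id, provider)
    # 1. Hold the lock only long enough to check for a fresh entry.
    with _lock:
        entry = _cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
    # 2. Lock released: the slow provider call never runs while the lock is held.
    folders = await fetch_fn()
    # 3. Re-acquire the lock briefly to store the result.
    with _lock:
        _cache[key] = (time.monotonic() + _TTL_SECONDS, folders)
    return folders


def invalidate_provider_cache_sketch(user_id: int, provider: str) -> None:
    # Synchronous helper: drop the entry so the next read refetches.
    with _lock:
        _cache.pop((user_id, provider), None)
```

The trade-off of this pattern is that two concurrent misses for the same key may both call fetch_fn, with the last write winning; in exchange, a threading.Lock is never held across an await.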

Self-Check: PASSED

Files verified present:

  • backend/storage/cloud_utils.py: FOUND
  • backend/services/cloud_cache.py: FOUND
  • backend/storage/__init__.py: FOUND (contains get_storage_backend_for_document)
  • backend/tests/test_cloud_utils.py: FOUND (27 tests, all passing)

Commits verified:

  • 7fdffdd: test(05-02): add failing RED tests — FOUND
  • 976d2ca: feat(05-02): implement cloud_utils.py — FOUND
  • fb80379: feat(05-02): implement cloud_cache.py and extend storage factory — FOUND

Phase: 05-cloud-storage-backends
Completed: 2026-05-28