docs(05-02): complete shared cloud utilities plan
- 05-02-SUMMARY.md: full plan summary with TDD gate compliance, deviation docs, threat surface scan
- STATE.md: advanced to plan 26/32 (81%), updated session log, added 4 key decisions
- ROADMAP.md: marked 05-02 complete (2/8 Phase 5 plans done)
@@ -0,0 +1,155 @@
---
phase: 05-cloud-storage-backends
plan: 02
subsystem: api
tags: [cryptography, hkdf, fernet, ssrf, ipaddress, cachetools, ttlcache, cloud-storage, storage-factory]

# Dependency graph
requires:
  - phase: 05-cloud-storage-backends
    plan: 01
    provides: "Wave 0 xfail stubs in test_cloud.py, cloud Settings fields (cloud_creds_key), cachetools pin in requirements.txt"
provides:
  - "backend/storage/cloud_utils.py: validate_cloud_url (SSRF), encrypt_credentials, decrypt_credentials, _derive_fernet_key (HKDF)"
  - "backend/services/cloud_cache.py: TTLCache(maxsize=1000,ttl=60) singleton, get_cloud_folders_cached (async), invalidate_provider_cache"
  - "backend/storage/__init__.py: get_storage_backend_for_document() async factory"
  - "backend/tests/test_cloud_utils.py: 27 green tests covering SSRF, HKDF round-trip, cache, factory"
affects: [05-03, 05-04, 05-05, 05-06, 05-07, 05-08]

# Tech tracking
tech-stack:
  added:
    - cryptography (HKDF-SHA256; Fernet, which is AES-128-CBC with HMAC-SHA256); installed locally for testing
    - cachetools (TTLCache); installed locally for testing
  patterns:
    - "Fresh HKDF instance per call (avoids the AlreadyFinalized pitfall; see RESEARCH.md Pitfall 3)"
    - "DNS-resolved SSRF check: socket.getaddrinfo before ipaddress.ip_network membership test"
    - "Explicit localhost string block before DNS resolution (OS-agnostic edge case)"
    - "Fetch-outside-lock async cache pattern: acquire lock to check, release, await fetch_fn, acquire lock to write"
    - "Lazy import inside get_storage_backend_for_document to avoid circular imports at module load time"

key-files:
  created:
    - backend/storage/cloud_utils.py
    - backend/services/cloud_cache.py
    - backend/tests/test_cloud_utils.py
  modified:
    - backend/storage/__init__.py

key-decisions:
  - "Explicit 'localhost' string block added before DNS resolution: Python 3.9 getaddrinfo resolves localhost to 127.0.0.1 on macOS but behaviour varies by OS; string check is O(1) and OS-agnostic"
  - "validate_cloud_url test uses 8.8.8.8 (raw public IP) instead of cloud.example.com because example.com does not resolve in offline CI environments"
  - "type: ignore[import] on lazy cloud backend imports because the modules do not exist yet (Plans 05-03..05-05 create them)"
  - "IPv4/IPv6 family mismatch in ip_network check handled via try/except TypeError to avoid cross-family errors"

patterns-established:
  - "HKDF key derivation: fresh HKDF(...) instance inside _derive_fernet_key() every call, never cached"
  - "SSRF validation: scheme check → hostname presence → localhost string → raw IP parse → DNS resolve → blocked network membership"
  - "Cloud factory extension: get_storage_backend_for_document() alongside (not replacing) get_storage_backend()"
  - "TTLCache thread safety: threading.Lock wraps all _folder_cache reads/writes; fetch_fn awaited outside lock"

requirements-completed:
  - CLOUD-02
  - CLOUD-07

# Metrics
duration: 18min
completed: 2026-05-28
---

# Phase 5 Plan 02: Shared Cloud Utilities Layer Summary

**SSRF-safe URL validator (RFC-1918/loopback/link-local/localhost/IPv6 blocked via DNS resolution), HKDF-SHA256+Fernet credential encryption with per-user key derivation, TTLCache(1000, 60s) folder listing cache, and async storage backend factory for per-document backend dispatch**

## Performance

- **Duration:** 18 min
- **Started:** 2026-05-28T19:10:00Z
- **Completed:** 2026-05-28T19:28:00Z
- **Tasks:** 2
- **Files modified:** 4

## Accomplishments

- Created `cloud_utils.py` with `validate_cloud_url()` (DNS-resolved SSRF prevention blocking RFC-1918, loopback, link-local, localhost, and IPv6 private ranges), `_derive_fernet_key()` (fresh HKDF instance per call to avoid AlreadyFinalized), `encrypt_credentials()` and `decrypt_credentials()` (Fernet round-trip over JSON-serialised dict)
- Created `cloud_cache.py` with module-level `TTLCache(maxsize=1000, ttl=60)` singleton, thread-safe lock, `get_cloud_folders_cached()` async function (fetch-outside-lock pattern), and `invalidate_provider_cache()` sync helper
- Extended `storage/__init__.py` with `get_storage_backend_for_document()` async factory: returns MinIOBackend for minio docs, queries CloudConnection scoped to user.id, decrypts credentials, lazy-imports cloud backend classes to avoid circular imports; raises HTTPException(503) if connection missing or inactive
- Created `tests/test_cloud_utils.py` with 27 green tests using TDD (RED → GREEN), covering all SSRF cases, HKDF round-trip invariants, TTLCache configuration, async cache behaviour, and factory importability

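The fetch-outside-lock pattern behind `get_cloud_folders_cached()` can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's code: a plain dict with expiry timestamps stands in for `cachetools.TTLCache`, and the `(provider, user_id, path)` key shape, the `fetch_fn` signature, and the prefix-based invalidation are all hypothetical.

```python
import asyncio
import threading
import time
from typing import Any, Awaitable, Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]  # assumed shape: (provider, user_id, path)

_TTL_SECONDS = 60.0  # mirrors ttl=60 from the summary
_lock = threading.Lock()
# Stand-in for cachetools.TTLCache: key -> (expiry timestamp, value).
_folder_cache: Dict[CacheKey, Tuple[float, Any]] = {}


async def get_cloud_folders_cached(
    key: CacheKey, fetch_fn: Callable[[], Awaitable[Any]]
) -> Any:
    """Return the cached value for key, awaiting fetch_fn on a miss."""
    now = time.monotonic()
    with _lock:  # 1. hold the lock only for the lookup
        entry = _folder_cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
    # 2. lock released: the slow provider call never blocks other readers
    value = await fetch_fn()
    with _lock:  # 3. re-acquire the lock only for the write
        _folder_cache[key] = (time.monotonic() + _TTL_SECONDS, value)
    return value


def invalidate_provider_cache(provider: str) -> None:
    """Drop every cached entry that belongs to the given provider."""
    with _lock:
        for key in [k for k in _folder_cache if k[0] == provider]:
            del _folder_cache[key]
```

Because the lock is released during the fetch, two concurrent misses for the same key may both call `fetch_fn`, and the last write wins. That trade-off keeps a slow provider call from stalling every other cache reader.
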
## Task Commits

1. **RED phase tests** - `7fdffdd` (test)
2. **Task 1: cloud_utils.py (SSRF validation and HKDF credential encryption)** - `976d2ca` (feat)
3. **Task 2: cloud_cache.py and storage factory extension** - `fb80379` (feat)

## Files Created/Modified

- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/cloud_utils.py`: validate_cloud_url, _derive_fernet_key, encrypt_credentials, decrypt_credentials
- `/Users/nik/Documents/Progamming/document_scanner/backend/services/cloud_cache.py`: TTLCache singleton, get_cloud_folders_cached, invalidate_provider_cache
- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/__init__.py`: added get_storage_backend_for_document() async factory alongside existing get_storage_backend()
- `/Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_utils.py`: 27 green tests (TDD)

## Decisions Made

- An explicit `hostname == "localhost"` string block is applied before DNS resolution. Python's `getaddrinfo("localhost", None)` behaviour varies by OS (macOS resolves to `::1` or `127.0.0.1`; Docker containers sometimes fail), so the string check is more reliable and O(1).
- `test_allows_public_https` uses `8.8.8.8` (a raw public IP) instead of `cloud.example.com`. The `cloud.example.com` domain does not resolve in offline/sandbox CI environments, which caused a spurious test failure unrelated to the SSRF logic being tested.
- `# type: ignore[import]` comments were added to the lazy imports inside `get_storage_backend_for_document()` because the cloud backend modules (`google_drive_backend.py`, `onedrive_backend.py`, `webdav_backend.py`) do not exist yet; they are created by Plans 05-03 through 05-05.
- An IPv4/IPv6 family mismatch in `addr in net` is caught via `except TypeError: continue` rather than by pre-filtering networks. This is simpler and avoids maintaining two separate network lists.

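The validation order recorded above (scheme check, hostname presence, localhost string, raw IP parse, DNS resolve, blocked-network membership) could look roughly like the sketch below. It is an illustration only: the exact blocked-network list, the https-only scheme rule, and the error messages are assumptions, and the real `validate_cloud_url()` may differ.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Ranges named in the summary; the precise list is an assumption.
_BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
        "127.0.0.0/8", "::1/128",                          # loopback
        "169.254.0.0/16", "fe80::/10",                     # link-local
        "fc00::/7",                                        # IPv6 unique local
    )
]


def _is_blocked(addr) -> bool:
    for net in _BLOCKED_NETWORKS:
        try:
            if addr in net:
                return True
        except TypeError:  # defensive guard for mixed IPv4/IPv6 checks
            continue
    return False


def validate_cloud_url(url: str) -> str:
    """Return url unchanged if it passes every check, else raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "https":  # assumption: only https is accepted
        raise ValueError("URL scheme must be https")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no hostname")
    if host.lower() == "localhost":  # O(1) check before any DNS lookup
        raise ValueError("localhost is not allowed")
    try:
        addresses = [ipaddress.ip_address(host)]  # raw IP: no DNS needed
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
        except socket.gaierror as exc:
            raise ValueError(f"cannot resolve {host!r}") from exc
        # sockaddr[0] is the address; strip any IPv6 zone id such as %eth0
        addresses = [ipaddress.ip_address(i[4][0].split("%")[0]) for i in infos]
    if any(_is_blocked(addr) for addr in addresses):
        raise ValueError(f"{host!r} resolves to a blocked address")
    return url
```

Passing a raw public IP such as `https://8.8.8.8/remote.php/dav` exercises the allow path without touching DNS, which is why the test in this plan uses it.
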
## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Fixed SSRF allow test using unresolvable domain**
- **Found during:** Task 1 test execution (GREEN phase)
- **Issue:** `test_allows_public_https` used `cloud.example.com`, which does not resolve in the local (offline) test environment, causing a spurious ValueError from `socket.gaierror` rather than a real SSRF failure
- **Fix:** Replaced with `https://8.8.8.8/remote.php/dav` (raw public IP, no DNS required)
- **Files modified:** `backend/tests/test_cloud_utils.py`
- **Verification:** The test now passes; the implementation was already correct and is unchanged
- **Committed in:** `976d2ca` (part of Task 1 commit)

---

**Total deviations:** 1 auto-fixed (Rule 1 - test bug in network-isolated environment)
**Impact on plan:** No scope creep. Fix was a test correctness issue, not an implementation change.

## Issues Encountered

`cryptography` and `cachetools` were not installed in the local Python 3.9.6 environment (they were added to `requirements.txt` in Plan 05-01 but not installed locally). Installed both via `pip3 install cryptography cachetools` to enable local test execution. This is consistent with the Plan 05-01 SUMMARY note about running tests locally vs. inside Docker.

## Known Stubs

None introduced by this plan. The `# type: ignore[import]` comments on the lazy cloud backend imports in `storage/__init__.py` are expected: those modules are created by Plans 05-03 through 05-05, and the comments can be removed as those plans complete.

## Threat Surface Scan

No new network endpoints introduced. All security surfaces are internal utilities:
- `validate_cloud_url()` is a pure validation function (no outbound calls)
- `encrypt_credentials()` / `decrypt_credentials()` are pure crypto functions
- `get_storage_backend_for_document()` is a factory (no new HTTP endpoints)

No threat flags raised.

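To make the per-user key derivation behind `encrypt_credentials()` / `decrypt_credentials()` concrete, here is a standard-library sketch of HKDF-SHA256 (RFC 5869) that produces a key in the format Fernet expects (32 bytes, URL-safe base64). The project itself uses the `cryptography` package's HKDF, where a fresh instance is needed per call; this sketch sidesteps that by being a plain function. The empty salt, the `cloud-creds:<user_id>` info string, and the function names are assumptions for illustration only.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested key material is too long")
    if not salt:
        salt = b"\x00" * hash_len  # RFC 5869 default when no salt is given
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_fernet_key(master_secret: str, user_id: str) -> bytes:
    """Derive a per-user key; binding user_id via `info` is an assumption."""
    raw = hkdf_sha256(
        master_secret.encode("utf-8"),
        salt=b"",
        info=f"cloud-creds:{user_id}".encode("utf-8"),
    )
    return base64.urlsafe_b64encode(raw)  # 44-character Fernet-style key
```

Derivation is deterministic, so the same master secret and user always yield the same key, while different users get unrelated keys. No derived key has to be stored.
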
## Next Phase Readiness

- All shared utilities are in place. Plans 05-03 through 05-05 can import from `storage.cloud_utils` and `services.cloud_cache` immediately.
- `get_storage_backend_for_document()` will work for minio documents now; cloud backends are activated as each backend plan completes.
- The 27 new tests in `test_cloud_utils.py` are green; the 19 xfail stubs in `test_cloud.py` remain xfail (correctly, since they test API endpoints not yet built).
- Full suite: 199 passed / 43 xfailed / 1 pre-existing failure (`test_extract_docx`: python-docx not installed locally, documented in Plan 05-01).

## Self-Check: PASSED

Files verified present:
- `backend/storage/cloud_utils.py`: FOUND
- `backend/services/cloud_cache.py`: FOUND
- `backend/storage/__init__.py`: FOUND (contains get_storage_backend_for_document)
- `backend/tests/test_cloud_utils.py`: FOUND (27 tests, all passing)

Commits verified:
- 7fdffdd: test(05-02): add failing RED tests (FOUND)
- 976d2ca: feat(05-02): implement cloud_utils.py (FOUND)
- fb80379: feat(05-02): implement cloud_cache.py and extend storage factory (FOUND)

---
*Phase: 05-cloud-storage-backends*
*Completed: 2026-05-28*