docs(05-02): complete shared cloud utilities plan
- 05-02-SUMMARY.md: full plan summary with TDD gate compliance, deviation docs, threat surface scan
- STATE.md: advanced to plan 26/32 (81%), updated session log, added 4 key decisions
- ROADMAP.md: marked 05-02 complete (2/8 Phase 5 plans done)
@@ -227,7 +227,7 @@ Before any phase is marked complete, all three gates must pass:

 **Wave 2** — Shared utilities

-- [ ] 05-02-PLAN.md — cloud_utils.py (SSRF + HKDF), cloud_cache.py (TTLCache), storage factory extension
+- [x] 05-02-PLAN.md — cloud_utils.py (SSRF + HKDF), cloud_cache.py (TTLCache), storage factory extension

 **Wave 3** — Cloud backends (parallel, both blocked on Wave 2 / Plan 05-02)
@@ -268,4 +268,4 @@ Before any phase is marked complete, all three gates must pass:
 | 2. Users & Authentication | 5/5 | Complete | 2026-05-22 |
 | 3. Document Migration & Multi-User Isolation | 5/5 | Complete | 2026-05-25 |
 | 4. Folders, Sharing, Quotas & Document UX | 9/9 | Complete | 2026-05-28 |
-| 5. Cloud Storage Backends | 1/8 | In Progress| |
+| 5. Cloud Storage Backends | 2/8 | In Progress| |
+10
-5
@@ -4,13 +4,13 @@ milestone: v1.0
 milestone_name: milestone
 current_phase: 5
 status: executing
-last_updated: "2026-05-28T18:54:16.369Z"
+last_updated: "2026-05-28T19:30:00.000Z"
 progress:
   total_phases: 5
   completed_phases: 4
   total_plans: 32
-  completed_plans: 25
-  percent: 78
+  completed_plans: 26
+  percent: 81
 ---

 # Project State
@@ -73,6 +73,10 @@ progress:
 | async_client fixture name | Distinct from legacy sync `client` fixture to avoid collision; both coexist until Plan 05 |
 | xfail(strict=False) for Wave 0 | All pre-implementation scaffolds use strict=False so unexpected passes don't break CI |
 | StorageBackend ABC + factory mirrors ai/ pattern | 5 abstract methods; get_storage_backend() factory; MinIOBackend wraps all sync Minio SDK calls in asyncio.to_thread() |
+| Explicit localhost string block in validate_cloud_url | hostname == "localhost" blocked before DNS resolution — OS-agnostic (getaddrinfo("localhost") behaviour varies by OS) |
+| Fresh HKDF instance per _derive_fernet_key call | cryptography library raises AlreadyFinalized on 2nd .derive() call; always create new HKDF(...) instance — never cache |
+| Lazy import of cloud backends in get_storage_backend_for_document | Avoids circular imports at module load time; backends imported inside function body with type: ignore[import] until Plans 05-03..05-05 create them |
+| Fetch-outside-lock async cache pattern | get_cloud_folders_cached acquires lock to check cache, releases lock, awaits fetch_fn, re-acquires lock to write — prevents event loop blocking on cache miss |
 | STORE-02 key enforced in code | MinIOBackend.put_object constructs {user_id}/{document_id}/{uuid4()}{ext}; no filename parameter — only extension passes through |
 | null-user D-03 sentinel | services/storage.save_upload uses user_id="null-user" in Phase 1 (no auth); Phase 2 replaces with str(current_user.id) |
 | load_settings flat-file Phase 1 | users.ai_provider/ai_model columns cannot be populated until Phase 2; settings remain flat-file JSON for Phase 1 |
@@ -164,6 +168,7 @@ _Updated at each phase transition._
 | Last session | 2026-05-28 — Phase 5 UI-SPEC approved (6/6 dimensions passed; 2 revision rounds: Cancel label → context-specific, text-lg → text-xl) |
 | Last session | 2026-05-28 — Phase 5 planned (8 plans, 7 waves); verification passed (4 blockers → resolved: D-05 API-layer refresh path, SEC-09 cloud cleanup, frontend_url config, RESEARCH resolved markers) |
 | Last session | 2026-05-28 — Plan 05-01 executed: Wave 0 Nyquist scaffold — 19 xfail stubs in test_cloud.py, 4 cloud fixtures in conftest.py, 6 package pins, 8 config settings; 172 passed / 43 xfailed |
-| Next action | Execute Plan 05-02: HKDF cloud credential encryption (cloud_utils.py) |
+| Last session | 2026-05-28 — Plan 05-02 executed: cloud_utils.py (SSRF+HKDF), cloud_cache.py (TTLCache), storage factory extended; 199 passed / 43 xfailed / 1 pre-existing failure |
+| Next action | Execute Plan 05-03: GoogleDriveBackend + OneDriveBackend (all 7 StorageBackend methods) |
 | Pending decisions | None |
-| Resume file | `.planning/phases/05-cloud-storage-backends/05-02-PLAN.md` |
+| Resume file | `.planning/phases/05-cloud-storage-backends/05-03-PLAN.md` |
@@ -0,0 +1,155 @@
---
phase: 05-cloud-storage-backends
plan: 02
subsystem: api
tags: [cryptography, hkdf, fernet, ssrf, ipaddress, cachetools, ttlcache, cloud-storage, storage-factory]

# Dependency graph
requires:
  - phase: 05-cloud-storage-backends
    plan: 01
    provides: "Wave 0 xfail stubs in test_cloud.py, cloud Settings fields (cloud_creds_key), cachetools pin in requirements.txt"
provides:
  - "backend/storage/cloud_utils.py: validate_cloud_url (SSRF), encrypt_credentials, decrypt_credentials, _derive_fernet_key (HKDF)"
  - "backend/services/cloud_cache.py: TTLCache(maxsize=1000,ttl=60) singleton, get_cloud_folders_cached (async), invalidate_provider_cache"
  - "backend/storage/__init__.py: get_storage_backend_for_document() async factory"
  - "backend/tests/test_cloud_utils.py: 27 green tests covering SSRF, HKDF round-trip, cache, factory"
affects: [05-03, 05-04, 05-05, 05-06, 05-07, 05-08]

# Tech tracking
tech-stack:
  added:
    - cryptography (HKDF-SHA256, Fernet AES-128-CBC + HMAC-SHA256) — installed locally for testing
    - cachetools (TTLCache) — installed locally for testing
  patterns:
    - "Fresh HKDF instance per call (AlreadyFinalized pitfall avoidance — RESEARCH.md Pitfall 3)"
    - "DNS-resolved SSRF check: socket.getaddrinfo before ipaddress.ip_network membership test"
    - "Explicit localhost string block before DNS resolution (OS-agnostic edge case)"
    - "Fetch-outside-lock async cache pattern: acquire lock to check, release, await fetch_fn, acquire lock to write"
    - "Lazy import inside get_storage_backend_for_document to avoid circular imports at module load time"

key-files:
  created:
    - backend/storage/cloud_utils.py
    - backend/services/cloud_cache.py
    - backend/tests/test_cloud_utils.py
  modified:
    - backend/storage/__init__.py

key-decisions:
  - "Explicit 'localhost' string block added before DNS resolution — Python 3.9 getaddrinfo resolves localhost to 127.0.0.1 on macOS but behaviour varies by OS; string check is O(1) and OS-agnostic"
  - "validate_cloud_url test using 8.8.8.8 (raw public IP) instead of cloud.example.com — example.com does not resolve in offline CI environments"
  - "type: ignore[import] on lazy cloud backend imports — modules do not exist yet (Plans 05-03..05-05 create them)"
  - "IPv4/IPv6 family mismatch in ip_network check handled via try/except TypeError to avoid cross-family errors"

patterns-established:
  - "HKDF key derivation: fresh HKDF(...) instance inside _derive_fernet_key() every call, never cached"
  - "SSRF validation: scheme check → hostname presence → localhost string → raw IP parse → DNS resolve → blocked network membership"
  - "Cloud factory extension: get_storage_backend_for_document() alongside (not replacing) get_storage_backend()"
  - "TTLCache thread safety: threading.Lock wraps all _folder_cache reads/writes; fetch_fn awaited outside lock"

requirements-completed:
  - CLOUD-02
  - CLOUD-07

# Metrics
duration: 18min
completed: 2026-05-28
---

# Phase 5 Plan 02: Shared Cloud Utilities Layer Summary

**SSRF-safe URL validator (RFC-1918/loopback/link-local/localhost/IPv6 blocked via DNS resolution), HKDF-SHA256+Fernet credential encryption with per-user key derivation, TTLCache(1000, 60s) folder listing cache, and async storage backend factory for per-document backend dispatch**

## Performance

- **Duration:** 18 min
- **Started:** 2026-05-28T19:10:00Z
- **Completed:** 2026-05-28T19:28:00Z
- **Tasks:** 2
- **Files modified:** 4

## Accomplishments

- Created `cloud_utils.py` with `validate_cloud_url()` (DNS-resolved SSRF prevention blocking RFC-1918, loopback, link-local, localhost, and IPv6 private ranges), `_derive_fernet_key()` (fresh HKDF instance per call to avoid AlreadyFinalized), `encrypt_credentials()` and `decrypt_credentials()` (Fernet round-trip over JSON-serialised dict)
- Created `cloud_cache.py` with module-level `TTLCache(maxsize=1000, ttl=60)` singleton, thread-safe lock, `get_cloud_folders_cached()` async function (fetch-outside-lock pattern), and `invalidate_provider_cache()` sync helper
- Extended `storage/__init__.py` with `get_storage_backend_for_document()` async factory: returns MinIOBackend for minio docs, queries CloudConnection scoped to user.id, decrypts credentials, lazy-imports cloud backend classes to avoid circular imports; raises HTTPException(503) if connection missing or inactive
- Created `tests/test_cloud_utils.py` with 27 green tests using TDD (RED → GREEN), covering all SSRF cases, HKDF round-trip invariants, TTLCache configuration, async cache behaviour, and factory importability
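
The per-user key derivation and Fernet round-trip described above can be sketched roughly as follows. This is a minimal illustration and not the contents of `cloud_utils.py`: the function signatures, the `master_key` parameter, and the `cloud-creds:` context label passed as HKDF `info` are all assumptions. It requires the third-party `cryptography` package. The key point is that a new `HKDF(...)` object is built on every call, because a single instance can only `derive()` once.

```python
import base64
import json

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def derive_fernet_key(master_key: bytes, user_id: str) -> bytes:
    """Derive a per-user Fernet key (hypothetical signature)."""
    # A fresh HKDF object every call: reusing one instance for a second
    # derive() raises cryptography.exceptions.AlreadyFinalized.
    hkdf = HKDF(
        algorithm=hashes.SHA256(),
        length=32,  # Fernet needs 32 bytes of key material
        salt=None,
        info=f"cloud-creds:{user_id}".encode(),  # assumed context label
    )
    # Fernet expects the key as URL-safe base64.
    return base64.urlsafe_b64encode(hkdf.derive(master_key))


def encrypt_credentials(master_key: bytes, user_id: str, creds: dict) -> bytes:
    """Serialise the dict to JSON and encrypt it with the user's key."""
    fernet = Fernet(derive_fernet_key(master_key, user_id))
    return fernet.encrypt(json.dumps(creds).encode())


def decrypt_credentials(master_key: bytes, user_id: str, token: bytes) -> dict:
    """Decrypt a token produced by encrypt_credentials for the same user."""
    fernet = Fernet(derive_fernet_key(master_key, user_id))
    return json.loads(fernet.decrypt(token))
```

Because the user ID feeds into the derivation, a token encrypted for one user cannot be decrypted with another user's derived key, even though both come from the same master secret.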

## Task Commits

1. **RED phase tests** - `7fdffdd` (test)
2. **Task 1: cloud_utils.py — SSRF validation and HKDF credential encryption** - `976d2ca` (feat)
3. **Task 2: cloud_cache.py and storage factory extension** - `fb80379` (feat)

## Files Created/Modified

- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/cloud_utils.py` — validate_cloud_url, _derive_fernet_key, encrypt_credentials, decrypt_credentials
- `/Users/nik/Documents/Progamming/document_scanner/backend/services/cloud_cache.py` — TTLCache singleton, get_cloud_folders_cached, invalidate_provider_cache
- `/Users/nik/Documents/Progamming/document_scanner/backend/storage/__init__.py` — Added get_storage_backend_for_document() async factory alongside existing get_storage_backend()
- `/Users/nik/Documents/Progamming/document_scanner/backend/tests/test_cloud_utils.py` — 27 green tests (TDD)
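
The fetch-outside-lock pattern used by `get_cloud_folders_cached` could look something like the sketch below. To keep it free of third-party dependencies, a plain dict with expiry timestamps stands in for `cachetools.TTLCache`; the `(user_id, provider)` cache key and the function parameters are assumptions, not the signatures in `cloud_cache.py`.

```python
import asyncio
import threading
import time
from typing import Any, Awaitable, Callable, Dict, Tuple

TTL_SECONDS = 60.0  # mirrors ttl=60 from the summary
_lock = threading.Lock()
# Stand-in for TTLCache: key -> (expiry timestamp, cached value)
_folder_cache: Dict[Tuple[str, str], Tuple[float, Any]] = {}


async def get_cloud_folders_cached(
    user_id: str,
    provider: str,
    fetch_fn: Callable[[], Awaitable[Any]],
) -> Any:
    key = (user_id, provider)
    # 1. Check the cache under the lock; nothing is awaited while it is held.
    with _lock:
        entry = _folder_cache.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]
    # 2. Cache miss: await the fetch with the lock released.
    value = await fetch_fn()
    # 3. Re-acquire the lock only to store the result.
    with _lock:
        _folder_cache[key] = (time.monotonic() + TTL_SECONDS, value)
    return value


def invalidate_provider_cache(user_id: str, provider: str) -> None:
    """Drop the cached listing, for example after a disconnect."""
    with _lock:
        _folder_cache.pop((user_id, provider), None)
```

The trade-off of this shape is that two tasks that miss at the same moment may both call `fetch_fn`; in exchange, the lock is never held across an `await`, so a slow provider cannot stall other coroutines that only need a cache read.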

## Decisions Made

- Explicit `hostname == "localhost"` string block is added BEFORE DNS resolution. Python's `getaddrinfo("localhost", None)` behaviour varies by OS (macOS resolves to `::1` or `127.0.0.1`; Docker containers sometimes fail), so the string check is more reliable and O(1).
- `test_allows_public_https` was written to use `8.8.8.8` (a raw public IP) instead of `cloud.example.com`. The `cloud.example.com` domain does not resolve in offline/sandbox CI environments, causing a spurious test failure unrelated to the SSRF logic being tested.
- `# type: ignore[import]` comments added to the lazy imports inside `get_storage_backend_for_document()` because the cloud backend modules (`google_drive_backend.py`, `onedrive_backend.py`, `webdav_backend.py`) do not exist yet — they are created by Plans 05-03 through 05-05.
- IPv4/IPv6 family mismatch in `addr in net` is caught via `except TypeError: continue` rather than pre-filtering networks. This is simpler and avoids maintaining two separate network lists.
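
The validation order recorded for this plan (scheme, hostname presence, localhost string, raw IP parse, DNS resolution, blocked-network membership) can be illustrated with a standard-library sketch. It is not the code in `cloud_utils.py`: the https-only scheme rule, the exact blocked-network list, and the error messages are assumptions, and this version compares address families explicitly instead of catching `TypeError`.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumed block list: RFC 1918, loopback, link-local, and IPv6 equivalents.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
        "127.0.0.0/8", "169.254.0.0/16",
        "::1/128", "fc00::/7", "fe80::/10",
    )
]


def _resolve(host: str) -> list:
    """Return the IP addresses for host, skipping DNS for raw IP literals."""
    try:
        return [ipaddress.ip_address(host)]
    except ValueError:
        pass  # not an IP literal, fall through to DNS
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve {host!r}") from exc
    # Strip any IPv6 zone suffix such as "%eth0" before parsing.
    return [ipaddress.ip_address(info[4][0].split("%")[0]) for info in infos]


def validate_cloud_url(url: str) -> str:
    parsed = urlparse(url)
    if parsed.scheme != "https":  # assumption: only https is accepted
        raise ValueError("only https URLs are allowed")
    host = parsed.hostname
    if not host:
        raise ValueError("URL has no hostname")
    if host.lower() == "localhost":  # string check before any DNS lookup
        raise ValueError("localhost is not allowed")
    for addr in _resolve(host):
        for net in BLOCKED_NETWORKS:
            # Only compare addresses and networks of the same IP version.
            if addr.version == net.version and addr in net:
                raise ValueError(f"{addr} is in blocked range {net}")
    return url
```

Because raw IP literals skip DNS entirely, the allow path can be exercised with a URL such as `https://8.8.8.8/remote.php/dav` in an environment with no network access, which matches the test choice described above.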

## Deviations from Plan

### Auto-fixed Issues

**1. [Rule 1 - Bug] Fixed SSRF allow test using unresolvable domain**

- **Found during:** Task 1 test execution (GREEN phase)
- **Issue:** `test_allows_public_https` used `cloud.example.com` which does not resolve in the local (offline) test environment, causing a spurious ValueError from `socket.gaierror` — not a real SSRF failure
- **Fix:** Replaced with `https://8.8.8.8/remote.php/dav` (raw public IP, no DNS required)
- **Files modified:** `backend/tests/test_cloud_utils.py`
- **Verification:** Test now passes; implementation is correct and not changed
- **Committed in:** `976d2ca` (part of Task 1 commit)

---

**Total deviations:** 1 auto-fixed (Rule 1 - test bug in network-isolated environment)
**Impact on plan:** No scope creep. Fix was a test correctness issue, not an implementation change.

## Issues Encountered

`cryptography` and `cachetools` were not installed in the local Python 3.9.6 environment (they were added to `requirements.txt` in Plan 05-01 but not installed locally). Installed both via `pip3 install cryptography cachetools` to enable local test execution. This is consistent with the Plan 05-01 SUMMARY note about running tests locally vs. inside Docker.

## Known Stubs

None introduced by this plan. The `# type: ignore[import]` comments on the lazy cloud backend imports in `storage/__init__.py` are expected — those modules are created by Plans 05-03 through 05-05 and will be resolved as those plans complete.

## Threat Surface Scan

No new network endpoints introduced. All security surfaces are internal utilities:

- `validate_cloud_url()` is a validation function that makes no outbound HTTP requests (it does resolve hostnames via `socket.getaddrinfo`)
- `encrypt_credentials()` / `decrypt_credentials()` are pure crypto functions
- `get_storage_backend_for_document()` is a factory (no new HTTP endpoints)

No threat flags raised.

## Next Phase Readiness

- All shared utilities are in place. Plans 05-03 through 05-05 can import from `storage.cloud_utils` and `services.cloud_cache` immediately.
- `get_storage_backend_for_document()` will work for minio documents now; cloud backends are activated as each backend plan completes.
- The 27 new tests in `test_cloud_utils.py` are green; the 19 xfail stubs in `test_cloud.py` remain xfail (correctly — they test API endpoints not yet built).
- Full suite: 199 passed / 43 xfailed / 1 pre-existing failure (`test_extract_docx` — python-docx not installed locally, documented in Plan 05-01).

## Self-Check: PASSED

Files verified present:

- `backend/storage/cloud_utils.py`: FOUND
- `backend/services/cloud_cache.py`: FOUND
- `backend/storage/__init__.py`: FOUND (contains get_storage_backend_for_document)
- `backend/tests/test_cloud_utils.py`: FOUND (27 tests, all passing)

Commits verified:

- 7fdffdd: test(05-02): add failing RED tests — FOUND
- 976d2ca: feat(05-02): implement cloud_utils.py — FOUND
- fb80379: feat(05-02): implement cloud_cache.py and extend storage factory — FOUND

---

*Phase: 05-cloud-storage-backends*
*Completed: 2026-05-28*