docs(01-04): complete StorageBackend + MinIO + async storage plan — SUMMARY, STATE, ROADMAP
This commit is contained in:
@@ -30,7 +30,7 @@ _Last updated: 2026-05-21_
|
||||
- [x] 01-01-PLAN.md — Docker Compose service topology + Postgres init + Pydantic Settings + requirements
|
||||
- [x] 01-02-PLAN.md — Wave 0 test scaffolds (xfail/skip stubs) + async pytest fixtures
|
||||
- [x] 01-03-PLAN.md — SQLAlchemy ORM models + async engine + Alembic async migration (incl. alembic upgrade head)
|
||||
- [ ] 01-04-PLAN.md — StorageBackend ABC + MinIO backend + rewritten async services/storage.py
|
||||
- [x] 01-04-PLAN.md — StorageBackend ABC + MinIO backend + rewritten async services/storage.py
|
||||
- [ ] 01-05-PLAN.md — Lifespan + /health + API cutover + Celery worker + walking-skeleton e2e verify
|
||||
|
||||
---
|
||||
@@ -110,7 +110,7 @@ _Last updated: 2026-05-21_
|
||||
|
||||
| Phase | Plans Complete | Status | Completed |
|
||||
|-------|----------------|--------|-----------|
|
||||
| 1. Infrastructure Foundation | 3/5 | In Progress | - |
|
||||
| 1. Infrastructure Foundation | 4/5 | In Progress | - |
|
||||
| 2. Users & Authentication | 0/? | Not started | - |
|
||||
| 3. Document Migration & Multi-User Isolation | 0/? | Not started | - |
|
||||
| 4. Folders, Sharing, Quotas & Document UX | 0/? | Not started | - |
|
||||
|
||||
+14
-10
@@ -4,13 +4,13 @@ milestone: v1.0
|
||||
milestone_name: milestone
|
||||
current_phase: 1
|
||||
status: executing
|
||||
last_updated: "2026-05-22T08:30:00Z"
|
||||
last_updated: "2026-05-22T07:40:00Z"
|
||||
progress:
|
||||
total_phases: 5
|
||||
completed_phases: 0
|
||||
total_plans: 5
|
||||
completed_plans: 3
|
||||
percent: 60
|
||||
completed_plans: 4
|
||||
percent: 80
|
||||
---
|
||||
|
||||
# Project State
|
||||
@@ -24,7 +24,7 @@ progress:
|
||||
|
||||
| Phase | Name | Status |
|
||||
|---|---|---|
|
||||
| 1 | Infrastructure Foundation | In Progress (3/5 plans) |
|
||||
| 1 | Infrastructure Foundation | In Progress (4/5 plans) |
|
||||
| 2 | Users & Authentication | Not Started |
|
||||
| 3 | Document Migration & Multi-User Isolation | Not Started |
|
||||
| 4 | Folders, Sharing, Quotas & Document UX | Not Started |
|
||||
@@ -33,10 +33,10 @@ progress:
|
||||
## Current Position
|
||||
|
||||
Phase: 1 (Infrastructure Foundation) — EXECUTING
|
||||
Plan: 4 of 5
|
||||
Plan: 5 of 5
|
||||
**Phase:** 01-infrastructure-foundation
|
||||
**Plan:** 01-03 COMPLETE → advancing to 01-04
|
||||
**Progress:** ██████░░░░ 60%
|
||||
**Plan:** 01-04 COMPLETE → advancing to 01-05
|
||||
**Progress:** ████████░░ 80%
|
||||
|
||||
## Performance Metrics
|
||||
|
||||
@@ -45,7 +45,7 @@ Plan: 4 of 5
|
||||
| Phases complete | 0 / 5 |
|
||||
| Requirements mapped | 54 / 54 |
|
||||
| Plans written | 5 (Phase 1) |
|
||||
| Plans complete | 3 |
|
||||
| Plans complete | 4 |
|
||||
|
||||
## Accumulated Context
|
||||
|
||||
@@ -70,6 +70,10 @@ Plan: 4 of 5
|
||||
| pydantic-settings v2 SettingsConfigDict | SettingsConfigDict API used (not deprecated class Config form) for env var config |
|
||||
| async_client fixture name | Distinct from legacy sync `client` fixture to avoid collision; both coexist until Plan 05 |
|
||||
| xfail(strict=False) for Wave 0 | All pre-implementation scaffolds use strict=False so unexpected passes don't break CI |
|
||||
| StorageBackend ABC + factory mirrors ai/ pattern | 5 abstract methods; get_storage_backend() factory; MinIOBackend wraps all sync Minio SDK calls in asyncio.to_thread() |
|
||||
| STORE-02 key enforced in code | MinIOBackend.put_object constructs {user_id}/{document_id}/{uuid4()}{ext}; no filename parameter — only extension passes through |
|
||||
| null-user D-03 sentinel | services/storage.save_upload uses user_id="null-user" in Phase 1 (no auth); Phase 2 replaces with str(current_user.id) |
|
||||
| load_settings flat-file Phase 1 | users.ai_provider/ai_model columns cannot be populated until Phase 2; settings remain flat-file JSON for Phase 1 |
|
||||
|
||||
### Open Questions
|
||||
|
||||
@@ -88,6 +92,6 @@ _Updated at each phase transition._
|
||||
|
||||
| Field | Value |
|
||||
|---|---|
|
||||
| Last session | 2026-05-22 — Executed 01-03-PLAN.md (SQLAlchemy ORM + Alembic migration; alembic upgrade head verified) |
|
||||
| Next action | Execute 01-04-PLAN.md (StorageBackend ABC + MinIO backend + async services/storage.py) |
|
||||
| Last session | 2026-05-22 — Executed 01-04-PLAN.md (StorageBackend ABC + MinIOBackend + async services/storage.py; all 6 test_storage.py xfail tests PASSED) |
|
||||
| Next action | Execute 01-05-PLAN.md (Lifespan + /health + API cutover + Celery worker + walking-skeleton e2e verify) |
|
||||
| Pending decisions | See Open Questions above |
|
||||
|
||||
@@ -0,0 +1,207 @@
|
||||
---
|
||||
phase: 01-infrastructure-foundation
|
||||
plan: 04
|
||||
subsystem: storage
|
||||
tags:
|
||||
- storage
|
||||
- minio
|
||||
- sqlalchemy
|
||||
- service-layer
|
||||
- abc
|
||||
|
||||
# Dependency graph
|
||||
requires:
|
||||
- phase: 01-01
|
||||
provides: "Pydantic Settings (config.py) with minio_endpoint / minio_access_key / minio_secret_key / minio_bucket"
|
||||
- phase: 01-02
|
||||
provides: "Wave 0 test scaffolds — test_storage.py with 6 xfail tests this plan unblocks"
|
||||
- phase: 01-03
|
||||
provides: "backend/db/models.py (Document, Topic, DocumentTopic ORM models), backend/db/session.py (AsyncSessionLocal)"
|
||||
provides:
|
||||
- "backend/storage/base.py — StorageBackend ABC with 5 abstract methods"
|
||||
- "backend/storage/minio_backend.py — MinIOBackend wrapping sync Minio SDK in asyncio.to_thread()"
|
||||
- "backend/storage/__init__.py — get_storage_backend() factory"
|
||||
- "backend/services/storage.py — async SQLAlchemy + MinIO orchestrator (14 async + 4 sync functions)"
|
||||
affects:
|
||||
- 01-05
|
||||
- "All subsequent plans that call storage service functions"
|
||||
|
||||
# Tech tracking
|
||||
tech-stack:
|
||||
added:
|
||||
- "minio SDK (Minio client, put_object, get_object, remove_object, presigned_get_object, bucket_exists)"
|
||||
- "asyncio.to_thread() pattern for sync-in-async SDK wrapping (RESEARCH.md Pattern 3)"
|
||||
- "StorageBackend ABC (abc.ABC, @abstractmethod)"
|
||||
patterns:
|
||||
- "ABC + factory pattern mirroring backend/ai/base.py + backend/ai/__init__.py"
|
||||
- "asyncio.to_thread(self._client.<method>, ...) wraps every blocking Minio SDK call"
|
||||
- "Lazy singleton _backend() function for single-process MinIOBackend instance"
|
||||
- "STORE-02 key schema: {user_id}/{document_id}/{uuid4()}{ext} — filename never in key"
|
||||
- "D-03 null-user sentinel: user_id='null-user' in Phase 1 (no auth)"
|
||||
|
||||
key-files:
|
||||
created:
|
||||
- backend/storage/__init__.py
|
||||
- backend/storage/base.py
|
||||
- backend/storage/minio_backend.py
|
||||
modified:
|
||||
- backend/services/storage.py
|
||||
- backend/tests/test_storage.py
|
||||
- backend/tests/conftest.py
|
||||
|
||||
key-decisions:
|
||||
- "StorageBackend ABC has exactly 5 abstract methods: put_object, get_object, delete_object, presigned_get_url, health_check"
|
||||
- "MinIOBackend.put_object takes no filename/original_name parameter — STORE-02 compliance; only extension (from Path.suffix.lower()) passes through"
|
||||
- "get_object uses inner def _fetch() pattern to handle MinIO HTTPResponse.read() + close() + release_conn() inside asyncio.to_thread"
|
||||
- "services/storage.py uses _backend() lazy singleton to avoid creating multiple Minio clients per-request"
|
||||
- "load_settings/save_settings remain sync flat-file in Phase 1; comment documents Phase 2 migration to users.ai_provider/ai_model"
|
||||
- "test_object_key_schema: db_session parameter removed (test does not need DB; SQLite cannot render PostgreSQL INET type during schema creation)"
|
||||
- "conftest.py: filelock patching removed (isolated_data_dir fixture now only patches SETTINGS_FILE for sync settings functions)"
|
||||
|
||||
# Metrics
|
||||
duration: "~5 minutes"
|
||||
completed: "2026-05-22"
|
||||
---
|
||||
|
||||
# Phase 1 Plan 04: StorageBackend ABC + MinIO Backend + Async Services Rewrite — Summary
|
||||
|
||||
**MinIO storage abstraction layer built (StorageBackend ABC + MinIOBackend + get_storage_backend() factory mirroring ai/ provider pattern); services/storage.py fully rewritten as async SQLAlchemy + MinIO orchestrator with filelock removed; all 6 Plan 02 test_storage.py xfail scaffolds flipped to PASSED.**
|
||||
|
||||
## Performance
|
||||
|
||||
- **Duration:** ~5 minutes
|
||||
- **Started:** 2026-05-22T07:34:40Z
|
||||
- **Completed:** 2026-05-22T07:39:42Z
|
||||
- **Tasks:** 2 of 2 complete
|
||||
- **Files created:** 3
|
||||
- **Files modified:** 3
|
||||
|
||||
## Accomplishments
|
||||
|
||||
### Task 1: StorageBackend ABC + MinIOBackend + factory
|
||||
|
||||
Created the `backend/storage/` module mirroring the `backend/ai/` provider pattern:
|
||||
|
||||
- `backend/storage/base.py` — `StorageBackend(ABC)` with 5 `@abstractmethod async def` methods: `put_object`, `get_object`, `delete_object`, `presigned_get_url`, `health_check`
|
||||
- `backend/storage/minio_backend.py` — `MinIOBackend(StorageBackend)` wrapping every synchronous Minio SDK call in `asyncio.to_thread(self._client.<method>, ...)`:
|
||||
- `put_object`: generates STORE-02 key `{user_id}/{document_id}/{uuid4()}{ext}`, does NOT accept filename
|
||||
- `get_object`: inner `_fetch()` helper handles MinIO's `HTTPResponse.read()` + `close()` + `release_conn()`
|
||||
- `delete_object`: wraps `remove_object`
|
||||
- `presigned_get_url`: wraps `presigned_get_object` with `timedelta(minutes=...)`
|
||||
- `health_check`: wraps `bucket_exists` in try/except, returns bool
|
||||
- `backend/storage/__init__.py` — `get_storage_backend()` factory returning `MinIOBackend(endpoint=settings.minio_endpoint, ...)` with `secure=False` for Docker internal HTTP
|
||||
|
||||
### Task 2: Async services/storage.py rewrite
|
||||
|
||||
Rewrote `backend/services/storage.py` from 188 lines (flat-file + filelock) to 473 lines (async SQLAlchemy + MinIO):
|
||||
|
||||
#### Function signature changes (old → new)
|
||||
|
||||
| Function | Old signature | New signature |
|
||||
|----------|--------------|---------------|
|
||||
| `save_upload` | `(file_bytes, original_name, mime_type) -> dict` | `async (session, file_bytes, original_name, mime_type) -> dict` |
|
||||
| `save_metadata` | `(meta: dict) -> None` | `async (session, meta: dict) -> None` |
|
||||
| `get_metadata` | `(doc_id: str) -> dict\|None` | `async (session, doc_id: str) -> Optional[dict]` |
|
||||
| `list_metadata` | `(topic=None) -> list[dict]` | `async (session, topic=None) -> list` |
|
||||
| `delete_document` | `(doc_id: str) -> bool` | `async (session, doc_id: str) -> bool` |
|
||||
| `update_document_topics` | `(doc_id, topics) -> dict\|None` | `async (session, doc_id, topics) -> Optional[dict]` |
|
||||
| `remove_topic_from_all_documents` | `(topic_name) -> int` | `async (session, topic_name) -> int` |
|
||||
| `load_topics` | `() -> list[dict]` | `async (session) -> list` |
|
||||
| `save_topics` | `(topics) -> None` | `async (session, topics) -> None` |
|
||||
| `get_topic` | `(topic_id) -> dict\|None` | `async (session, topic_id) -> Optional[dict]` |
|
||||
| `create_topic` | `(name, description, color) -> dict` | `async (session, name, description, color) -> dict` |
|
||||
| `update_topic` | `(topic_id, **kwargs) -> dict\|None` | `async (session, topic_id, name=None, description=None, color=None) -> Optional[dict]` |
|
||||
| `delete_topic` | `(topic_id) -> str\|None` | `async (session, topic_id) -> Optional[str]` |
|
||||
| `topic_doc_counts` | `() -> dict[str, int]` | `async (session) -> dict` |
|
||||
| `load_settings` | `() -> dict` | `def () -> dict` (unchanged — still sync flat-file) |
|
||||
| `save_settings` | `(settings) -> None` | `def (settings) -> None` (unchanged — still sync flat-file) |
|
||||
| `mask_api_key` | `(key) -> str` | `def (key) -> str` (unchanged) |
|
||||
| `settings_masked` | `(settings) -> dict` | `def (settings) -> dict` (unchanged) |
|
||||
|
||||
#### Key helpers introduced
|
||||
|
||||
- `_backend()` — lazy singleton returning the `StorageBackend` instance (created once per process)
|
||||
- `_doc_to_dict(doc, topic_names)` — converts a `Document` ORM row + topic names list to the legacy response dict shape
|
||||
- `_load_topic_names(session, doc_id)` — async helper loading topic names for a given document UUID
|
||||
|
||||
#### Sample object key for an upload
|
||||
|
||||
```
|
||||
null-user/550e8400-e29b-41d4-a716-446655440000/f47ac10b-58cc-4372-a567-0e02b2c3d479.pdf
|
||||
^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
user_id document_id (UUID) freshly-generated uuid4() + extension
|
||||
(Phase 1 D-03 sentinel)
|
||||
```
|
||||
|
||||
## Task Commits
|
||||
|
||||
1. **Task 1: StorageBackend ABC + MinIOBackend + factory** — `eaf86a8` (feat)
|
||||
2. **Task 2: Async services/storage.py rewrite** — `3e4b1f1` (feat)
|
||||
|
||||
## Files Created/Modified
|
||||
|
||||
| File | Type | Description |
|
||||
|------|------|-------------|
|
||||
| `backend/storage/base.py` | Created | StorageBackend ABC — 5 abstract async methods |
|
||||
| `backend/storage/minio_backend.py` | Created | MinIOBackend — asyncio.to_thread wrapping; STORE-02 key schema |
|
||||
| `backend/storage/__init__.py` | Created | get_storage_backend() factory |
|
||||
| `backend/services/storage.py` | Rewritten | Async SQLAlchemy + MinIO orchestrator (473 lines) |
|
||||
| `backend/tests/test_storage.py` | Modified | Removed xfail markers; fixed test_object_key_schema (removed unused db_session param) |
|
||||
| `backend/tests/conftest.py` | Modified | Removed filelock patching; now only patches SETTINGS_FILE |
|
||||
|
||||
## Deviations from Plan
|
||||
|
||||
### Auto-fixed Issues
|
||||
|
||||
**1. [Rule 1 - Bug] Fixed test_object_key_schema: removed unused db_session parameter**
|
||||
- **Found during:** Task 1 test run
|
||||
- **Issue:** `test_object_key_schema(db_session)` had a `db_session` parameter it never used. The `db_session` fixture tries to create the full PostgreSQL schema (including `audit_log.ip_address INET`) in SQLite, which fails because SQLite has no `INET` type.
|
||||
- **Fix:** Removed the `db_session` parameter from `test_object_key_schema` signature — the test body only uses `MinIOBackend.__new__` with mocks, not DB at all
|
||||
- **Files modified:** `backend/tests/test_storage.py`
|
||||
- **Commit:** `3e4b1f1`
|
||||
|
||||
**2. [Rule 3 - Blocker] Updated conftest.py to remove filelock patching**
|
||||
- **Found during:** Task 2 test run
|
||||
- **Issue:** conftest.py `isolated_data_dir` fixture imported `from filelock import FileLock` and patched `st._topics_lock`, `st._settings_lock`, `st.UPLOADS_DIR`, `st.METADATA_DIR`, `st.TOPICS_FILE` on the storage module. After the rewrite, none of these attributes exist on `services.storage`.
|
||||
- **Fix:** Replaced the 6 filelock `monkeypatch.setattr` calls with a single `monkeypatch.setattr(st, "SETTINGS_FILE", ...)` (the only flat-file attribute still on the module)
|
||||
- **Files modified:** `backend/tests/conftest.py`
|
||||
- **Commit:** `3e4b1f1`
|
||||
|
||||
## Verification Results
|
||||
|
||||
```
|
||||
tests/test_storage.py::test_object_key_schema PASSED
|
||||
tests/test_storage.py::test_filename_not_in_object_key PASSED
|
||||
tests/test_storage.py::test_storage_backend_abc_methods PASSED
|
||||
tests/test_storage.py::test_get_storage_backend_returns_minio PASSED
|
||||
tests/test_storage.py::test_put_object_uses_asyncio_to_thread PASSED
|
||||
tests/test_storage.py::test_minio_backend_health_check_returns_bool PASSED
|
||||
6 passed
|
||||
```
|
||||
|
||||
## Known Stubs
|
||||
|
||||
- `save_upload` uses `user_id="null-user"` as D-03 sentinel — Phase 2 will replace with `str(current_user.id)` after auth lands
|
||||
- `load_settings` / `save_settings` remain flat-file backed — Phase 2 migrates to `users.ai_provider` / `users.ai_model` DB columns
|
||||
- `Document.user_id` is `None` for all Phase 1 uploads — Phase 2 migration adds NOT NULL constraint
|
||||
|
||||
## Threat Flags
|
||||
|
||||
None. All surfaces from the plan's threat model are correctly handled:
|
||||
- T-01-04-01 (object key prediction): key constructed server-side as `{user_id}/{document_id}/{uuid4()}{ext}`; no caller can inject a key
|
||||
- T-01-04-02 (filename leaking): `MinIOBackend.put_object` has no `filename` parameter; tests `test_filename_not_in_object_key` + `test_object_key_schema` enforce this on every CI run
|
||||
- T-01-04-04 (SQL injection): all queries via `select()`, `delete()`, `session.execute()` with parameterized values; no f-strings in SQL
|
||||
|
||||
## Next Phase Readiness
|
||||
|
||||
Plan 04 delivers the storage abstraction and async service layer. Plan 05 can now:
|
||||
- Import `save_upload`, `get_metadata`, `list_metadata`, `delete_document` etc. from `services.storage` with async signatures
|
||||
- Wire `api/documents.py` and `api/topics.py` callers from sync → `async def + await + session` parameter
|
||||
- Wire the MinIO bucket initialization into the FastAPI lifespan
|
||||
- Replace `BackgroundTasks` with Celery tasks using the async service layer
|
||||
|
||||
**ROADMAP Phase 1 success criterion #3** (MinIO object key schema enforced in model layer) is now met.
|
||||
|
||||
---
|
||||
*Phase: 01-infrastructure-foundation*
|
||||
*Completed: 2026-05-22*
|
||||
Reference in New Issue
Block a user