| phase | plan | subsystem | tags | requires | provides | affects | tech-stack | key-files | key-decisions | duration | completed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 01-infrastructure-foundation | 04 | storage | | | | | | | | ~5 minutes | 2026-05-22 |
Phase 1 Plan 04: StorageBackend ABC + MinIO Backend + Async Services Rewrite — Summary
MinIO storage abstraction layer built (StorageBackend ABC + MinIOBackend + get_storage_backend() factory mirroring ai/ provider pattern); services/storage.py fully rewritten as async SQLAlchemy + MinIO orchestrator with filelock removed; all 6 Plan 02 test_storage.py xfail scaffolds flipped to PASSED.
Performance
- Duration: ~5 minutes
- Started: 2026-05-22T07:34:40Z
- Completed: 2026-05-22T07:39:42Z
- Tasks: 2 of 2 complete
- Files created: 3
- Files modified: 3
Accomplishments
Task 1: StorageBackend ABC + MinIOBackend + factory
Created the backend/storage/ module mirroring the backend/ai/ provider pattern:
- backend/storage/base.py: StorageBackend(ABC) with 5 @abstractmethod async def methods: put_object, get_object, delete_object, presigned_get_url, health_check
- backend/storage/minio_backend.py: MinIOBackend(StorageBackend) wrapping every synchronous Minio SDK call in asyncio.to_thread(self._client.<method>, ...):
  - put_object: generates the STORE-02 key {user_id}/{document_id}/{uuid4()}{ext}; does NOT accept a filename
  - get_object: an inner _fetch() helper handles MinIO's HTTPResponse.read() + close() + release_conn()
  - delete_object: wraps remove_object
  - presigned_get_url: wraps presigned_get_object with timedelta(minutes=...)
  - health_check: wraps bucket_exists in try/except, returns bool
- backend/storage/__init__.py: get_storage_backend() factory returning MinIOBackend(endpoint=settings.minio_endpoint, ...) with secure=False for Docker internal HTTP
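The layering described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the five abstract method names come from the summary, but the parameter lists, the ThreadedBackend class name, and the in-memory FakeSyncClient and FakeResponse stand-ins (used here in place of the real Minio client so the sketch runs without a server) are all hypothetical.

```python
import asyncio
import uuid
from abc import ABC, abstractmethod
from datetime import timedelta


class StorageBackend(ABC):
    """Async interface; the five method names come from the summary."""

    @abstractmethod
    async def put_object(self, user_id, document_id, data, ext): ...

    @abstractmethod
    async def get_object(self, key): ...

    @abstractmethod
    async def delete_object(self, key): ...

    @abstractmethod
    async def presigned_get_url(self, key, expires_minutes=15): ...

    @abstractmethod
    async def health_check(self): ...


class FakeResponse:
    """Hypothetical stand-in for a blocking HTTP response object."""

    def __init__(self, payload):
        self.payload, self.released = payload, False

    def read(self):
        return self.payload

    def close(self):
        pass

    def release_conn(self):
        self.released = True


class FakeSyncClient:
    """Hypothetical in-memory client; every method blocks like an SDK call."""

    def __init__(self):
        self.objects = {}

    def put_object(self, bucket, key, data):
        self.objects[(bucket, key)] = data

    def get_object(self, bucket, key):
        return FakeResponse(self.objects[(bucket, key)])

    def remove_object(self, bucket, key):
        self.objects.pop((bucket, key), None)

    def presigned_get_object(self, bucket, key, expires):
        return f"http://storage.invalid/{bucket}/{key}?ttl={int(expires.total_seconds())}"

    def bucket_exists(self, bucket):
        return True


class ThreadedBackend(StorageBackend):
    """Offloads each blocking client call to a worker thread."""

    def __init__(self, client, bucket):
        self._client, self._bucket = client, bucket

    async def put_object(self, user_id, document_id, data, ext):
        # Key is built server-side; no filename parameter exists.
        key = f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"
        await asyncio.to_thread(self._client.put_object, self._bucket, key, data)
        return key

    async def get_object(self, key):
        def _fetch():
            response = self._client.get_object(self._bucket, key)
            try:
                return response.read()
            finally:  # release the connection even if read() raises
                response.close()
                response.release_conn()

        return await asyncio.to_thread(_fetch)

    async def delete_object(self, key):
        await asyncio.to_thread(self._client.remove_object, self._bucket, key)

    async def presigned_get_url(self, key, expires_minutes=15):
        ttl = timedelta(minutes=expires_minutes)
        return await asyncio.to_thread(self._client.presigned_get_object, self._bucket, key, ttl)

    async def health_check(self):
        try:
            return bool(await asyncio.to_thread(self._client.bucket_exists, self._bucket))
        except Exception:
            return False
```

Because every blocking call goes through asyncio.to_thread, the event loop stays free while the client waits on network I/O. Swapping FakeSyncClient for a real client would also require matching that client's actual argument lists, which this sketch deliberately simplifies.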
Task 2: Async services/storage.py rewrite
Rewrote backend/services/storage.py from 188 lines (flat-file + filelock) to 473 lines (async SQLAlchemy + MinIO):
Function signature changes (old → new)
| Function | Old signature | New signature |
|---|---|---|
| save_upload | (file_bytes, original_name, mime_type) -> dict | async (session, file_bytes, original_name, mime_type) -> dict |
| save_metadata | (meta: dict) -> None | async (session, meta: dict) -> None |
| get_metadata | (doc_id: str) -> dict\|None | async (session, doc_id: str) -> Optional[dict] |
| list_metadata | (topic=None) -> list[dict] | async (session, topic=None) -> list |
| delete_document | (doc_id: str) -> bool | async (session, doc_id: str) -> bool |
| update_document_topics | (doc_id, topics) -> dict\|None | async (session, doc_id, topics) -> Optional[dict] |
| remove_topic_from_all_documents | (topic_name) -> int | async (session, topic_name) -> int |
| load_topics | () -> list[dict] | async (session) -> list |
| save_topics | (topics) -> None | async (session, topics) -> None |
| get_topic | (topic_id) -> dict\|None | async (session, topic_id) -> Optional[dict] |
| create_topic | (name, description, color) -> dict | async (session, name, description, color) -> dict |
| update_topic | (topic_id, **kwargs) -> dict\|None | async (session, topic_id, name=None, description=None, color=None) -> Optional[dict] |
| delete_topic | (topic_id) -> str\|None | async (session, topic_id) -> Optional[str] |
| topic_doc_counts | () -> dict[str, int] | async (session) -> dict |
| load_settings | () -> dict | def () -> dict (unchanged; still sync flat-file) |
| save_settings | (settings) -> None | def (settings) -> None (unchanged; still sync flat-file) |
| mask_api_key | (key) -> str | def (key) -> str (unchanged) |
| settings_masked | (settings) -> dict | def (settings) -> dict (unchanged) |
Key helpers introduced
- _backend(): lazy singleton returning the StorageBackend instance (created once per process)
- _doc_to_dict(doc, topic_names): converts a Document ORM row + topic names list to the legacy response dict shape
- _load_topic_names(session, doc_id): async helper loading topic names for a given document UUID
Sample object key for an upload
null-user/550e8400-e29b-41d4-a716-446655440000/f47ac10b-58cc-4372-a567-0e02b2c3d479.pdf
- null-user: user_id (Phase 1 D-03 sentinel)
- 550e8400-e29b-41d4-a716-446655440000: document_id (UUID)
- f47ac10b-58cc-4372-a567-0e02b2c3d479.pdf: freshly generated uuid4() + extension
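To make the key layout above concrete, here is a small sketch that builds a key in the {user_id}/{document_id}/{uuid4()}{ext} shape and splits one back into its parts. The helper names build_object_key and parse_object_key, and the regular expression, are illustrative assumptions rather than anything from the codebase.

```python
import re
import uuid

# Lowercase canonical UUID text, as produced by str(uuid.uuid4()).
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
KEY_RE = re.compile(
    rf"^(?P<user_id>[^/]+)/(?P<document_id>{UUID})/(?P<object_id>{UUID})(?P<ext>\.[A-Za-z0-9]+)?$"
)


def build_object_key(user_id: str, document_id: str, ext: str) -> str:
    """Build a key from IDs and an extension only; no filename is involved."""
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"


def parse_object_key(key: str) -> dict:
    """Split a key into its named parts, or raise ValueError if it does not fit."""
    match = KEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a valid object key: {key!r}")
    return match.groupdict()
```

A check like parse_object_key is one way a test could assert that no part of the original filename ends up in the key.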
Task Commits
- Task 1: StorageBackend ABC + MinIOBackend + factory, commit eaf86a8 (feat)
- Task 2: Async services/storage.py rewrite, commit 3e4b1f1 (feat)
Files Created/Modified
| File | Type | Description |
|---|---|---|
| backend/storage/base.py | Created | StorageBackend ABC: 5 abstract async methods |
| backend/storage/minio_backend.py | Created | MinIOBackend: asyncio.to_thread wrapping; STORE-02 key schema |
| backend/storage/__init__.py | Created | get_storage_backend() factory |
| backend/services/storage.py | Rewritten | Async SQLAlchemy + MinIO orchestrator (473 lines) |
| backend/tests/test_storage.py | Modified | Removed xfail markers; fixed test_object_key_schema (removed unused db_session param) |
| backend/tests/conftest.py | Modified | Removed filelock patching; now only patches SETTINGS_FILE |
Deviations from Plan
Auto-fixed Issues
1. [Rule 1 - Bug] Fixed test_object_key_schema: removed unused db_session parameter
- Found during: Task 1 test run
- Issue: test_object_key_schema(db_session) had a db_session parameter it never used. The db_session fixture tries to create the full PostgreSQL schema (including audit_log.ip_address INET) in SQLite, which fails because SQLite has no INET type.
- Fix: Removed the db_session parameter from the test_object_key_schema signature; the test body only uses MinIOBackend.__new__ with mocks, not the DB at all
- Files modified: backend/tests/test_storage.py
- Commit: 3e4b1f1
2. [Rule 3 - Blocker] Updated conftest.py to remove filelock patching
- Found during: Task 2 test run
- Issue: The conftest.py isolated_data_dir fixture imported FileLock from filelock and patched st._topics_lock, st._settings_lock, st.UPLOADS_DIR, st.METADATA_DIR, st.TOPICS_FILE on the storage module. After the rewrite, none of these attributes exist on services.storage.
- Fix: Replaced the 6 filelock monkeypatch.setattr calls with a single monkeypatch.setattr(st, "SETTINGS_FILE", ...) (the only flat-file attribute still on the module)
- Files modified: backend/tests/conftest.py
- Commit: 3e4b1f1
Verification Results
tests/test_storage.py::test_object_key_schema PASSED
tests/test_storage.py::test_filename_not_in_object_key PASSED
tests/test_storage.py::test_storage_backend_abc_methods PASSED
tests/test_storage.py::test_get_storage_backend_returns_minio PASSED
tests/test_storage.py::test_put_object_uses_asyncio_to_thread PASSED
tests/test_storage.py::test_minio_backend_health_check_returns_bool PASSED
6 passed
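For readers unfamiliar with the testing approach, the sketch below shows one way checks like test_put_object_uses_asyncio_to_thread and test_minio_backend_health_check_returns_bool might be structured: build the backend with __new__ so no connection is opened, attach a mock client, and patch asyncio.to_thread. TinyBackend, make_backend, and both check functions are hypothetical; the actual test bodies are not shown in this summary.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock, patch


class TinyBackend:
    """Hypothetical backend whose constructor would open a real connection."""

    def __init__(self, endpoint):
        raise RuntimeError("would open a real connection")

    async def health_check(self):
        try:
            return bool(await asyncio.to_thread(self._client.bucket_exists, self._bucket))
        except Exception:
            return False


def make_backend():
    # __new__ allocates the object without running __init__.
    backend = TinyBackend.__new__(TinyBackend)
    backend._client = MagicMock()
    backend._bucket = "documents"
    return backend


async def check_uses_to_thread():
    backend = make_backend()
    with patch("asyncio.to_thread", new=AsyncMock(return_value=True)) as fake:
        result = await backend.health_check()
    # The blocking method must be handed to to_thread, not called directly.
    fake.assert_awaited_once_with(backend._client.bucket_exists, "documents")
    return result


async def check_failure_returns_false():
    backend = make_backend()
    backend._client.bucket_exists.side_effect = ConnectionError("unreachable")
    return await backend.health_check()
```

Both coroutines can be driven with asyncio.run, or by an async-aware test runner if the project uses one.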
Known Stubs
- save_upload uses user_id="null-user" as the D-03 sentinel; Phase 2 will replace it with str(current_user.id) after auth lands
- load_settings/save_settings remain flat-file backed; Phase 2 migrates to users.ai_provider/users.ai_model DB columns
- Document.user_id is None for all Phase 1 uploads; the Phase 2 migration adds a NOT NULL constraint
Threat Flags
None. All surfaces from the plan's threat model are correctly handled:
- T-01-04-01 (object key prediction): key constructed server-side as {user_id}/{document_id}/{uuid4()}{ext}; no caller can inject a key
- T-01-04-02 (filename leaking): MinIOBackend.put_object has no filename parameter; tests test_filename_not_in_object_key + test_object_key_schema enforce this on every CI run
- T-01-04-04 (SQL injection): all queries via select(), delete(), session.execute() with parameterized values; no f-strings in SQL
Next Phase Readiness
Plan 04 delivers the storage abstraction and async service layer. Plan 05 can now:
- Import save_upload, get_metadata, list_metadata, delete_document etc. from services.storage with async signatures
- Wire api/documents.py and api/topics.py callers from sync → async def + await + session parameter
- Wire the MinIO bucket initialization into the FastAPI lifespan
- Replace BackgroundTasks with Celery tasks using the async service layer
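As a rough picture of the caller migration listed above, the sketch below shows a handler after the sync → async change: it is declared with async def, receives a session, and awaits the service function. FakeSession, read_document, and the stand-in get_metadata body are hypothetical; only the async (session, doc_id) -> Optional[dict] signature is taken from this summary, and no web framework is involved.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Hypothetical stand-in for an async database session."""

    def __init__(self, rows: dict):
        self.rows = rows


async def get_metadata(session: FakeSession, doc_id: str) -> Optional[dict]:
    """Stand-in with the new async (session, doc_id) -> Optional[dict] shape."""
    await asyncio.sleep(0)  # placeholder for awaited database I/O
    return session.rows.get(doc_id)


async def read_document(doc_id: str, session: FakeSession) -> dict:
    """Caller after migration: async def, takes a session, awaits the service."""
    meta = await get_metadata(session, doc_id)
    if meta is None:
        raise LookupError(f"document {doc_id} not found")
    return meta
```

Forgetting the await in a migrated caller would hand back a coroutine object instead of a dict, so this is the main thing to check when wiring the API modules.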
ROADMAP Phase 1 success criterion #3 (MinIO object key schema enforced in model layer) is now met.
Phase: 01-infrastructure-foundation
Completed: 2026-05-22