Files

11 KiB

phase, plan, subsystem, tags, requires, provides, affects, tech-stack, key-files, key-decisions, duration, completed
phase plan subsystem tags requires provides affects tech-stack key-files key-decisions duration completed
01-infrastructure-foundation 04 storage
storage
minio
sqlalchemy
service-layer
abc
phase provides
01-01 Pydantic Settings (config.py) with minio_endpoint / minio_access_key / minio_secret_key / minio_bucket
phase provides
01-02 Wave 0 test scaffolds — test_storage.py with 6 xfail tests this plan unblocks
phase provides
01-03 backend/db/models.py (Document, Topic, DocumentTopic ORM models), backend/db/session.py (AsyncSessionLocal)
backend/storage/base.py — StorageBackend ABC with 5 abstract methods
backend/storage/minio_backend.py — MinIOBackend wrapping sync Minio SDK in asyncio.to_thread()
backend/storage/__init__.py — get_storage_backend() factory
backend/services/storage.py — async SQLAlchemy + MinIO orchestrator (14 async + 4 sync functions)
01-05
All subsequent plans that call storage service functions
added patterns
minio SDK (Minio client, put_object, get_object, remove_object, presigned_get_object, bucket_exists)
asyncio.to_thread() pattern for sync-in-async SDK wrapping (RESEARCH.md Pattern 3)
StorageBackend ABC (abc.ABC, @abstractmethod)
ABC + factory pattern mirroring backend/ai/base.py + backend/ai/__init__.py
asyncio.to_thread(self._client.<method>, ...) wraps every blocking Minio SDK call
Lazy singleton _backend() function for single-process MinIOBackend instance
STORE-02 key schema: {user_id}/{document_id}/{uuid4()}{ext} — filename never in key
D-03 null-user sentinel: user_id='null-user' in Phase 1 (no auth)
created modified
backend/storage/__init__.py
backend/storage/base.py
backend/storage/minio_backend.py
backend/services/storage.py
backend/tests/test_storage.py
backend/tests/conftest.py
StorageBackend ABC has exactly 5 abstract methods: put_object, get_object, delete_object, presigned_get_url, health_check
MinIOBackend.put_object takes no filename/original_name parameter — STORE-02 compliance; only extension (from Path.suffix.lower()) passes through
get_object uses inner def _fetch() pattern to handle MinIO HTTPResponse.read() + close() + release_conn() inside asyncio.to_thread
services/storage.py uses _backend() lazy singleton to avoid creating multiple Minio clients per-request
load_settings/save_settings remain sync flat-file in Phase 1; comment documents Phase 2 migration to users.ai_provider/ai_model
test_object_key_schema: db_session parameter removed (test does not need DB; SQLite cannot render PostgreSQL INET type during schema creation)
conftest.py: filelock patching removed (isolated_data_dir fixture now only patches SETTINGS_FILE for sync settings functions)
~5 minutes 2026-05-22

Phase 1 Plan 04: StorageBackend ABC + MinIO Backend + Async Services Rewrite — Summary

MinIO storage abstraction layer built (StorageBackend ABC + MinIOBackend + get_storage_backend() factory mirroring ai/ provider pattern); services/storage.py fully rewritten as async SQLAlchemy + MinIO orchestrator with filelock removed; all 6 Plan 02 test_storage.py xfail scaffolds flipped to PASSED.

Performance

  • Duration: ~5 minutes
  • Started: 2026-05-22T07:34:40Z
  • Completed: 2026-05-22T07:39:42Z
  • Tasks: 2 of 2 complete
  • Files created: 3
  • Files modified: 3

Accomplishments

Task 1: StorageBackend ABC + MinIOBackend + factory

Created the backend/storage/ module mirroring the backend/ai/ provider pattern:

  • backend/storage/base.pyStorageBackend(ABC) with 5 @abstractmethod async def methods: put_object, get_object, delete_object, presigned_get_url, health_check
  • backend/storage/minio_backend.pyMinIOBackend(StorageBackend) wrapping every synchronous Minio SDK call in asyncio.to_thread(self._client.<method>, ...):
    • put_object: generates STORE-02 key {user_id}/{document_id}/{uuid4()}{ext}, does NOT accept filename
    • get_object: inner _fetch() helper handles MinIO's HTTPResponse.read() + close() + release_conn()
    • delete_object: wraps remove_object
    • presigned_get_url: wraps presigned_get_object with timedelta(minutes=...)
    • health_check: wraps bucket_exists in try/except, returns bool
  • backend/storage/__init__.pyget_storage_backend() factory returning MinIOBackend(endpoint=settings.minio_endpoint, ...) with secure=False for Docker internal HTTP

Task 2: Async services/storage.py rewrite

Rewrote backend/services/storage.py from 188 lines (flat-file + filelock) to 473 lines (async SQLAlchemy + MinIO):

Function signature changes (old → new)

Function Old signature New signature
save_upload (file_bytes, original_name, mime_type) -> dict async (session, file_bytes, original_name, mime_type) -> dict
save_metadata (meta: dict) -> None async (session, meta: dict) -> None
get_metadata (doc_id: str) -> dict|None async (session, doc_id: str) -> Optional[dict]
list_metadata (topic=None) -> list[dict] async (session, topic=None) -> list
delete_document (doc_id: str) -> bool async (session, doc_id: str) -> bool
update_document_topics (doc_id, topics) -> dict|None async (session, doc_id, topics) -> Optional[dict]
remove_topic_from_all_documents (topic_name) -> int async (session, topic_name) -> int
load_topics () -> list[dict] async (session) -> list
save_topics (topics) -> None async (session, topics) -> None
get_topic (topic_id) -> dict|None async (session, topic_id) -> Optional[dict]
create_topic (name, description, color) -> dict async (session, name, description, color) -> dict
update_topic (topic_id, **kwargs) -> dict|None async (session, topic_id, name=None, description=None, color=None) -> Optional[dict]
delete_topic (topic_id) -> str|None async (session, topic_id) -> Optional[str]
topic_doc_counts () -> dict[str, int] async (session) -> dict
load_settings () -> dict def () -> dict (unchanged — still sync flat-file)
save_settings (settings) -> None def (settings) -> None (unchanged — still sync flat-file)
mask_api_key (key) -> str def (key) -> str (unchanged)
settings_masked (settings) -> dict def (settings) -> dict (unchanged)

Key helpers introduced

  • _backend() — lazy singleton returning the StorageBackend instance (created once per process)
  • _doc_to_dict(doc, topic_names) — converts a Document ORM row + topic names list to the legacy response dict shape
  • _load_topic_names(session, doc_id) — async helper loading topic names for a given document UUID

Sample object key for an upload

null-user/550e8400-e29b-41d4-a716-446655440000/f47ac10b-58cc-4372-a567-0e02b2c3d479.pdf
^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
user_id    document_id (UUID)                     freshly-generated uuid4() + extension
(Phase 1 D-03 sentinel)

Task Commits

  1. Task 1: StorageBackend ABC + MinIOBackend + factoryeaf86a8 (feat)
  2. Task 2: Async services/storage.py rewrite3e4b1f1 (feat)

Files Created/Modified

File Type Description
backend/storage/base.py Created StorageBackend ABC — 5 abstract async methods
backend/storage/minio_backend.py Created MinIOBackend — asyncio.to_thread wrapping; STORE-02 key schema
backend/storage/__init__.py Created get_storage_backend() factory
backend/services/storage.py Rewritten Async SQLAlchemy + MinIO orchestrator (473 lines)
backend/tests/test_storage.py Modified Removed xfail markers; fixed test_object_key_schema (removed unused db_session param)
backend/tests/conftest.py Modified Removed filelock patching; now only patches SETTINGS_FILE

Deviations from Plan

Auto-fixed Issues

1. [Rule 1 - Bug] Fixed test_object_key_schema: removed unused db_session parameter

  • Found during: Task 1 test run
  • Issue: test_object_key_schema(db_session) had a db_session parameter it never used. The db_session fixture tries to create the full PostgreSQL schema (including audit_log.ip_address INET) in SQLite, which fails because SQLite has no INET type.
  • Fix: Removed the db_session parameter from test_object_key_schema signature — the test body only uses MinIOBackend.__new__ with mocks, not DB at all
  • Files modified: backend/tests/test_storage.py
  • Commit: 3e4b1f1

2. [Rule 3 - Blocker] Updated conftest.py to remove filelock patching

  • Found during: Task 2 test run
  • Issue: conftest.py isolated_data_dir fixture imported from filelock import FileLock and patched st._topics_lock, st._settings_lock, st.UPLOADS_DIR, st.METADATA_DIR, st.TOPICS_FILE on the storage module. After the rewrite, none of these attributes exist on services.storage.
  • Fix: Replaced the 6 filelock monkeypatch.setattr calls with a single monkeypatch.setattr(st, "SETTINGS_FILE", ...) (the only flat-file attribute still on the module)
  • Files modified: backend/tests/conftest.py
  • Commit: 3e4b1f1

Verification Results

tests/test_storage.py::test_object_key_schema PASSED
tests/test_storage.py::test_filename_not_in_object_key PASSED
tests/test_storage.py::test_storage_backend_abc_methods PASSED
tests/test_storage.py::test_get_storage_backend_returns_minio PASSED
tests/test_storage.py::test_put_object_uses_asyncio_to_thread PASSED
tests/test_storage.py::test_minio_backend_health_check_returns_bool PASSED
6 passed

Known Stubs

  • save_upload uses user_id="null-user" as D-03 sentinel — Phase 2 will replace with str(current_user.id) after auth lands
  • load_settings / save_settings remain flat-file backed — Phase 2 migrates to users.ai_provider / users.ai_model DB columns
  • Document.user_id is None for all Phase 1 uploads — Phase 2 migration adds NOT NULL constraint

Threat Flags

None. All surfaces from the plan's threat model are correctly handled:

  • T-01-04-01 (object key prediction): key constructed server-side as {user_id}/{document_id}/{uuid4()}{ext}; no caller can inject a key
  • T-01-04-02 (filename leaking): MinIOBackend.put_object has no filename parameter; tests test_filename_not_in_object_key + test_object_key_schema enforce this on every CI run
  • T-01-04-04 (SQL injection): all queries via select(), delete(), session.execute() with parameterized values; no f-strings in SQL

Next Phase Readiness

Plan 04 delivers the storage abstraction and async service layer. Plan 05 can now:

  • Import save_upload, get_metadata, list_metadata, delete_document etc. from services.storage with async signatures
  • Wire api/documents.py and api/topics.py callers from sync → async def + await + session parameter
  • Wire the MinIO bucket initialization into the FastAPI lifespan
  • Replace BackgroundTasks with Celery tasks using the async service layer

ROADMAP Phase 1 success criterion #3 (MinIO object key schema enforced in model layer) is now met.


Phase: 01-infrastructure-foundation Completed: 2026-05-22