---
plan: 01-05
phase: 01-infrastructure-foundation
status: complete
completed: "2026-05-22"
tasks_total: 4
tasks_complete: 4
requirements_satisfied:
  - STORE-01
  - STORE-07
self_check: PASSED
---

# Plan 01-05 Summary — Lifespan + /health + API Cutover + Celery + Walking Skeleton

## What Was Built

### Task 1 — Celery app + task + session-aware classifier (commit 32d67de)

**`backend/celery_app.py`** — Minimal Celery instance (`celery_app = Celery("docuvault")`). Reads `REDIS_URL` directly from `os.environ` (no config import — Pitfall 7). JSON serialization, `documents` queue route for `tasks.document_tasks.*`, `autodiscover_tasks(["tasks"])`.

**`backend/tasks/__init__.py`** — Empty package file.

**`backend/tasks/document_tasks.py`** — Sync `def extract_and_classify(document_id: str)` Celery task (NOT async). Uses `asyncio.run(_run(document_id))` to drive the async body. Opens a fresh `AsyncSessionLocal` session, fetches the Document ORM row, pulls bytes from MinIO via `MinIOBackend.get_object`, calls `extractor.extract_text_from_bytes`, persists extracted text, then calls `classifier.classify_document(session, doc_id)`. Non-fatal classification failures set `status = "classification_failed"`.

**`backend/services/extractor.py`** — Added `extract_text_from_bytes(file_bytes, content_type)` helper that writes to a `NamedTemporaryFile` and delegates to the existing `extract_text(path, mime)` function.

**`backend/services/classifier.py`** — Added `session: AsyncSession` as first parameter to `classify_document` and `suggest_topics_for_document`. All internal `storage.*` calls updated to pass session.

### Task 2 — Lifespan + /health + async API wiring (commit c1931fd)

**`backend/main.py`** — Lifespan creates `Minio` client, auto-creates `docuvault` bucket if absent (via `asyncio.to_thread`), attaches to `app.state.minio`, disposes `engine` on shutdown. No longer calls `ensure_data_dirs()`.
`/health` endpoint probes PostgreSQL (`SELECT 1` via `AsyncSessionLocal`) and MinIO (`bucket_exists` via `asyncio.to_thread`); returns `{"status": "ok"|"degraded", "checks": {"postgres": ..., "minio": ...}}`.

**`backend/api/documents.py`** — All 5 route handlers inject `session: AsyncSession = Depends(get_db)`. Upload handler: calls `await storage.save_upload(session, ...)`, uses in-memory `content` bytes for extraction (no filesystem path needed), enqueues `extract_and_classify.delay(saved["id"])` for async classification, returns `topics: []` immediately. `/classify` endpoint retains synchronous `await classifier.classify_document(session, doc_id)` for backward compatibility.

**`backend/api/topics.py`** — All 5 route handlers inject session dependency; all `storage.*` calls are async with session.

### Task 3 — Final cutover (commit 970c8e4)

- `backend/data/` — All tracked files removed via `git rm -rf`; `backend/data/` added to `.gitignore`
- `backend/config.py` — Removed `DATA_DIR`, `UPLOADS_DIR`, `METADATA_DIR`, `TOPICS_FILE`, `ensure_data_dirs()`, `import os`. Retained `DEFAULT_SETTINGS`, `DEFAULT_SYSTEM_PROMPT`, `class Settings(BaseSettings)`, `settings = Settings()`. `SETTINGS_FILE` rebased as `Path(settings.data_dir) / "settings.json"` after `settings = Settings()`.
- `backend/tests/conftest.py` — Removed `isolated_data_dir` fixture and sync `TestClient` `client` fixture. Promoted `db_session` and `async_client` fixtures (removed `try/except ImportError` wrappers — deps now exist). Added `live_services_available` session fixture that probes localhost:5432/9000/6379 via socket.
- `backend/tests/test_documents.py` — Deleted 9 legacy sync tests. Removed all `@pytest.mark.xfail` markers from async ports.
- `backend/tests/test_health.py` — Removed `@pytest.mark.xfail` from `test_health_checks_postgres_and_minio`. Deleted legacy `test_health(client)` sync test.
- `backend/tests/test_settings.py`, `backend/tests/test_topics.py` — Updated to remove any remaining sync client references.

### Task 4 — Walking-skeleton e2e verification (human-approved ✓)

All 12 verification steps passed:

1. `.env` created from `.env.example`
2. `docker compose down -v` — clean state
3. `docker compose up --build -d` — all 5 services booted
4. `docker compose ps` — `postgres`, `minio`, `redis`, `backend`, `celery-worker` all `Up (healthy)`
5. `alembic upgrade head` — exit 0, `Running upgrade -> 0001`
6. `/health` response:

   ```json
   {"status": "ok", "checks": {"postgres": "ok", "minio": "ok"}}
   ```

7. Upload `test.txt` — returned `{"id": "", "original_name": "test.txt", "topics": [], ...}`
8. PostgreSQL confirmed: one row, `object_key` starts with `null-user/`
9. MinIO confirmed: object present in `docuvault` bucket
10. Celery confirmed: `Task tasks.document_tasks.extract_and_classify[...] succeeded`
11. Delete confirmed: `{"success": true}`, MinIO object removed
12. Integration tests: zero FAILED, zero XFAIL

## ROADMAP.md Phase 1 Success Criteria — All Met

| # | Criterion | Status |
|---|-----------|--------|
| 1 | `docker compose up` starts all services healthy | ✓ Verified (Task 4, step 4) |
| 2 | `alembic upgrade head` applies cleanly | ✓ Verified (Plan 03 Task 3 + Task 4 step 5) |
| 3 | Full upload/extract/classify workflow works — no regression | ✓ Verified (Task 4, steps 7-10) |
| 4 | MinIO object key schema `{user_id}/{document_id}/{uuid4()}{ext}` enforced | ✓ Verified (Plan 04 + Task 4 step 8) |

## Deviations

| Rule | Deviation | Resolution |
|------|-----------|------------|
| 1 (Bug) | `extract_text_from_bytes` helper did not exist in extractor.py | Added to `services/extractor.py` as specified in Plan 05 Task 1 action block |
| 2 (Enhancement) | `live_services_available` uses env var `INTEGRATION=1` as fallback | Socket probe primary, env var secondary — matches plan intent |

## Key Files Created/Modified

| File | Status | Notes |
|------|--------|-------|
| backend/celery_app.py | Created | Minimal Celery — no config import |
| backend/tasks/document_tasks.py | Created | Sync task wrapping asyncio.run |
| backend/tasks/__init__.py | Created | Package marker |
| backend/main.py | Rewritten | Lifespan + /health |
| backend/api/documents.py | Rewritten | Async session injection |
| backend/api/topics.py | Rewritten | Async session injection |
| backend/services/classifier.py | Updated | Session-aware |
| backend/services/extractor.py | Updated | Added bytes helper |
| backend/config.py | Pruned | Flat-file constants removed |
| backend/tests/conftest.py | Pruned | Async-only fixtures |
| backend/tests/test_documents.py | Pruned | Async-only tests |
| backend/data/ | Deleted | D-04 complete |

## Self-Check: PASSED
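
## Illustrative Sketch: Task 1 Execution Pattern

The Task 1 notes describe a sync task that drives an async body with `asyncio.run`, a bytes helper that spills to a `NamedTemporaryFile` before delegating to the path-based extractor, and a non-fatal classification failure. The following is a minimal, stdlib-only sketch of that shape, not the code from commit 32d67de. The in-memory `DOCUMENTS` and `OBJECTS` dicts, the plain-text `extract_text` body, the `"uploaded"` and `"classified"` status values, and the `"general"` topic are all assumptions made for illustration; the real task uses `AsyncSessionLocal`, `MinIOBackend.get_object`, and the Celery task decorator.

```python
import asyncio
import os
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-ins for the PostgreSQL rows and MinIO objects.
DOCUMENTS = {
    "doc-1": {
        "object_key": "null-user/doc-1/example.txt",
        "content_type": "text/plain",
        "extracted_text": None,
        "status": "uploaded",  # assumed initial status
    }
}
OBJECTS = {"null-user/doc-1/example.txt": b"hello walking skeleton"}


def extract_text(path: str, mime: str) -> str:
    """Stand-in for the existing path-based extractor; handles plain text only."""
    return Path(path).read_text(encoding="utf-8")


def extract_text_from_bytes(file_bytes: bytes, content_type: str) -> str:
    """Spill the bytes to a temporary file and delegate to extract_text."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(file_bytes)
        tmp_path = tmp.name
    try:
        return extract_text(tmp_path, content_type)
    finally:
        os.unlink(tmp_path)  # always remove the temporary file


async def classify_document(session: dict, doc_id: str) -> list:
    """Stand-in classifier: fails when there is no text to classify."""
    if not session[doc_id]["extracted_text"]:
        raise ValueError("no text to classify")
    return ["general"]  # hypothetical topic


async def _run(document_id: str) -> str:
    doc = DOCUMENTS[document_id]
    file_bytes = OBJECTS[doc["object_key"]]  # stands in for MinIOBackend.get_object
    doc["extracted_text"] = extract_text_from_bytes(file_bytes, doc["content_type"])
    try:
        await classify_document(DOCUMENTS, document_id)
        doc["status"] = "classified"  # assumed success status
    except Exception:
        # Classification failure is non-fatal: the extracted text is kept.
        doc["status"] = "classification_failed"
    return doc["status"]


def extract_and_classify(document_id: str) -> str:
    """Sync entry point (the shape a Celery task would have) driving the async body."""
    return asyncio.run(_run(document_id))
```

Calling `extract_and_classify("doc-1")` runs the whole async body to completion inside one fresh event loop, which is why the task itself can stay a plain `def`. Keeping the extraction result before attempting classification means a classifier error only changes `status` and does not discard the extracted text.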