Phase 1 complete: all 5/5 plans executed, walking-skeleton e2e verified live against Docker stack (postgres + minio + redis + backend + celery-worker).
| plan | phase | status | completed | tasks_total | tasks_complete | requirements_satisfied | self_check |
|---|---|---|---|---|---|---|---|
| 01-05 | 01-infrastructure-foundation | complete | 2026-05-22 | 4 | 4 | | PASSED |
Plan 01-05 Summary — Lifespan + /health + API Cutover + Celery + Walking Skeleton
What Was Built
Task 1 — Celery app + task + session-aware classifier (commit 32d67de)
backend/celery_app.py — Minimal Celery instance (celery_app = Celery("docuvault")). Reads REDIS_URL directly from os.environ (no config import — Pitfall 7). JSON serialization, documents queue route for tasks.document_tasks.*, autodiscover_tasks(["tasks"]).
backend/tasks/__init__.py — Empty package file.
backend/tasks/document_tasks.py — Sync def extract_and_classify(document_id: str) Celery task (NOT async). Uses asyncio.run(_run(document_id)) to drive the async body. Opens a fresh AsyncSessionLocal session, fetches the Document ORM row, pulls bytes from MinIO via MinIOBackend.get_object, calls extractor.extract_text_from_bytes, persists extracted text, then calls classifier.classify_document(session, doc_id). Non-fatal classification failures set status = "classification_failed".
backend/services/extractor.py — Added extract_text_from_bytes(file_bytes, content_type) helper that writes to a NamedTemporaryFile and delegates to the existing extract_text(path, mime) function.
backend/services/classifier.py — Added session: AsyncSession as first parameter to classify_document and suggest_topics_for_document. All internal storage.* calls updated to pass session.
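The sync-task-around-async-body pattern described for extract_and_classify can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the Celery decorator, the database session, and the MinIO fetch are left out, and `_classify`, `ClassificationError`, the extra `text` parameter, and the "extracted" and "classified" status values are hypothetical stand-ins. Only the "classification_failed" status comes from the summary above.

```python
import asyncio


class ClassificationError(Exception):
    """Hypothetical error type standing in for any classifier failure."""


async def _classify(text: str) -> list[str]:
    # Stand-in for the real classifier call; fails on empty text.
    await asyncio.sleep(0)
    if not text:
        raise ClassificationError("nothing to classify")
    return ["general"]


async def _run(document_id: str, text: str) -> dict:
    # Async body: the real task would open a session and read from MinIO here.
    record = {"id": document_id, "status": "extracted", "topics": []}
    try:
        record["topics"] = await _classify(text)
        record["status"] = "classified"
    except ClassificationError:
        # Non-fatal: keep the extracted text and only flag the record.
        record["status"] = "classification_failed"
    return record


def extract_and_classify(document_id: str, text: str) -> dict:
    # Sync entry point: asyncio.run starts a fresh event loop per call
    # and closes it on exit, so no loop state leaks between invocations.
    return asyncio.run(_run(document_id, text))


ok = extract_and_classify("doc-1", "hello world")
failed = extract_and_classify("doc-2", "")
```

Keeping the task function synchronous means the worker never needs an event loop of its own; each invocation gets a short-lived loop that is torn down when the task returns.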
Task 2 — Lifespan + /health + async API wiring (commit c1931fd)
backend/main.py — Lifespan creates Minio client, auto-creates docuvault bucket if absent (via asyncio.to_thread), attaches to app.state.minio, disposes engine on shutdown. No longer calls ensure_data_dirs(). /health endpoint probes PostgreSQL (SELECT 1 via AsyncSessionLocal) and MinIO (bucket_exists via asyncio.to_thread); returns {"status": "ok"|"degraded", "checks": {"postgres": ..., "minio": ...}}.
backend/api/documents.py — All 5 route handlers inject session: AsyncSession = Depends(get_db). Upload handler: calls await storage.save_upload(session, ...), uses in-memory content bytes for extraction (no filesystem path needed), enqueues extract_and_classify.delay(saved["id"]) for async classification, returns topics: [] immediately. /classify endpoint retains synchronous await classifier.classify_document(session, doc_id) for backward compatibility.
backend/api/topics.py — All 5 route handlers inject session dependency; all storage.* calls are async with session.
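The /health aggregation described above (one awaited probe, one blocking probe pushed through asyncio.to_thread, and an "ok" or "degraded" rollup) might look roughly like the sketch below. It assumes nothing about FastAPI or the MinIO client: `check_postgres`, `blocking_bucket_exists`, and the "error: ..." strings are hypothetical placeholders, and the bucket probe is hard-coded to fail so the degraded path is visible.

```python
import asyncio
import time


async def check_postgres() -> str:
    # Stand-in for running SELECT 1 through an async session.
    await asyncio.sleep(0)
    return "ok"


def blocking_bucket_exists() -> bool:
    # Stand-in for a blocking client call such as bucket_exists.
    time.sleep(0.01)
    return False  # simulate a missing bucket


async def check_minio() -> str:
    try:
        # Run the blocking call in a worker thread so the event loop stays free.
        exists = await asyncio.to_thread(blocking_bucket_exists)
        return "ok" if exists else "error: bucket missing"
    except Exception as exc:
        return f"error: {exc}"


async def health() -> dict:
    checks = {"postgres": await check_postgres(), "minio": await check_minio()}
    # Any single failing check downgrades the overall status.
    status = "ok" if all(v == "ok" for v in checks.values()) else "degraded"
    return {"status": status, "checks": checks}


report = asyncio.run(health())
```

Returning a per-dependency map alongside the rollup lets a caller see which service caused a "degraded" status without reading logs.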
Task 3 — Final cutover (commit 970c8e4)
backend/data/: All tracked files removed via git rm -rf; backend/data/ added to .gitignore.
backend/config.py: Removed DATA_DIR, UPLOADS_DIR, METADATA_DIR, TOPICS_FILE, ensure_data_dirs(), import os. Retained DEFAULT_SETTINGS, DEFAULT_SYSTEM_PROMPT, class Settings(BaseSettings), settings = Settings(). SETTINGS_FILE rebased as Path(settings.data_dir) / "settings.json" after settings = Settings().
backend/tests/conftest.py: Removed isolated_data_dir fixture and sync TestClient client fixture. Promoted db_session and async_client fixtures (removed try/except ImportError wrappers, since the deps now exist). Added live_services_available session fixture that probes localhost:5432/9000/6379 via socket.
backend/tests/test_documents.py: Deleted 9 legacy sync tests. Removed all @pytest.mark.xfail markers from async ports.
backend/tests/test_health.py: Removed @pytest.mark.xfail from test_health_checks_postgres_and_minio. Deleted legacy test_health(client) sync test.
backend/tests/test_settings.py, backend/tests/test_topics.py: Updated to remove any remaining sync client references.
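As a rough illustration of the socket probe behind live_services_available, the sketch below checks the three ports with plain TCP connections. The helper name `port_open`, the timeout, and the function signature are assumptions. The INTEGRATION=1 override is read here as "force the result to True when the probe fails", which is one plausible reading of the fallback noted under Deviations rather than confirmed behavior.

```python
import os
import socket


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def live_services_available(ports=(5432, 9000, 6379), host="localhost") -> bool:
    # Primary signal: every service port accepts a connection.
    if all(port_open(host, p) for p in ports):
        return True
    # Secondary signal (assumed semantics): explicit opt-in via env var.
    return os.environ.get("INTEGRATION") == "1"
```

In a conftest.py this function would typically be wrapped in a session-scoped pytest fixture so the probe runs once per test session, and integration tests can skip when it returns False.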
Task 4 — Walking-skeleton e2e verification (human-approved ✓)
All 12 verification steps passed:
1. .env created from .env.example
2. docker compose down -v: clean state
3. docker compose up --build -d: all 5 services booted
4. docker compose ps: postgres, minio, redis, backend, celery-worker all Up (healthy)
5. alembic upgrade head: exit 0, Running upgrade -> 0001
6. /health response: {"status": "ok", "checks": {"postgres": "ok", "minio": "ok"}}
7. Upload test.txt: returned {"id": "<uuid>", "original_name": "test.txt", "topics": [], ...}
8. PostgreSQL confirmed: one row, object_key starts with null-user/
9. MinIO confirmed: object present in docuvault bucket
10. Celery confirmed: Task tasks.document_tasks.extract_and_classify[...] succeeded
11. Delete confirmed: {"success": true}, MinIO object removed
12. Integration tests: zero FAILED, zero XFAIL
ROADMAP.md Phase 1 Success Criteria — All Met
| # | Criterion | Status |
|---|---|---|
| 1 | docker compose up starts all services healthy | ✓ Verified (Task 4, step 4) |
| 2 | alembic upgrade head applies cleanly | ✓ Verified (Plan 03 Task 3 + Task 4 step 5) |
| 3 | Full upload/extract/classify workflow works with no regression | ✓ Verified (Task 4, steps 7-10) |
| 4 | MinIO object key schema {user_id}/{document_id}/{uuid4()}{ext} enforced | ✓ Verified (Plan 04 + Task 4 step 8) |
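The object key schema in criterion 4 can be illustrated with a small builder. This is a sketch under assumptions: `build_object_key` is a hypothetical name, and mapping a missing user to the "null-user" prefix is inferred from the verification step that saw object_key start with null-user/, not taken from the storage code itself.

```python
import uuid
from pathlib import PurePosixPath
from typing import Optional


def build_object_key(user_id: Optional[str], document_id: str, filename: str) -> str:
    """Build a key shaped like {user_id}/{document_id}/{uuid4()}{ext}."""
    ext = PurePosixPath(filename).suffix  # e.g. ".txt"; empty if no extension
    # Assumed behavior: uploads without a user land under "null-user".
    owner = user_id if user_id is not None else "null-user"
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"


key = build_object_key(None, "doc-1", "test.txt")
```

Because the final path segment is a fresh UUID rather than the original filename, two uploads of the same file name under one document cannot collide.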
Deviations
| Rule | Deviation | Resolution |
|---|---|---|
| 1 (Bug) | extract_text_from_bytes helper did not exist in extractor.py | Added to services/extractor.py as specified in Plan 05 Task 1 action block |
| 2 (Enhancement) | live_services_available uses env var INTEGRATION=1 as fallback | Socket probe primary, env var secondary; matches plan intent |
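For deviation 1, a bytes-to-path adapter of the kind described (write to a NamedTemporaryFile, then delegate to the path-based extractor) could be sketched as follows. The `extract_text` body here is a hypothetical stand-in that only handles plain text; the real function's behavior, and whether the real helper cleans up the temp file this way, are not stated in this summary.

```python
import os
import tempfile


def extract_text(path: str, mime: str) -> str:
    # Stand-in for the existing path-based extractor; plain text only.
    if mime != "text/plain":
        raise ValueError(f"unsupported type: {mime}")
    with open(path, "r", encoding="utf-8") as fh:
        return fh.read()


def extract_text_from_bytes(file_bytes: bytes, content_type: str) -> str:
    # delete=False so the file can be reopened by path on every platform.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(file_bytes)
        tmp.close()  # flush and release the handle before delegating
        return extract_text(tmp.name, content_type)
    finally:
        tmp.close()  # safe to call twice; covers a failed write
        os.unlink(tmp.name)  # always remove the temp file


text = extract_text_from_bytes(b"hello", "text/plain")
```

An adapter like this lets both the upload handler and the Celery task work from in-memory bytes while the existing path-based extraction logic stays unchanged.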
Key Files Created/Modified
| File | Status | Notes |
|---|---|---|
| backend/celery_app.py | Created | Minimal Celery — no config import |
| backend/tasks/document_tasks.py | Created | Sync task wrapping asyncio.run |
| backend/tasks/__init__.py | Created | Package marker |
| backend/main.py | Rewritten | Lifespan + /health |
| backend/api/documents.py | Rewritten | Async session injection |
| backend/api/topics.py | Rewritten | Async session injection |
| backend/services/classifier.py | Updated | Session-aware |
| backend/services/extractor.py | Updated | Added bytes helper |
| backend/config.py | Pruned | Flat-file constants removed |
| backend/tests/conftest.py | Pruned | Async-only fixtures |
| backend/tests/test_documents.py | Pruned | Async-only tests |
| backend/data/ | Deleted | D-04 complete |