kite/.planning/phases/01-infrastructure-foundation/01-05-SUMMARY.md
curo1305 16bb31eb6d docs(01-05): complete walking-skeleton plan — SUMMARY, STATE, ROADMAP
Phase 1 complete: all 5/5 plans executed, walking-skeleton e2e verified
live against Docker stack (postgres + minio + redis + backend + celery-worker).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 14:19:41 +02:00

| plan | phase | status | completed | tasks_total | tasks_complete | requirements_satisfied | self_check |
|---|---|---|---|---|---|---|---|
| 01-05 | 01-infrastructure-foundation | complete | 2026-05-22 | 4 | 4 | STORE-01, STORE-07 | PASSED |

Plan 01-05 Summary — Lifespan + /health + API Cutover + Celery + Walking Skeleton

What Was Built

Task 1 — Celery app + task + session-aware classifier (commit 32d67de)

backend/celery_app.py — Minimal Celery instance (celery_app = Celery("docuvault")). Reads REDIS_URL directly from os.environ (no config import — Pitfall 7). JSON serialization, documents queue route for tasks.document_tasks.*, autodiscover_tasks(["tasks"]).

backend/tasks/__init__.py — Empty package file.

backend/tasks/document_tasks.py — Sync def extract_and_classify(document_id: str) Celery task (NOT async). Uses asyncio.run(_run(document_id)) to drive the async body. Opens a fresh AsyncSessionLocal session, fetches the Document ORM row, pulls bytes from MinIO via MinIOBackend.get_object, calls extractor.extract_text_from_bytes, persists extracted text, then calls classifier.classify_document(session, doc_id). Non-fatal classification failures set status = "classification_failed".
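The sync-task-over-async-body pattern described above can be sketched without Celery or MinIO. This is a minimal illustration, not the project's code: `fetch_bytes` and `classify` are hypothetical stand-ins for the MinIO download and the classifier, the `"classified"` status value is an assumption, and the Celery task decorator is omitted so the block runs on the standard library alone.

```python
import asyncio


async def fetch_bytes(document_id: str) -> bytes:
    # Hypothetical stand-in for pulling the object from storage.
    await asyncio.sleep(0)
    return document_id.encode()


async def classify(text: str) -> list[str]:
    # Hypothetical stand-in for the classifier; raises on empty input.
    if not text:
        raise ValueError("nothing to classify")
    return ["general"]


async def _run(document_id: str) -> dict:
    data = await fetch_bytes(document_id)
    # "classified" is an assumed success status, not taken from the source.
    result = {"id": document_id, "text": data.decode(), "status": "classified"}
    try:
        result["topics"] = await classify(result["text"])
    except Exception:
        # Classification failure is non-fatal: the extracted text is kept.
        result["topics"] = []
        result["status"] = "classification_failed"
    return result


def extract_and_classify(document_id: str) -> dict:
    # Sync entry point: asyncio.run creates and closes a fresh event loop per call.
    return asyncio.run(_run(document_id))
```

Because `asyncio.run` owns the loop for the duration of one call, anything bound to an event loop (such as an async database session) has to be opened inside `_run` rather than shared across task invocations.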

backend/services/extractor.py — Added extract_text_from_bytes(file_bytes, content_type) helper that writes to a NamedTemporaryFile and delegates to the existing extract_text(path, mime) function.
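A bytes-to-path adapter of this kind might look like the sketch below. The `extract_text` function here is a plain-text-only stand-in for the existing path-based extractor, and the suffix guessing and cleanup strategy are assumptions rather than details confirmed by the summary.

```python
import mimetypes
import os
import tempfile


def extract_text(path: str, mime: str) -> str:
    # Stand-in for the existing path-based extractor (plain text only).
    if mime != "text/plain":
        return ""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read()


def extract_text_from_bytes(file_bytes: bytes, content_type: str) -> str:
    # Assumption: give the temp file a suffix matching the content type.
    suffix = mimetypes.guess_extension(content_type) or ""
    # delete=False so the file can be reopened by path on every platform.
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        tmp.write(file_bytes)
        tmp_path = tmp.name
    try:
        return extract_text(tmp_path, content_type)
    finally:
        # Always remove the temp file, even if extraction raises.
        os.unlink(tmp_path)
```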

backend/services/classifier.py — Added session: AsyncSession as first parameter to classify_document and suggest_topics_for_document. All internal storage.* calls updated to pass session.

Task 2 — Lifespan + /health + async API wiring (commit c1931fd)

backend/main.py — Lifespan creates Minio client, auto-creates docuvault bucket if absent (via asyncio.to_thread), attaches to app.state.minio, disposes engine on shutdown. No longer calls ensure_data_dirs(). /health endpoint probes PostgreSQL (SELECT 1 via AsyncSessionLocal) and MinIO (bucket_exists via asyncio.to_thread); returns {"status": "ok"|"degraded", "checks": {"postgres": ..., "minio": ...}}.
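The aggregation behind the /health response can be sketched as follows. The probes are stand-ins (no real PostgreSQL or MinIO client is used), and the `"error"` value reported for a failing check is an assumption, since the summary only shows the healthy case.

```python
import asyncio
from typing import Awaitable, Callable

Probe = Callable[[], Awaitable[None]]


async def health(probes: dict[str, Probe]) -> dict:
    checks = {}
    for name, probe in probes.items():
        try:
            await probe()
            checks[name] = "ok"
        except Exception:
            # Assumed failure marker; the real endpoint may report more detail.
            checks[name] = "error"
    status = "ok" if all(v == "ok" for v in checks.values()) else "degraded"
    return {"status": status, "checks": checks}


async def postgres_probe() -> None:
    # Stand-in for running SELECT 1 through an async session.
    await asyncio.sleep(0)


def blocking_bucket_exists() -> bool:
    # Stand-in for a blocking storage client call that fails.
    raise ConnectionError("storage unreachable")


async def minio_probe_down() -> None:
    # The blocking call runs in a worker thread so the event loop stays free.
    await asyncio.to_thread(blocking_bucket_exists)
```

Calling `asyncio.run(health({"postgres": postgres_probe, "minio": minio_probe_down}))` yields a `"degraded"` status with `postgres` marked `"ok"`, which shows one failing dependency downgrading the overall status without hiding the healthy one.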

backend/api/documents.py — All 5 route handlers inject session: AsyncSession = Depends(get_db). Upload handler: calls await storage.save_upload(session, ...), uses in-memory content bytes for extraction (no filesystem path needed), enqueues extract_and_classify.delay(saved["id"]) for async classification, returns topics: [] immediately. /classify endpoint retains synchronous await classifier.classify_document(session, doc_id) for backward compatibility.

backend/api/topics.py — All 5 route handlers inject session dependency; all storage.* calls are async with session.

Task 3 — Final cutover (commit 970c8e4)

  • backend/data/ — All tracked files removed via git rm -rf; backend/data/ added to .gitignore
  • backend/config.py — Removed DATA_DIR, UPLOADS_DIR, METADATA_DIR, TOPICS_FILE, ensure_data_dirs(), import os. Retained DEFAULT_SETTINGS, DEFAULT_SYSTEM_PROMPT, class Settings(BaseSettings), settings = Settings(). SETTINGS_FILE rebased as Path(settings.data_dir) / "settings.json" after settings = Settings().
  • backend/tests/conftest.py — Removed isolated_data_dir fixture and sync TestClient client fixture. Promoted db_session and async_client fixtures (removed try/except ImportError wrappers — deps now exist). Added live_services_available session fixture that probes localhost:5432/9000/6379 via socket.
  • backend/tests/test_documents.py — Deleted 9 legacy sync tests. Removed all @pytest.mark.xfail markers from async ports.
  • backend/tests/test_health.py — Removed @pytest.mark.xfail from test_health_checks_postgres_and_minio. Deleted legacy test_health(client) sync test.
  • backend/tests/test_settings.py, backend/tests/test_topics.py — Updated to remove any remaining sync client references.
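The socket probe behind live_services_available in conftest.py could be sketched roughly as below. This is an illustration under assumptions: the `port_open` helper, the timeout, and the exact fallback order are hypothetical, and the pytest session-fixture decorator is left out so the block needs only the standard library.

```python
import os
import socket


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    # True if a TCP connection to host:port succeeds within the timeout.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def live_services_available(ports=(5432, 9000, 6379), host="localhost") -> bool:
    # Primary signal: PostgreSQL, MinIO, and Redis ports all accept connections.
    if all(port_open(host, p) for p in ports):
        return True
    # Secondary signal (assumed semantics): INTEGRATION=1 forces the tests on.
    return os.environ.get("INTEGRATION") == "1"
```

In a test suite, the boolean would typically gate integration tests through a skip, so a machine without the Docker stack reports skips rather than failures.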

Task 4 — Walking-skeleton e2e verification (human-approved ✓)

All 12 verification steps passed:

  1. .env created from .env.example
  2. docker compose down -v — clean state
  3. docker compose up --build -d — all 5 services booted
  4. docker compose ps: postgres, minio, redis, backend, celery-worker all Up (healthy)
  5. alembic upgrade head — exit 0, Running upgrade -> 0001
  6. /health response:
    {"status": "ok", "checks": {"postgres": "ok", "minio": "ok"}}
    
  7. Upload test.txt — returned {"id": "<uuid>", "original_name": "test.txt", "topics": [], ...}
  8. PostgreSQL confirmed: one row, object_key starts with null-user/
  9. MinIO confirmed: object present in docuvault bucket
  10. Celery confirmed: Task tasks.document_tasks.extract_and_classify[...] succeeded
  11. Delete confirmed: {"success": true}, MinIO object removed
  12. Integration tests: zero FAILED, zero XFAIL

ROADMAP.md Phase 1 Success Criteria — All Met

| # | Criterion | Status |
|---|---|---|
| 1 | docker compose up starts all services healthy | ✓ Verified (Task 4, step 4) |
| 2 | alembic upgrade head applies cleanly | ✓ Verified (Plan 03 Task 3 + Task 4 step 5) |
| 3 | Full upload/extract/classify workflow works, with no regression | ✓ Verified (Task 4, steps 7-10) |
| 4 | MinIO object key schema {user_id}/{document_id}/{uuid4()}{ext} enforced | ✓ Verified (Plan 04 + Task 4 step 8) |
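The object key schema in criterion 4 can be illustrated with a small builder. The function name is hypothetical, and mapping a missing user to the `null-user` prefix is an assumption inferred from the verification step that observed keys starting with `null-user/`.

```python
import os
import uuid
from typing import Optional


def build_object_key(user_id: Optional[str], document_id: str, filename: str) -> str:
    # Assumption: documents without an owner are filed under "null-user".
    owner = user_id if user_id is not None else "null-user"
    # Keep the original extension (including the dot), or "" if there is none.
    ext = os.path.splitext(filename)[1]
    return f"{owner}/{document_id}/{uuid.uuid4()}{ext}"
```

The random `uuid4()` segment means two uploads of the same filename for the same document never collide on the same key.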

Deviations

| Rule | Deviation | Resolution |
|---|---|---|
| 1 (Bug) | extract_text_from_bytes helper did not exist in extractor.py | Added to services/extractor.py as specified in Plan 05 Task 1 action block |
| 2 (Enhancement) | live_services_available uses env var INTEGRATION=1 as fallback | Socket probe primary, env var secondary; matches plan intent |

Key Files Created/Modified

| File | Status | Notes |
|---|---|---|
| backend/celery_app.py | Created | Minimal Celery, no config import |
| backend/tasks/document_tasks.py | Created | Sync task wrapping asyncio.run |
| backend/tasks/__init__.py | Created | Package marker |
| backend/main.py | Rewritten | Lifespan + /health |
| backend/api/documents.py | Rewritten | Async session injection |
| backend/api/topics.py | Rewritten | Async session injection |
| backend/services/classifier.py | Updated | Session-aware |
| backend/services/extractor.py | Updated | Added bytes helper |
| backend/config.py | Pruned | Flat-file constants removed |
| backend/tests/conftest.py | Pruned | Async-only fixtures |
| backend/tests/test_documents.py | Pruned | Async-only tests |
| backend/data/ | Deleted | D-04 complete |

Self-Check: PASSED