280 Commits

Author SHA1 Message Date
curo1305 9e28de8c15 docs(02): UI design contract for Users & Authentication phase
Specifies form field states, password strength indicator, TOTP enrollment
and backup codes patterns, loading states, error placement, admin table
row states, copywriting (anti-enumeration copy), and full component
inventory for Phase 2 frontend work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 14:51:28 +02:00
curo1305 5c010587f6 docs(state): record phase 2 context session 2026-05-22 14:33:25 +02:00
curo1305 e0341348f0 docs(02): capture phase context 2026-05-22 14:33:20 +02:00
curo1305 16bb31eb6d docs(01-05): complete walking-skeleton plan — SUMMARY, STATE, ROADMAP
Phase 1 complete: all 5/5 plans executed, walking-skeleton e2e verified
live against Docker stack (postgres + minio + redis + backend + celery-worker).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 14:19:41 +02:00
curo1305 970c8e4e44 feat(01-05): final cutover — delete data/, prune config.py, async-only tests
- Delete backend/data/ tracked files (D-04): flat-file metadata, settings.json,
  topics.json, and uploaded files removed from git; backend/data/ added to
  .gitignore (empty dir remains on macOS due to ACL — no tracked files remain)
- Prune backend/config.py: remove DATA_DIR, UPLOADS_DIR, METADATA_DIR,
  TOPICS_FILE, ensure_data_dirs(); rebase SETTINGS_FILE as derived path from
  settings.data_dir (Phase 1 flat-file settings kept per plan decision)
- Prune backend/tests/conftest.py: remove isolated_data_dir autouse fixture
  and sync TestClient client fixture; add SQLite type compatibility shim
  (visit_INET/JSONB) so in-memory db_session can create tables with
  PostgreSQL-specific column types; add live_services_available fixture
- Rewrite backend/tests/test_documents.py: delete all legacy sync tests,
  remove all @pytest.mark.xfail markers; async-only document tests now
  use async_client + storage service directly for topic wiring
- Rewrite backend/tests/test_health.py: delete legacy sync test_health(client);
  remove @pytest.mark.xfail from test_health_checks_postgres_and_minio
- Port backend/tests/test_topics.py to async_client (sync client removed)
- Port backend/tests/test_settings.py to async_client with monkeypatch for
  SETTINGS_FILE isolation (settings remain flat-file in Phase 1)
2026-05-22 09:53:39 +02:00
curo1305 c1931fd566 feat(01-05): wire main.py lifespan+health and rewrite documents+topics to async session
- Rewrite main.py lifespan: MinIO client created at startup, docuvault bucket
  auto-created if missing, stored on app.state.minio; engine.dispose() on shutdown
- Extend /health endpoint: probes PostgreSQL (SELECT 1) and MinIO (bucket_exists)
  returning {"status": "ok"|"degraded", "checks": {"postgres": ..., "minio": ...}}
- Rewrite api/documents.py: all routes inject session: AsyncSession = Depends(get_db);
  save_upload/save_metadata/list_metadata/get_metadata/delete_document all async;
  upload handler queues extract_and_classify.delay() instead of inline classification;
  /classify endpoint retains synchronous await classifier.classify_document() for
  backward-compatible immediate response
- Rewrite api/topics.py: all routes inject session dependency; all storage calls
  are async with session parameter; Pydantic models TopicCreate/TopicUpdate/
  SuggestRequest preserved verbatim
2026-05-22 09:47:00 +02:00
curo1305 32d67de1ca feat(01-05): introduce celery_app + tasks/document_tasks + session-aware classifier
- Add backend/celery_app.py: Celery("docuvault") with Redis broker, JSON
  serialization, and tasks.document_tasks.* routed to documents queue;
  reads REDIS_URL directly from os.environ (no config import — Pitfall 7)
- Add backend/tasks/__init__.py: empty package marker
- Add backend/tasks/document_tasks.py: sync extract_and_classify Celery task
  that calls asyncio.run(_run()) to retrieve bytes from MinIO, extract text
  via extractor, and classify via classifier; classification failure is non-fatal
- Update backend/services/classifier.py: classify_document and
  suggest_topics_for_document now accept session: AsyncSession as first arg;
  all storage.* calls updated to async session-injection pattern
- Add extract_text_from_bytes helper to services/extractor.py for bytes-based
  extraction (used by Celery worker, which retrieves bytes from MinIO)
2026-05-22 09:45:33 +02:00
curo1305 5d21c6f588 docs(01-04): complete StorageBackend + MinIO + async storage plan — SUMMARY, STATE, ROADMAP 2026-05-22 09:41:43 +02:00
curo1305 3e4b1f1f91 feat(01-04): rewrite services/storage.py as async SQLAlchemy + MinIO orchestrator
- Replaced entire flat-file + filelock implementation with async ORM + MinIO
- All 14 DB-touching functions are async def accepting AsyncSession as first param
- load_settings/save_settings/mask_api_key/settings_masked remain sync (flat-file, Phase 2 will migrate)
- save_upload uses null-user D-03 sentinel; object_key via MinIO put_object
- update_document_topics auto-creates missing topics via create_topic deduplication
- No filelock, no METADATA_DIR/UPLOADS_DIR/TOPICS_FILE references remain
- Added __all__ listing all 18 public functions
- Updated conftest.py: removed filelock patching no longer needed
- Fixed test_object_key_schema: removed unused db_session param (SQLite INET type conflict)
2026-05-22 09:39:32 +02:00
curo1305 eaf86a832a feat(01-04): add StorageBackend ABC + MinIOBackend + factory
- backend/storage/base.py: StorageBackend ABC with 5 abstract methods mirroring ai/base.py
- backend/storage/minio_backend.py: MinIOBackend wrapping all sync Minio SDK calls in asyncio.to_thread(); STORE-02 key schema: {user_id}/{document_id}/{uuid4()}{ext}
- backend/storage/__init__.py: get_storage_backend() factory mirroring ai/__init__.py
- backend/tests/test_storage.py: remove xfail markers (plan 04 implements the module)
2026-05-22 09:36:24 +02:00
curo1305 e822a8f4b1 docs(01-03): complete SQLAlchemy ORM + Alembic plan — SUMMARY, STATE, ROADMAP
- SUMMARY.md: all 11 tables documented, privilege grants, verification results, deviations
- STATE.md: plan counter advanced to 3/5, decisions added, session continuity updated
- ROADMAP.md: 01-03-PLAN.md marked complete, progress table updated to 3/5
2026-05-22 09:33:24 +02:00
curo1305 75ea7ef106 feat(01-03): scaffold Alembic async config and author 0001_initial_schema migration
- backend/alembic.ini: script_location=migrations, sqlalchemy.url=%(DATABASE_MIGRATE_URL)s
- backend/migrations/env.py: async_engine_from_config + Base.metadata wiring;
  runtime os.environ.get("DATABASE_MIGRATE_URL") injection (alembic.ini interpolation
  does not read OS env directly)
- backend/migrations/versions/0001_initial_schema.py: creates all 11 tables in
  dependency order with correct FKs, indexes, and named constraints
- documents.user_id is nullable=True per D-03; Phase 2 adds NOT NULL
- Ends with GRANT + ALTER DEFAULT PRIVILEGES for docuvault_app (Pitfall 4)
- Also grants USAGE/SELECT on sequences (audit_log.id autoincrement)
- downgrade() drops all tables in reverse dependency order
2026-05-22 09:20:49 +02:00
curo1305 3e1fcd69b5 feat(01-03): add full v1 ORM schema, async session factory, and DB dependency
- backend/db/models.py: 11 SQLAlchemy 2.0 ORM models (User, Quota, RefreshToken,
  Folder, Document, Topic, DocumentTopic, Share, AuditLog, CloudConnection, Group)
- Document.user_id declared nullable=True per D-03 (Phase 2 adds NOT NULL)
- AuditLog.metadata_ uses mapped_column("metadata", JSONB) to avoid DeclarativeBase
  reserved-attribute conflict
- Group table stub for D-02 (v2 feature, seeded per PROJECT.md)
- Uses Optional[X] instead of X | None for Python < 3.10 compatibility
- backend/db/session.py: async engine (pool_pre_ping=True, expire_on_commit=False)
- backend/deps/db.py: async get_db() FastAPI dependency yielding AsyncSession
2026-05-22 09:16:21 +02:00
curo1305 213afec6b3 docs(01-02): complete Wave 0 test scaffolds plan — SUMMARY, STATE, ROADMAP
- Create 01-02-SUMMARY.md: 19 total xfail tests across 5 files, 3 task
  commits documented, no deviations
- STATE.md: advance to plan 3/5, update progress to 40%, record decisions
  for async_client naming and xfail(strict=False) pattern
- ROADMAP.md: mark 01-02-PLAN.md complete, update progress table to 2/5
2026-05-22 09:10:27 +02:00
curo1305 d856a2eaa9 test(01-02): extend test_health.py and port test_documents.py to async client
test_health.py:
  - Keep existing test_health(client) sync test unchanged (Plan 01 baseline)
  - Add test_health_checks_postgres_and_minio(async_client) xfail scaffold
    for extended /health response with postgres+minio checks (Plan 05, D-07)

test_documents.py:
  - Keep all 9 existing sync tests verbatim
  - Add async ports (_async suffix) for each: 9 xfail tests using async_client
  - Add test_upload_persists_to_postgres_and_minio_async (UUID id + GET
    round-trip assertion) — xfail until Plan 05 storage rewrite
  - Total: 10 new xfail async tests, 9 sync tests unchanged
2026-05-22 09:08:05 +02:00
curo1305 27fa0d4631 test(01-02): add Wave 0 scaffolds test_storage.py and test_alembic.py
test_storage.py (6 xfail tests, STORE-02):
  - test_object_key_schema: regex {user_id}/{doc_id}/{uuid4}{ext}
  - test_filename_not_in_object_key: human filename never in MinIO key
  - test_storage_backend_abc_methods: incomplete subclass raises TypeError
  - test_get_storage_backend_returns_minio: factory returns MinIOBackend
  - test_put_object_uses_asyncio_to_thread: SDK call wrapped in to_thread
  - test_minio_backend_health_check_returns_bool: True/False on ok/error

test_alembic.py (2 xfail tests, STORE-01 / D-02 / D-03):
  - test_migration_creates_all_tables: all 11 v1 tables after upgrade head
  - test_documents_user_id_nullable: user_id notnull=0 per D-03
2026-05-22 09:06:55 +02:00
curo1305 1f675fcf1a feat(01-02): add async db_session and async_client fixtures to conftest.py
- Add @pytest_asyncio.fixture db_session: in-memory SQLite via aiosqlite,
  expire_on_commit=False, skips gracefully (ImportError) before Plan 03
- Add @pytest_asyncio.fixture async_client: httpx.AsyncClient with
  ASGITransport, overrides deps.db.get_db, skips before Plan 03
- Retain all legacy sync fixtures (isolated_data_dir, client, sample_txt,
  sample_pdf) unchanged for backward compatibility through Plan 04
2026-05-22 09:05:36 +02:00
curo1305 f9b8a0d1ca docs(01-01): complete Compose + Config Foundation plan — SUMMARY, STATE, ROADMAP
- Create 01-01-SUMMARY.md with full execution record (3 tasks, 6 files)
- Update STATE.md: advance to plan 2 of 5, record key decisions, update session
- Update ROADMAP.md: mark 01-01 complete, update progress table (1/5 plans)
2026-05-22 09:01:16 +02:00
curo1305 6c507d5991 feat(01-01): add Pydantic Settings class to config.py and update requirements.txt
- Add Settings(BaseSettings) class reading DATABASE_URL, DATABASE_MIGRATE_URL,
  MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_BUCKET, REDIS_URL,
  SECRET_KEY from environment via SettingsConfigDict (pydantic-settings v2 API)
- Instantiate settings = Settings() at module level for all callers
- Preserve legacy DATA_DIR, UPLOADS_DIR, METADATA_DIR, TOPICS_FILE, SETTINGS_FILE,
  DEFAULT_SYSTEM_PROMPT, DEFAULT_SETTINGS, ensure_data_dirs() for Wave 4 compat
- Remove filelock>=3.14 (replaced by PostgreSQL transactions per STORE-07)
- Add sqlalchemy[asyncio]>=2.0.49, psycopg[binary]>=3.3.4, alembic>=1.18.4,
  minio>=7.2.20, celery[redis]>=5.6.3, redis>=7.4.0, aiosqlite>=0.20.0
- Bump pytest-asyncio to >=1.3.0 for asyncio_mode auto support
2026-05-22 08:59:12 +02:00
curo1305 beb55ca871 feat(01-01): extend .env.example with all Phase 1 service variables
- Add PostgreSQL section: DATABASE_URL, DATABASE_MIGRATE_URL, POSTGRES_PASSWORD
- Add MinIO section: MINIO_ROOT_USER, MINIO_ROOT_PASSWORD, MINIO_ENDPOINT, MINIO_ACCESS_KEY, MINIO_SECRET_KEY, MINIO_BUCKET
- Add Redis section: REDIS_PASSWORD, REDIS_URL
- Add Security section: SECRET_KEY (Phase 2 placeholder, documented now)
- All passwords use changeme_* style placeholders matching the init SQL script
- Grouped by service with comment headers per D-11
2026-05-22 08:57:52 +02:00
curo1305 983ecd89b3 feat(01-01): add five-service compose stack and postgres init script
- Rewrite docker-compose.yml with postgres, minio, redis, backend, celery-worker, frontend
- Use postgres:17-alpine, minio/minio:latest, redis:7-alpine with health checks
- backend and celery-worker depend on all three infra services (service_healthy)
- Add docker/postgres/initdb.d/01-init-users.sql to provision docuvault_app and docuvault_migrate
- Remove ./backend/data:/app/data volume mount per D-04
- Add top-level postgres_data and minio_data named volumes
- Add .gitignore to exclude .env from version control (D-11)
2026-05-22 08:57:14 +02:00
curo1305 7a34807fa0 chore: initial commit — existing single-user document scanner codebase
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:53:28 +02:00
curo1305 6fed5ba531 docs(01): create phase 1 plan — 5 plans in 4 waves
Research, pattern mapping, and verification complete.
Walking Skeleton mode active (MVP Phase 1).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-22 08:49:36 +02:00
curo1305 a7540d9588 docs(state): record phase 1 context session 2026-05-21 21:31:20 +02:00
curo1305 126ac7cf63 docs(01): capture phase context 2026-05-21 21:31:06 +02:00
curo1305 3353387312 docs: create roadmap (5 phases) 2026-05-21 20:53:28 +02:00
curo1305 7fac328a54 docs: define v1 requirements 2026-05-21 20:47:20 +02:00
curo1305 daa7e0f289 docs: add domain research (4 dimensions + synthesis) 2026-05-21 20:42:16 +02:00
curo1305 2a298a4276 chore: add project config 2026-05-21 19:00:18 +02:00
curo1305 7d96bcc805 docs: initialize project 2026-05-21 18:58:15 +02:00