13 Commits

Author SHA1 Message Date
curo1305 54ef3357ba fix(05): cloud API path param, root sentinel, webdav creds in list, upload path
cloud.py: list_connections now decrypts and surfaces server_url +
connection_username for nextcloud/webdav providers; folder route uses
{folder_id:path} to handle slashes; translates "root" sentinel to "".
nextcloud_backend.py: skip parent directory entry in PROPFIND Depth:1 results.
webdav_backend.py: add cloud_folder + original_filename params to
upload_object so files land in the user's chosen folder with their real name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 11:58:01 +02:00
curo1305 a9ea33dd18 feat(05-04): fix storage factory to dispatch nextcloud to NextcloudBackend
- Previously both 'nextcloud' and 'webdav' providers were dispatched to WebDAVBackend
- Now 'nextcloud' uses NextcloudBackend (has list_folder); 'webdav' uses WebDAVBackend
- Both share identical constructor signature (server_url, username, password)
- Removes type: ignore[import] concern on nextcloud_backend — module now exists
2026-05-28 21:12:27 +02:00
curo1305 1b9573f398 feat(05-04): implement NextcloudBackend extending WebDAVBackend
- NextcloudBackend subclasses WebDAVBackend; inherits all 7 StorageBackend methods
- SSRF guard fully inherited: NextcloudBackend("http://10.0.0.1/dav", ...) raises ValueError
- Stores self._username for Nextcloud path convention context
- list_folder(folder_path: str = "") async method added — lists via client.list() +
  client.info() wrapped in asyncio.to_thread(), returns [{id, name, is_dir, size}, ...]
- validate_cloud_url called before every asyncio.to_thread() call in list_folder (D-17)
- health_check overrides parent to use client.check("") for Nextcloud root probe
2026-05-28 21:11:12 +02:00
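
The list_folder behaviour described in this commit (a sync listing wrapped in asyncio.to_thread(), results shaped as {id, name, is_dir, size}, and the folder's own entry dropped from a Depth:1 listing) might look roughly like the sketch below. StubDavClient is a hypothetical stand-in for the real sync client, and the trailing-slash convention for directories is an assumption; the SSRF re-validation step is left out for brevity.

```python
import asyncio


class StubDavClient:
    """Hypothetical stand-in for a synchronous WebDAV client (not a real library)."""

    def __init__(self, listing, sizes):
        self._listing = listing  # folder path -> names; first entry is the folder itself
        self._sizes = sizes      # file path -> size in bytes

    def list(self, path):
        return list(self._listing[path])

    def info(self, path):
        return {"size": self._sizes[path]}


async def list_folder(client, folder_path=""):
    """List one folder level without blocking the event loop."""
    names = await asyncio.to_thread(client.list, folder_path)
    parent_name = folder_path.rstrip("/").rsplit("/", 1)[-1]
    entries = []
    for index, name in enumerate(names):
        bare = name.rstrip("/")
        # A Depth:1 listing usually echoes the requested folder first; skip it.
        if not bare or (index == 0 and bare == parent_name):
            continue
        is_dir = name.endswith("/")  # assumption: directories carry a trailing slash
        child_path = f"{folder_path.rstrip('/')}/{bare}".lstrip("/")
        size = None
        if not is_dir:
            info = await asyncio.to_thread(client.info, child_path)
            size = info["size"]
        entries.append({"id": child_path, "name": bare, "is_dir": is_dir, "size": size})
    return entries
```

Only the first entry is compared against the parent name, so a child folder that happens to share its parent's name is still listed.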
curo1305 bcb887e61d feat(05-03): implement OneDriveBackend — Microsoft Graph StorageBackend
- CloudConnectionError imported from google_drive_backend (shared exception type)
- CHUNK_SIZE = 10 * 1024 * 1024 (10 MB — above Graph 4 MB limit, Pitfall 6)
- All 7 StorageBackend methods implemented as async coroutines
- Resumable upload sessions (createUploadSession) used for ALL uploads
- _ensure_valid_token() checks expiry with 60s buffer, calls _refresh_token() if expired
- _refresh_token() wraps msal.ConfidentialClientApplication in asyncio.to_thread()
- invalid_grant → CloudConnectionError(reason='invalid_grant') per D-06 / B2 design
- presigned_get_url and generate_presigned_put_url raise NotImplementedError (D-14)
- delete_object silently ignores 404 (no-op per StorageBackend contract)
- Backend is stateless — no DB writes (B2 design)
2026-05-28 21:10:56 +02:00
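
Two pieces of this commit are plain arithmetic and can be sketched in isolation: splitting an upload into CHUNK_SIZE byte ranges for an upload session, and the 60-second expiry buffer. The helper names content_ranges and token_needs_refresh are hypothetical, not the names used in the backend. Microsoft Graph asks for fragment sizes that are multiples of 320 KiB, which 10 MiB satisfies.

```python
CHUNK_SIZE = 10 * 1024 * 1024      # 10 MiB, as stated in the commit message
GRAPH_CHUNK_MULTIPLE = 320 * 1024  # Graph upload sessions expect multiples of 320 KiB


def content_ranges(total_size, chunk_size=CHUNK_SIZE):
    """Return (start, end, Content-Range header value) for each upload fragment."""
    if total_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # inclusive end offset
        ranges.append((start, end, f"bytes {start}-{end}/{total_size}"))
    return ranges


def token_needs_refresh(expires_at, now, buffer_seconds=60.0):
    """True once the token is within buffer_seconds of its expiry time."""
    return now >= expires_at - buffer_seconds
```

For a 25 MiB file this yields three fragments, the last one shorter than CHUNK_SIZE.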
curo1305 311dfa1513 feat(05-04): implement WebDAVBackend with SSRF guard and asyncio wrapping
- All 7 StorageBackend methods implemented as async coroutines
- validate_cloud_url() called in __init__ (SSRF at construct time) and before
  every asyncio.to_thread() call (D-17 defense-in-depth / T-05-04-01, T-05-04-02)
- _make_path() builds "docuvault/{user_id}/{document_id}{ext}" with urllib.parse.quote
  encoding on path segments (RESEARCH.md Pitfall 2)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError (D-14)
- All webdavclient3 sync calls (upload_to, download_from, clean, info, check, mkdir)
  wrapped in asyncio.to_thread() per MinIOBackend pattern
- delete_object silently ignores missing file exceptions (StorageBackend ABC contract)
2026-05-28 21:09:25 +02:00
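
The path layout named for _make_path() can be illustrated with a minimal sketch. The function below is a guess at the shape, not the committed code: it percent-encodes each segment separately with urllib.parse.quote so that a slash or space inside an identifier cannot change the directory structure.

```python
from urllib.parse import quote


def make_path(user_id, document_id, ext):
    """Build "docuvault/{user_id}/{document_id}{ext}" with each segment encoded."""
    segments = ["docuvault", str(user_id), f"{document_id}{ext}"]
    # safe="" also encodes "/", so it can only appear as the segment separator.
    return "/".join(quote(segment, safe="") for segment in segments)
```

Calling make_path("user 1", "doc/42", ".pdf") keeps three path segments and encodes the embedded space and slash.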
curo1305 337ee8ef11 feat(05-03): implement GoogleDriveBackend — Google Drive v3 StorageBackend
- CloudConnectionError(reason=) defined in this module — token_expired | invalid_grant
- All 7 StorageBackend methods implemented as async coroutines
- Every sync googleapiclient call wrapped in asyncio.to_thread() (Pitfall 7)
- cache_discovery=False on build() prevents /tmp directory traversal (T-05-03-05)
- presigned_get_url and generate_presigned_put_url raise NotImplementedError (D-14)
- HttpError 401 raises CloudConnectionError(reason='token_expired')
- HttpError 400 with 'invalid_grant' raises CloudConnectionError(reason='invalid_grant')
- HttpError 404 on delete_object is silently swallowed (no-op per contract)
- Backend is stateless — no DB writes (B2 design, D-05/D-06)
2026-05-28 21:07:26 +02:00
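
The error mapping listed in this commit can be sketched as a small pure function. Only CloudConnectionError and its two reason values come from the commit message; classify_http_error and its arguments are hypothetical, and the real backend inspects googleapiclient's HttpError rather than a bare status code.

```python
class CloudConnectionError(Exception):
    """Raised when a cloud connection can no longer be used as-is."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason  # "token_expired" or "invalid_grant"


def classify_http_error(status, body_text, is_delete=False):
    """Map an HTTP failure to CloudConnectionError, or None if it is not one.

    Returns "ignore" for a 404 during delete, which the contract treats as a no-op.
    """
    if status == 401:
        return CloudConnectionError("token_expired")
    if status == 400 and "invalid_grant" in body_text:
        return CloudConnectionError("invalid_grant")
    if status == 404 and is_delete:
        return "ignore"
    return None
```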
curo1305 fb803795fa feat(05-02): implement cloud_cache.py and extend storage factory
- cloud_cache.py: module-level TTLCache(maxsize=1000, ttl=60) singleton with
  threading.Lock for concurrent access safety (RESEARCH.md Pattern 8 / D-16)
- get_cloud_folders_cached(): async function; calls fetch_fn OUTSIDE the lock
  to avoid blocking the event loop during cloud API calls
- invalidate_provider_cache(): removes all cache entries for a user+provider prefix
- storage/__init__.py: adds get_storage_backend_for_document() async factory
  — returns MinIOBackend for minio docs; queries CloudConnection (scoped to user.id),
  decrypts credentials, and lazy-imports cloud backends to avoid circular imports
  — raises HTTPException(503) if connection missing or not ACTIVE (T-05-02-04)
2026-05-28 21:00:48 +02:00
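
The locking pattern described for cloud_cache.py (hold the lock only for dictionary access, await fetch_fn outside it) can be sketched with the standard library alone. This is a stand-in under stated assumptions: a plain dict with expiry timestamps replaces cachetools' TTLCache, there is no maxsize eviction, and the key format and function signatures are hypothetical.

```python
import asyncio
import threading
import time

_TTL_SECONDS = 60.0
_cache = {}               # key -> (expiry timestamp, cached value)
_lock = threading.Lock()  # guards _cache only; never held across an await


def _make_key(user_id, provider, folder_id):
    return f"{user_id}:{provider}:{folder_id}"


async def get_cloud_folders_cached(user_id, provider, folder_id, fetch_fn):
    """Return the cached listing, calling fetch_fn only on a miss or expiry."""
    key = _make_key(user_id, provider, folder_id)
    with _lock:
        hit = _cache.get(key)
        if hit is not None and hit[0] > time.monotonic():
            return hit[1]
    # The slow cloud call runs outside the lock so other tasks are not blocked.
    value = await fetch_fn()
    with _lock:
        _cache[key] = (time.monotonic() + _TTL_SECONDS, value)
    return value


def invalidate_provider_cache(user_id, provider):
    """Drop every cached entry for one user and provider."""
    prefix = f"{user_id}:{provider}:"
    with _lock:
        for key in [k for k in _cache if k.startswith(prefix)]:
            del _cache[key]
```

Two concurrent misses for the same key may both call fetch_fn; that is the usual trade-off for not holding the lock during the fetch.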
curo1305 976d2ca2de feat(05-02): implement cloud_utils.py — SSRF validation and HKDF credential encryption
- validate_cloud_url(): blocks RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), loopback (127.x),
  link-local (169.254.x), IPv6 loopback (::1), ULA (fc00::/7), and 'localhost' string;
  resolves DNS via socket.getaddrinfo BEFORE IP check (anti-DNS-rebinding per D-17)
- _derive_fernet_key(): creates fresh HKDF-SHA256 instance per call (AlreadyFinalized
  pitfall avoided per RESEARCH.md Pitfall 3); uses user_id as salt for per-user isolation
- encrypt_credentials(): Fernet-encrypts JSON-serialised credentials dict; returns str
- decrypt_credentials(): decrypts Fernet token back to original dict
- [Rule 1 - Bug] Fixed test_allows_public_https to use 8.8.8.8 IP (cloud.example.com
  does not resolve in offline CI environments)
2026-05-28 20:58:40 +02:00
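
A resolve-then-check guard of the kind described for validate_cloud_url() might look like the sketch below. It is an illustration built only on the standard library, not the committed implementation: the blocked-network list mirrors the commit message, while the scheme check, return value, and error messages are assumptions. IPv4-mapped IPv6 addresses are not handled here.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

_BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918 private ranges
        "127.0.0.0/8", "::1/128",                         # loopback
        "169.254.0.0/16",                                 # IPv4 link-local
        "fc00::/7",                                       # IPv6 unique local (ULA)
    )
]


def validate_cloud_url(url):
    """Raise ValueError if the URL's host resolves to a blocked address."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme not in ("http", "https") or not host:
        raise ValueError(f"unsupported URL: {url!r}")
    if host.lower() == "localhost":
        raise ValueError("localhost is not allowed")
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        # Resolve first, then check every returned address.
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        raise ValueError(f"cannot resolve host {host!r}") from exc
    for info in infos:
        address = info[4][0].split("%", 1)[0]  # strip any IPv6 scope id
        ip = ipaddress.ip_address(address)
        if any(ip in network for network in _BLOCKED_NETWORKS):
            raise ValueError(f"{host!r} resolves to blocked address {ip}")
    return url
```

IP literals such as 8.8.8.8 pass through getaddrinfo without a DNS query, which is why they work in offline CI.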
curo1305 b6bab5a230 feat(phase-4): Alembic migration 0004 (pdf_open_mode, GIN FTS index, audit-logs bucket) + MinIOBackend.put_object_raw()
- Add users.pdf_open_mode column via batch_alter_table (server_default='in_app')
- Create GIN expression index ix_documents_fts on documents.extracted_text via raw SQL (Alembic #1390)
- Create audit-logs MinIO bucket gated on MINIO_ENDPOINT env var
- Add MinIOBackend.put_object_raw() for caller-supplied bucket+key uploads (audit CSV export)
2026-05-25 18:30:28 +02:00
curo1305 a5f202b069 Fix Phase 3 UAT blockers: MinIO presigned URL hostname, CORS, admin flush→commit, auth refresh race
Bugs fixed:
- minio_backend.py: generate_presigned_put_url and presigned_get_url used internal
  _client (minio:9000) instead of _public_client (localhost:9000). Browser received
  ERR_NAME_NOT_RESOLVED. Fixed by using _public_client with region='us-east-1' to
  skip region-discovery HTTP request from inside the container.

- docker-compose.yml: MINIO_API_CORS_ALLOW_ORIGIN was set from CORS_ORIGINS which
  uses pydantic JSON list format '["http://localhost:5173"]'. MinIO expected a plain
  string and never matched the origin. Fixed to use FRONTEND_URL instead.

- admin.py: All write handlers (create_user, update_user_status, update_user_quota,
  update_ai_config) used session.flush() without session.commit(). Changes appeared
  to succeed (response reflected in-memory state) but rolled back on session close.
  Fixed by replacing flush() with commit() in all four write handlers.

- auth.js: Concurrent refresh() calls from QuotaBar and App.vue on page reload caused
  a token rotation race — first call rotated the cookie, second arrived with stale
  cookie and cleared accessToken. Fixed by deduplicating with a shared in-flight
  promise (_refreshInFlight).

Phase 3 UAT: 9/10 pass. UAT-3 (QuotaBar visual) pending browser confirmation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 11:30:41 +02:00
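
The auth.js fix deduplicates concurrent refresh() calls through a shared in-flight promise. The same idea is sketched here in Python with asyncio as an analog, not a translation of the actual JavaScript; TokenRefresher and do_refresh are hypothetical names.

```python
import asyncio


class TokenRefresher:
    """Share one in-flight refresh among all concurrent callers."""

    def __init__(self, do_refresh):
        self._do_refresh = do_refresh  # coroutine function that performs the refresh
        self._in_flight = None         # task for the refresh currently running, if any

    async def refresh(self):
        if self._in_flight is None:
            # First caller starts the refresh; later callers await the same task.
            self._in_flight = asyncio.ensure_future(self._do_refresh())
            self._in_flight.add_done_callback(self._clear)
        return await self._in_flight

    def _clear(self, _task):
        # Reset so the next call after completion triggers a fresh refresh.
        self._in_flight = None
```

With this in place, two components refreshing at the same moment trigger a single token rotation and both receive its result.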
curo1305 a5994d9ff4 chore: commit pending phase-3 work and add TEST_ACCOUNTS.md
Includes planning artifacts (03-CONTEXT, 03-DISCUSSION-LOG, 03-02-SUMMARY),
integration test script, MinIO/auth/docker fixes, and local dev account reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-24 11:30:56 +02:00
curo1305 3ed6dd494f feat(03-02): extend StorageBackend ABC and MinIOBackend with presigned PUT and stat_object
- Add generate_presigned_put_url and stat_object abstract methods to StorageBackend ABC
- Extend MinIOBackend with dual client (self._client internal + self._public_client public)
- MinIOBackend.__init__ accepts optional public_endpoint param (RESEARCH.md Finding 3)
- generate_presigned_put_url uses self._public_client for browser-resolvable URLs
- stat_object uses self._client.stat_object and returns .size (authoritative, T-03-05)
- get_storage_backend() passes public_endpoint=settings.minio_public_endpoint
- config.py adds minio_public_endpoint field (RESEARCH.md Finding 3)
- docker-compose.yml: MINIO_API_CORS_ALLOW_ORIGIN on minio service (T-03-09)
- docker-compose.yml: MINIO_PUBLIC_ENDPOINT on backend service
- docker-compose.yml: new celery-beat service (RESEARCH.md Finding 10)
2026-05-23 13:52:16 +02:00
curo1305 eaf86a832a feat(01-04): add StorageBackend ABC + MinIOBackend + factory
- backend/storage/base.py: StorageBackend ABC with 5 abstract methods mirroring ai/base.py
- backend/storage/minio_backend.py: MinIOBackend wrapping all sync Minio SDK calls in asyncio.to_thread(); STORE-02 key schema: {user_id}/{document_id}/{uuid4()}{ext}
- backend/storage/__init__.py: get_storage_backend() factory mirroring ai/__init__.py
- backend/tests/test_storage.py: remove xfail markers (plan 04 implements the module)
2026-05-22 09:36:24 +02:00
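
The STORE-02 key schema and the asyncio.to_thread() wrapping named in this commit can be sketched together. Everything except the key layout and the delete_object name is hypothetical: the two-method ABC, put_object, InMemoryBackend, and the fake sync client stand in for the real five-method interface and the Minio SDK.

```python
import asyncio
import uuid
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


def make_object_key(user_id, document_id, filename):
    """Build "{user_id}/{document_id}/{uuid4()}{ext}" from the upload's filename."""
    ext = PurePosixPath(filename).suffix
    return f"{user_id}/{document_id}/{uuid.uuid4()}{ext}"


class StorageBackend(ABC):
    """Reduced, illustrative version of the storage interface."""

    @abstractmethod
    async def put_object(self, key, data): ...

    @abstractmethod
    async def delete_object(self, key): ...


class _FakeSyncClient:
    """Blocking client stand-in; a real SDK would do network I/O here."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def remove(self, key):
        self.objects.pop(key, None)  # a missing key is a no-op


class InMemoryBackend(StorageBackend):
    def __init__(self):
        self._client = _FakeSyncClient()

    async def put_object(self, key, data):
        # Run the blocking call in a worker thread to keep the event loop free.
        await asyncio.to_thread(self._client.put, key, data)

    async def delete_object(self, key):
        await asyncio.to_thread(self._client.remove, key)
```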