title: Multimodal RAG Thesis Backend
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false

Multimodal RAG Thesis Backend

FastAPI backend for a multimodal anatomy RAG system.

Image URLs in production are served from MinIO so links work even when the app runs on ephemeral or multi-replica setups.

From the project root:

docker compose up -d minio

S3 API: http://localhost:9000
Web console: http://localhost:9001 (login: minioadmin / minioadmin unless you set MINIO_ROOT_USER / MINIO_ROOT_PASSWORD)

pip install minio

(Or use the project’s requirements.txt, which includes minio.)

Copy .env.example to .env and set at least:

GROQ_API_KEY for the LLM
MinIO defaults work for local Docker: MINIO_ENDPOINT=localhost:9000, MINIO_ACCESS_KEY=minioadmin, MINIO_SECRET_KEY=minioadmin

To disable MinIO and use local /outputs only, set MINIO_ENABLED=false.
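The settings above could be read at startup roughly as follows. This is a minimal sketch, not the backend's actual loader: the function name `load_minio_settings` is hypothetical, and treating an unset `MINIO_ENABLED` as enabled (with only false-like values turning it off) is an assumption inferred from the defaults listed above.

```python
import os
from typing import Mapping, Optional


def load_minio_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect MinIO settings, falling back to the local Docker defaults."""
    env = os.environ if env is None else env
    # Assumed parsing rule: only an explicit false-like value disables MinIO.
    raw_enabled = env.get("MINIO_ENABLED", "true").strip().lower()
    enabled = raw_enabled not in {"false", "0", "no"}
    return {
        "enabled": enabled,
        "endpoint": env.get("MINIO_ENDPOINT", "localhost:9000"),
        "access_key": env.get("MINIO_ACCESS_KEY", "minioadmin"),
        "secret_key": env.get("MINIO_SECRET_KEY", "minioadmin"),
    }


if __name__ == "__main__":
    settings = load_minio_settings()
    # Print only non-secret fields.
    print(settings["enabled"], settings["endpoint"])
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check without touching the real environment.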

If you have a MinIO license key file (e.g. from SUBNET):

Save the license content to a file in the project, e.g. minio.license (keep it out of version control).
In .env, set the path:
```
MINIO_LICENSE_FILE=./minio.license
```
In docker-compose.yml, uncomment the MinIO license volume so the container gets the file as /minio.license:
```
volumes:
  - minio_data:/data
  - ${MINIO_LICENSE_FILE:-./minio.license}:/minio.license:ro
```
Restart MinIO: docker compose up -d minio.

The standard open-source minio/minio image does not require a license; this is only for MinIO Enterprise/SUBNET deployments.

Upload PDF → images are extracted, uploaded to MinIO, and indexed.
RAG ask → returned image URLs are presigned MinIO links (or local /outputs if MinIO is off).
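The choice between a presigned MinIO link and a local /outputs path can be sketched as below. The helper name `resolve_image_url` and the injected `presign` callable are hypothetical and only illustrate the fallback; the backend's real code may differ. With the minio Python SDK, `presign` could wrap `Minio.presigned_get_object(bucket_name, object_name, expires=...)`.

```python
from typing import Callable, Optional


def resolve_image_url(
    object_key: str,
    minio_enabled: bool,
    presign: Optional[Callable[[str], str]] = None,
) -> str:
    """Return a presigned link when MinIO is on, else a local /outputs path."""
    if minio_enabled and presign is not None:
        # presign is expected to return a time-limited URL for the object key.
        return presign(object_key)
    # Fallback: serve from the app's local /outputs directory.
    return f"/outputs/{object_key.lstrip('/')}"


if __name__ == "__main__":
    # Stand-in presigner for illustration; not a real MinIO URL.
    fake_presign = lambda key: f"http://localhost:9000/images/{key}?X-Amz-Signature=..."
    print(resolve_image_url("doc1/page3.png", True, fake_presign))
    print(resolve_image_url("doc1/page3.png", False))
```

Injecting the presigner keeps the fallback logic testable without a running MinIO instance.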

Install the Supabase SDK in the same environment as uvicorn: pip install supabase (also in requirements.txt).
Preflight (no PDF): GET /debug/storage
- Check supabase_import_ok, supabase_client_ok, bucket_exists_in_project, and sample_object_keys.
Upload a PDF: POST /upload-pdf/
- Response includes pipeline: steps 1–6 with ok and detail for each stage.
- Step 4 includes extractor_stats (uploads_ok, uploads_failed, failed_object_keys).
Server logs: look for [IMAGE_PIPELINE] lines (steps 1–5).
If the bucket stays empty, fix whichever of steps 2 and 4 reports ok: false, using the errors in detail, before re-uploading.
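The checks above can be automated once the two JSON responses are in hand. The sketch below assumes a response shape that this README does not confirm: that GET /debug/storage returns the listed flags as top-level booleans, and that pipeline is a list of objects with step, ok, and detail keys. Both helper names are hypothetical.

```python
from typing import Any, Dict, List

# Flags named in the preflight step; assumed to be top-level booleans.
PREFLIGHT_FLAGS = (
    "supabase_import_ok",
    "supabase_client_ok",
    "bucket_exists_in_project",
)


def failed_preflight_flags(report: Dict[str, Any]) -> List[str]:
    """Return the preflight flags that are missing or falsy."""
    return [flag for flag in PREFLIGHT_FLAGS if not report.get(flag)]


def failed_pipeline_steps(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return pipeline entries whose ok field is missing or false."""
    return [step for step in response.get("pipeline", []) if not step.get("ok")]


if __name__ == "__main__":
    # Made-up sample payload, for illustration only.
    upload_response = {
        "pipeline": [
            {"step": 1, "ok": True, "detail": "pdf received"},
            {"step": 2, "ok": False, "detail": "storage client unavailable"},
        ]
    }
    for step in failed_pipeline_steps(upload_response):
        print(f"step {step['step']} failed: {step['detail']}")
```

An empty list from both helpers suggests storage is reachable and every stage reported ok.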