AdarshRajDS
metadata
title: Multimodal RAG Thesis Backend
emoji: 🧠
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false

Multimodal RAG Thesis Backend

FastAPI backend for a multimodal anatomy RAG (retrieval-augmented generation) system.

MinIO (object storage for image links)

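In production, image URLs are served from MinIO so that links keep working even when the app runs in ephemeral containers or across multiple replicas, where files written to local disk can disappear on restart or exist on only one replica.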

1. Install and run MinIO (it runs in Docker, so nothing is installed directly on your laptop)

From the project root:

docker compose up -d minio
  • S3 API: http://localhost:9000
  • Web console: http://localhost:9001 (default login: minioadmin / minioadmin, unless you set MINIO_ROOT_USER / MINIO_ROOT_PASSWORD)
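To confirm the container is actually serving requests, you can probe MinIO's liveness endpoint, /minio/health/live, on the S3 API port. The sketch below uses only the standard library; the helper names are hypothetical and the default endpoint simply mirrors the local Docker address above.

```python
import http.client
import urllib.request


def health_url(endpoint: str, secure: bool = False) -> str:
    """Build the liveness URL for a host:port endpoint such as 'localhost:9000'."""
    scheme = "https" if secure else "http"
    return f"{scheme}://{endpoint}/minio/health/live"


def is_minio_live(endpoint: str = "localhost:9000", secure: bool = False,
                  timeout: float = 2.0) -> bool:
    """Return True if MinIO answers the liveness probe with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(health_url(endpoint, secure), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses; this also covers
        # refused connections, timeouts and malformed responses.
        return False


if __name__ == "__main__":
    print("MinIO live:", is_minio_live())
```

If this prints False right after docker compose up, the container may still be starting; retry after a few seconds.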

2. Install the Python client

pip install minio

(Alternatively, run pip install -r requirements.txt; the project’s requirements.txt already includes minio.)
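With the client installed, a backend typically creates a Minio client, makes sure the target bucket exists, and uploads files with fput_object. This is a sketch, not the project's actual code: the helper names and the bucket name thesis-images are hypothetical. The minio import sits inside make_client so the other helpers can be exercised with a stand-in client.

```python
from typing import Any


def make_client(endpoint: str, access_key: str, secret_key: str, secure: bool = False) -> Any:
    """Create a MinIO client; secure=False matches the plain-HTTP local Docker setup."""
    from minio import Minio  # imported lazily so the helpers below work without it

    return Minio(endpoint=endpoint, access_key=access_key,
                 secret_key=secret_key, secure=secure)


def ensure_bucket(client: Any, bucket: str) -> bool:
    """Create the bucket if it is missing; return True if it had to be created."""
    if client.bucket_exists(bucket_name=bucket):
        return False
    client.make_bucket(bucket_name=bucket)
    return True


def upload_image(client: Any, bucket: str, object_key: str, file_path: str,
                 content_type: str = "image/png") -> str:
    """Upload one extracted image from disk and return its object key."""
    client.fput_object(bucket_name=bucket, object_name=object_key,
                       file_path=file_path, content_type=content_type)
    return object_key


if __name__ == "__main__":
    client = make_client("localhost:9000", "minioadmin", "minioadmin")
    ensure_bucket(client, "thesis-images")  # hypothetical bucket name
```

Passing the client in as a parameter keeps the bucket and upload logic independent of how the client is configured.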

3. Environment

Copy .env.example to .env and set at least:

  • GROQ_API_KEY (required for the LLM)
  • MinIO settings: the defaults already work with local Docker (MINIO_ENDPOINT=localhost:9000, MINIO_ACCESS_KEY=minioadmin, MINIO_SECRET_KEY=minioadmin)

To disable MinIO and serve images from local /outputs only, set MINIO_ENABLED=false.
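The variables above can be read into a small settings object at startup. This is a sketch under assumptions: the defaults mirror the local Docker values listed above, MinIO is treated as enabled unless MINIO_ENABLED is set to a false-like value, and MINIO_BUCKET, the accepted false spellings and the helper names are hypothetical. The real backend may load its .env differently (for example with python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class MinioSettings:
    enabled: bool
    endpoint: str
    access_key: str
    secret_key: str
    bucket: str  # hypothetical: the README does not name a bucket variable


def _as_bool(value: Optional[str], default: bool) -> bool:
    """Treat 0/false/no/off (any case) as False; unset or blank uses the default."""
    if value is None or value.strip() == "":
        return default
    return value.strip().lower() not in {"0", "false", "no", "off"}


def load_minio_settings(env: Optional[Mapping[str, str]] = None) -> MinioSettings:
    """Read MinIO settings, falling back to the local Docker defaults."""
    env = os.environ if env is None else env
    return MinioSettings(
        enabled=_as_bool(env.get("MINIO_ENABLED"), default=True),
        endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
        access_key=env.get("MINIO_ACCESS_KEY", "minioadmin"),
        secret_key=env.get("MINIO_SECRET_KEY", "minioadmin"),
        bucket=env.get("MINIO_BUCKET", "thesis-images"),
    )


def missing_required(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return required variables that are unset or empty (only GROQ_API_KEY here)."""
    env = os.environ if env is None else env
    return [name for name in ("GROQ_API_KEY",) if not env.get(name)]


if __name__ == "__main__":
    settings = load_minio_settings()
    print("MinIO enabled:", settings.enabled, "endpoint:", settings.endpoint)
    print("missing:", missing_required() or "none")
```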

4. Using a MinIO license file (Enterprise / SUBNET)

If you have a MinIO license key file (e.g. from SUBNET):

  1. Save the license content to a file in the project, e.g. minio.license (keep it out of version control).
  2. In .env, set the path:
    MINIO_LICENSE_FILE=./minio.license
    
  3. In docker-compose.yml, uncomment the license volume mount so the file is mounted read-only inside the container at /minio.license:
    volumes:
      - minio_data:/data
      - ${MINIO_LICENSE_FILE:-./minio.license}:/minio.license:ro
    
  4. Restart MinIO: docker compose up -d minio.

The standard open-source minio/minio image does not require a license; this is only for MinIO Enterprise/SUBNET deployments.
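The Compose expression ${MINIO_LICENSE_FILE:-./minio.license} falls back to ./minio.license when the variable is unset or empty. With the short volume syntax, a missing source path may be created as an empty directory rather than reported as an error, so a preflight check before restarting can save some confusion. The helpers below are a hypothetical sketch, not part of the project.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_LICENSE_PATH = "./minio.license"


def resolve_license_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror ${MINIO_LICENSE_FILE:-./minio.license}: unset or empty uses the default."""
    env = os.environ if env is None else env
    return Path(env.get("MINIO_LICENSE_FILE") or DEFAULT_LICENSE_PATH)


def check_license_file(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a problem description, or None if the license file looks usable."""
    path = resolve_license_path(env)
    if not path.exists():
        return f"license file not found: {path}"
    if not path.is_file():
        return f"license path is not a regular file: {path}"
    if path.stat().st_size == 0:
        return f"license file is empty: {path}"
    return None


if __name__ == "__main__":
    problem = check_license_file()
    print(problem or "license file OK")
```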

5. Flow

  • Upload PDF → images are extracted, uploaded to MinIO, and indexed.
  • RAG ask → image URLs in the answer are presigned MinIO links (or local /outputs paths if MinIO is disabled).
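The choice between a presigned MinIO link and a local /outputs path can be sketched as one function. Assumptions: the helper name, the one-hour expiry and the /outputs/&lt;object key&gt; layout are hypothetical. presigned_get_object is the minio-py call that produces a time-limited GET URL; the client is passed in so the function can be tried with a stand-in.

```python
from datetime import timedelta
from typing import Any, Optional
from urllib.parse import quote


def image_url(object_key: str, client: Optional[Any], bucket: str,
              minio_enabled: bool, expires: timedelta = timedelta(hours=1)) -> str:
    """Return a URL the frontend can load for one extracted image."""
    if minio_enabled and client is not None:
        # The signature is embedded in the query string, so the bucket can stay private.
        return client.presigned_get_object(bucket_name=bucket, object_name=object_key,
                                           expires=expires)
    # Fallback: assume the app serves extracted images under /outputs.
    return "/outputs/" + quote(object_key)
```

Presigned links expire, so the URL should be generated when the answer is returned rather than stored in the index.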

Supabase Storage: debugging the upload pipeline

  1. Install the SDK in the same environment as uvicorn: pip install supabase (also in requirements.txt).
  2. Preflight (no PDF needed): GET /debug/storage
    • Confirm that supabase_import_ok, supabase_client_ok, and bucket_exists_in_project are true, and inspect sample_object_keys.
  3. Upload a PDF: POST /upload-pdf/
    • The response includes a pipeline field covering steps 1–6, with ok and detail for each stage.
    • Step 4 includes extractor_stats (uploads_ok, uploads_failed, failed_object_keys).
  4. Server logs: look for lines tagged [IMAGE_PIPELINE] (steps 1–5).
  5. If the bucket stays empty, find the step 2 or step 4 entry with ok: false, fix the errors reported in its detail, and then re-upload.
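The checks in steps 2 to 5 can be scripted. This sketch assumes the API runs at http://localhost:8000 (uvicorn's default port; adjust for your deployment), that GET /debug/storage returns the listed fields as top-level JSON keys, and that pipeline in the upload response is a list of objects with step, ok and detail keys. Those shapes are guesses from the field names above, not a documented contract, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default port
PREFLIGHT_FLAGS = ("supabase_import_ok", "supabase_client_ok", "bucket_exists_in_project")


def fetch_storage_debug(base_url: str = BASE_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """Call GET /debug/storage and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/debug/storage", timeout=timeout) as resp:
        return json.load(resp)


def preflight_problems(report: Dict[str, Any]) -> List[str]:
    """List preflight flags that are missing or not exactly True."""
    return [flag for flag in PREFLIGHT_FLAGS if report.get(flag) is not True]


def failed_steps(upload_response: Dict[str, Any]) -> List[str]:
    """Summarize pipeline steps whose ok flag is false (assumed response shape)."""
    return [
        f"step {s.get('step', '?')}: {s.get('detail', '')}"
        for s in upload_response.get("pipeline", [])
        if not s.get("ok", False)
    ]


def image_pipeline_lines(log_lines: Iterable[str]) -> List[str]:
    """Keep only server log lines tagged [IMAGE_PIPELINE]."""
    return [line.rstrip("\n") for line in log_lines if "[IMAGE_PIPELINE]" in line]


if __name__ == "__main__":
    report = fetch_storage_debug()
    print("preflight problems:", preflight_problems(report) or "none")
    print("sample keys:", report.get("sample_object_keys"))
```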