---
title: LoomRAG
emoji: πŸ†
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: 🧠 Multimodal RAG that "weaves" together text and images πŸͺ‘
---

# 🌟 LoomRAG: Multimodal Retrieval-Augmented Generation for AI-Powered Search


LoomRAG is a Multimodal Retrieval-Augmented Generation (RAG) system that leverages OpenAI's CLIP model for neural cross-modal retrieval and semantic search. Users submit text queries and seamlessly retrieve both text and image results via vector embeddings. The system features an annotation interface for creating custom datasets, supports CLIP fine-tuning with configurable parameters for domain-specific applications, and accepts image and PDF uploads for intelligent retrieval, all through a Streamlit-based interface.

Experience the project in action:

**LoomRAG Streamlit App**


## πŸ“Έ Implementation Screenshots

*Screenshots: Data Upload Page, Data Search / Retrieval, Data Annotation Page, and CLIP Fine-Tuning.*

## ✨ Features

- πŸ”„ **Cross-Modal Retrieval**: Search text to retrieve both text and image results using deep learning
- 🌐 **Streamlit Interface**: Provides a user-friendly web interface for interacting with the system
- πŸ“€ **Upload Options**: Allows users to upload images and PDFs for AI-powered processing and retrieval
- 🧠 **Embedding-Based Search**: Uses OpenAI's CLIP model to align text and image embeddings in a shared latent space (see the sketch after this list)
- πŸ” **Augmented Text Generation**: Enhances text results using LLMs for contextually rich outputs
- 🏷️ **Image Annotation**: Enables users to annotate uploaded images through an intuitive interface
- 🎯 **CLIP Fine-Tuning**: Supports custom model training with configurable parameters, including test dataset split size, learning rate, optimizer, and weight decay
- πŸ”¨ **Fine-Tuned Model Integration**: Seamlessly load and use fine-tuned CLIP models for enhanced search and retrieval

πŸ—οΈ Architecture Overview

  1. Data Indexing:

    • Text, images, and PDFs are preprocessed and embedded using the CLIP model
    • Embeddings are stored in a vector database for fast and efficient retrieval
  2. Query Processing:

    • Text queries are converted into embeddings for semantic search
    • Uploaded images and PDFs are processed and embedded for comparison
    • The system performs a nearest neighbor search in the vector database to retrieve relevant text and images
  3. Response Generation:

    • For text results: Optionally refined or augmented using a language model
    • For image results: Directly returned or enhanced with image captions
    • For PDFs: Extracts text content and provides relevant sections
  4. Image Annotation:

    • Dedicated annotation page for managing uploaded images
    • Support for creating and managing multiple datasets simultaneously
    • Flexible annotation workflow for efficient data labeling
    • Dataset organization and management capabilities
  5. Model Fine-Tuning:

    • Custom CLIP model training on annotated images
    • Configurable training parameters for optimization
    • Integration of fine-tuned models into the search pipeline

## πŸš€ Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/NotShrirang/LoomRAG.git
   cd LoomRAG
   ```

2. Create a virtual environment and install dependencies:

   ```bash
   python -m venv venv && source venv/bin/activate
   pip install -r requirements.txt
   ```

## πŸ“– Usage

1. **Running the Streamlit Interface**:

   - Start the Streamlit app:

     ```bash
     streamlit run app.py
     ```

   - Access the interface in your browser to:
     - Submit natural language queries
     - Upload images or PDFs to retrieve contextually relevant results
     - Annotate uploaded images
     - Fine-tune CLIP models with custom parameters
     - Use fine-tuned models for improved search results

2. **Example Queries**:

   - **Text Query**: "sunset over mountains"
     **Output**: An image of a sunset over mountains along with descriptive text (see the augmentation sketch below)
   - **PDF Upload**: Upload a PDF of a scientific paper
     **Output**: Extracted key sections or contextually relevant images
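
For the descriptive text in the example above, retrieved passages can be handed to an LLM, as in step 3 of the architecture. A minimal sketch, assuming the `openai` Python client; the model choice and prompt are illustrative, and LoomRAG's actual augmentation code may differ:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def augment(query: str, retrieved_chunks: list[str]) -> str:
    # Combine the user's query with the retrieved context, RAG-style
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuery: {query}"},
        ],
    )
    return response.choices[0].message.content

print(augment("sunset over mountains", ["A golden sunset over the mountain range..."]))
```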

βš™οΈ Configuration

  • πŸ“Š Vector Database: It uses FAISS for efficient similarity search
  • πŸ€– Model: Uses OpenAI CLIP for neural embedding generation
  • ✍️ Augmentation: Optional LLM-based augmentation for text responses
  • πŸŽ›οΈ Fine-Tuning: Configurable parameters for model training and optimization

πŸ—ΊοΈ Roadmap

  • Fine-tuning CLIP for domain-specific datasets
  • Adding support for audio and video modalities
  • Improving the re-ranking system for better contextual relevance
  • Enhanced PDF parsing with semantic section segmentation

## 🀝 Contributing

Contributions are welcome! Please open an issue or submit a pull request for any feature requests or bug fixes.


## πŸ“„ License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.


πŸ™ Acknowledgments