---
title: LoomRAG
emoji: 🌌
colorFrom: indigo
colorTo: pink
sdk: streamlit
sdk_version: 1.41.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: 🧠 Multimodal RAG that "weaves" together text and images 🪡
---
# 🌌 LoomRAG: Multimodal Retrieval-Augmented Generation for AI-Powered Search

<a href="https://loomrag.streamlit.app/"><img src="https://img.shields.io/badge/Streamlit%20App-red?style=flat-rounded-square&logo=streamlit&labelColor=white"/></a>

This project implements **LoomRAG**, a Multimodal Retrieval-Augmented Generation (RAG) system that leverages OpenAI's CLIP model for neural cross-modal retrieval and semantic search. Users can submit text queries and retrieve both text and image results through embeddings in a shared vector space, and can upload images and PDFs for intelligent retrieval through a Streamlit-based interface.
Experience the project in action: [loomrag.streamlit.app](https://loomrag.streamlit.app/)

---
## 📸 Implementation Screenshots

*Screenshot 1 and Screenshot 2 of the web interface (images not reproduced here).*

---
## ✨ Features

- 🔍 **Cross-Modal Retrieval**: Search text to retrieve both text and image results using deep learning
- 🌐 **Streamlit Interface**: Provides a user-friendly web interface for interacting with the system
- 📤 **Upload Options**: Allows users to upload images and PDFs for AI-powered processing and retrieval
- 🧠 **Embedding-Based Search**: Uses OpenAI's CLIP model to align text and image embeddings in a shared latent space
- 📝 **Augmented Text Generation**: Enhances text results using LLMs for contextually rich outputs
---
## 🏗️ Architecture Overview

1. **Data Indexing**:
   - Text, images, and PDFs are preprocessed and embedded using the CLIP model
   - Embeddings are stored in a vector database for fast and efficient retrieval

2. **Query Processing**:
   - Text queries are converted into embeddings for semantic search
   - Uploaded images and PDFs are processed and embedded for comparison
   - The system performs a nearest-neighbor search in the vector database to retrieve relevant text and images (see the sketch after this list)

3. **Response Generation**:
   - For text results: optionally refined or augmented using a language model
   - For image results: directly returned or enhanced with image captions
   - For PDFs: extracts text content and provides the relevant sections
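
The repository's exact indexing code isn't shown here, but conceptually the index-and-search core looks like the following minimal sketch, assuming the `transformers` CLIP implementation and `faiss-cpu`; the model name, helper functions, and file paths are illustrative assumptions, not LoomRAG's actual code:

```python
# Illustrative sketch of CLIP-based indexing + nearest-neighbor search.
# Requires: pip install torch transformers faiss-cpu pillow
# Model name, helper names, and file paths are assumptions for illustration.
import faiss
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_texts(texts):
    """Embed text into CLIP's shared latent space, L2-normalized."""
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

def embed_images(paths):
    """Embed images into the same latent space as the text."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).numpy()

# Inner product over unit vectors == cosine similarity.
index = faiss.IndexFlatIP(model.config.projection_dim)  # 512-d for ViT-B/32
index.add(embed_images(["sunset.jpg", "city.png"]))

# Cross-modal query: embed the text, retrieve the nearest image.
scores, ids = index.search(embed_texts(["sunset over mountains"]), 1)
```

FAISS stores only the vectors, so a real implementation also keeps a mapping from row ids back to the original text chunks, image files, or PDF pages.
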
---
## 🚀 Installation
1. Clone the repository:
```bash
git clone https://github.com/NotShrirang/LoomRAG.git
cd LoomRAG
```
2. Create a virtual environment and install dependencies:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
---
## 📖 Usage

1. **Running the Streamlit Interface**:
   - Start the Streamlit app:

     ```bash
     streamlit run app.py
     ```

   - Access the interface in your browser to:
     - Submit natural language queries
     - Upload images or PDFs to retrieve contextually relevant results

2. **Example Queries**:
   - **Text Query**: "sunset over mountains"
     Output: An image of a sunset over mountains along with descriptive text
   - **PDF Upload**: Upload a PDF of a scientific paper (see the sketch after this list)
     Output: Extracted key sections or contextually relevant images
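
The PDF path boils down to extract, chunk, embed, and index. The library and chunking parameters below are assumptions for illustration (the sketch reuses `embed_texts()` and `index` from the architecture example above):

```python
# Hypothetical PDF ingestion sketch: extract text, chunk it, and index the
# chunks with the embed_texts() helper and FAISS index defined earlier.
# Requires: pip install pypdf
from pypdf import PdfReader

def pdf_to_chunks(path, chunk_size=500):
    """Extract text from every page and split it into fixed-size chunks."""
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = pdf_to_chunks("paper.pdf")
index.add(embed_texts(chunks))  # the paper is now searchable alongside images
```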
---
## ⚙️ Configuration

- 📊 **Vector Database**: Uses FAISS for efficient similarity search
- 🤖 **Model**: Uses OpenAI CLIP for neural embedding generation
- ✍️ **Augmentation**: Optional LLM-based augmentation for text responses (sketched below)
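
The augmentation step is optional, and this README does not pin down which LLM is used; the following is a minimal sketch with the OpenAI Python client, where the model name and prompts are placeholders rather than the project's actual configuration:

```python
# Minimal sketch of LLM-based augmentation over retrieved text chunks.
# Requires: pip install openai, with OPENAI_API_KEY set in the environment.
# Model name and prompt wording are placeholders, not LoomRAG's actual config.
from openai import OpenAI

client = OpenAI()

def augment(query, retrieved_chunks):
    """Ask an LLM to answer the query using only the retrieved context."""
    context = "\n\n".join(retrieved_chunks)
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return response.choices[0].message.content
```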
---
## 🗺️ Roadmap
- [ ] Fine-tuning CLIP for domain-specific datasets
- [ ] Adding support for audio and video modalities
- [ ] Improving the re-ranking system for better contextual relevance
- [ ] Enhanced PDF parsing with semantic section segmentation
---
## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request for any feature requests or bug fixes.

---
## 📄 License

This project is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.

---
## 🙏 Acknowledgments
- [OpenAI CLIP](https://openai.com/research/clip)
- [FAISS](https://github.com/facebookresearch/faiss)
- [Hugging Face](https://huggingface.co/)