---
title: img-read
emoji: 📚
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Byaldi + Qwen2VL
![Screenshot of the Byaldi + Qwen2VL app](Screenshot680.png)
## Overview
The **Byaldi + Qwen2VL** app extracts text from images using OCR (Optical Character Recognition). It combines the **RAGMultiModalModel** from Byaldi with the **Qwen2VL** model, which generates responses based on the extracted text.
The application runs on **ZeroGPU**, backed by an **NVIDIA A100** GPU, which keeps processing fast even for large and complex image inputs.
## Features
- **Image Upload**: Users can upload images from which text will be extracted.
- **Text Extraction**: Uses Byaldi and Qwen2VL to extract text from the uploaded images.
- **Keyword Search**: Lets users search for specific keywords within the extracted text and highlights the matches.
- **High Performance**: Runs on **ZeroGPU (NVIDIA A100)** for accelerated model execution.
- **User-Friendly Interface**: Built using Gradio for an interactive user experience.
## Technologies Used
- **Gradio**: For creating the web interface.
- **Byaldi RAGMultiModalModel**: For indexing and searching images.
- **Qwen2VL**: For generating responses based on visual and textual inputs.
- **ZeroGPU**: For efficient model inference using **NVIDIA A100**.
- **PyTorch**: For deep learning functionalities.
- **Pillow**: For image handling.
## Getting Started
### Prerequisites
- Python 3.8 or later
- Required libraries:
```bash
pip install gradio byaldi transformers torch pillow
```
### Installation
1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd <repository-directory>
   ```
2. Install the required dependencies using pip.
3. Run the application:
   ```bash
   python app.py
   ```
### Using the App
1. **Upload an Image**: Click on the "Upload an Image" button to select and upload an image containing text.
2. **Extract Text**: Press the "Extract Text" button to process the image and extract any text found.
3. **Search Keywords**: Enter keywords in the search box and click "Search" to highlight matching keywords in the extracted text.
## Code Overview
The core functionality lives in two functions:
- **OCR and Text Extraction**:
  - The `ocr_and_extract` function processes the uploaded image, extracts text, and cleans the output to remove unnecessary labels.
- **Keyword Highlighting**:
  - The `search_keywords` function takes the extracted text and user-defined keywords and highlights the matches within the text.
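The keyword highlighting step can be illustrated with a minimal sketch. This is not the code from `app.py`. It assumes that keywords arrive as a whitespace-separated string, that matching is case-insensitive, and that matches are wrapped in HTML `<mark>` tags. All of these are assumptions made for illustration.

```python
import html
import re


def search_keywords(extracted_text: str, keywords: str) -> str:
    """Return HTML with every case-insensitive keyword match wrapped in <mark>.

    Assumption: keywords are separated by whitespace; the real app may parse
    them differently.
    """
    # Longest terms first, so a longer keyword wins over a shorter prefix.
    terms = sorted(set(keywords.split()), key=len, reverse=True)
    if not terms:
        return html.escape(extracted_text)

    # One capture group: re.split then alternates non-match, match, non-match...
    pattern = re.compile(
        "(" + "|".join(re.escape(term) for term in terms) + ")",
        re.IGNORECASE,
    )
    parts = pattern.split(extracted_text)

    highlighted = []
    for index, part in enumerate(parts):
        escaped = html.escape(part)  # escape each segment before adding markup
        highlighted.append(f"<mark>{escaped}</mark>" if index % 2 else escaped)
    return "".join(highlighted)
```

Escaping each segment separately keeps characters such as `<` in the OCR output from being interpreted as markup when the result is rendered in an HTML component.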
## ZeroGPU Integration
The application runs on **ZeroGPU**, which provides an **NVIDIA A100** GPU. This enables:
- Faster image processing and text extraction.
- Running large models such as Qwen2VL.
- Stable performance under heavy computational load.
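On Hugging Face Spaces, a GPU is usually requested from ZeroGPU by decorating the GPU-bound function with `spaces.GPU`. The sketch below shows that pattern under stated assumptions: `run_ocr` and its body are hypothetical placeholders rather than the app's actual function, and the no-op fallback is only there so the snippet also runs where the `spaces` package is not installed.

```python
try:
    import spaces  # provided on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:

    def gpu(func):
        """No-op stand-in used when the spaces package is unavailable."""
        return func


@gpu
def run_ocr(image_name: str) -> str:
    # Hypothetical placeholder: the real function would run model inference
    # here, while the GPU is attached for the duration of the call.
    return f"extracted text for {image_name}"
```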
## Error Handling
The application includes basic error handling for issues encountered during image processing. Errors are printed to the console, and a user-friendly message is shown in the interface.
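One way to get this behavior is a small wrapper around the processing function. The sketch below is an illustration, not the app's code: the name `safe_call` and the message text are assumptions.

```python
import traceback
from typing import Any, Callable

# Hypothetical message; the wording shown in the real interface may differ.
FRIENDLY_MESSAGE = "Something went wrong while processing the image. Please try again."


def safe_call(func: Callable[..., Any], *args: Any) -> Any:
    """Run func(*args); on failure, log the traceback and return a friendly message."""
    try:
        return func(*args)
    except Exception:
        traceback.print_exc()  # full details go to the console (stderr)
        return FRIENDLY_MESSAGE
```

A Gradio event handler could then call `safe_call(ocr_and_extract, image)` so that the interface always receives a string to display.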
## References
- [Byaldi](https://huggingface.co/vidore/colpali) for providing the RAGMultiModalModel.
- [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) for state-of-the-art models.
- [ZeroGPU](https://www.zerogpu.com) for enabling efficient GPU computation with NVIDIA A100.