title: img-read
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Byaldi + Qwen2VL
Overview
The Byaldi + Qwen2VL app extracts text from images using OCR (Optical Character Recognition). It uses the RAGMultiModalModel from Byaldi to index the image and the Qwen2VL model to generate responses based on the extracted text.
The application runs on ZeroGPU, specifically the NVIDIA A100 GPU, which keeps processing fast even for large and complex image inputs.
Features
- Image Upload: Users can upload images from which text will be extracted.
- Text Extraction: Utilizes state-of-the-art models to accurately extract text from the uploaded images.
- Keyword Search: Allows users to search for specific keywords within the extracted text and highlights them.
- High-Performance: Runs on ZeroGPU (NVIDIA A100) for accelerated computation and efficient model execution.
- User-Friendly Interface: Built using Gradio for an interactive user experience.
Technologies Used
- Gradio: For creating the web interface.
- Byaldi RAGMultiModalModel: For indexing and searching images.
- Qwen2VL: For generating responses based on visual and textual inputs.
- ZeroGPU: For efficient model inference using NVIDIA A100.
- PyTorch: For deep learning functionalities.
- Pillow: For image handling.
Getting Started
Prerequisites
- Python 3.8 or later
- Required libraries:
pip install gradio byaldi transformers torch pillow
Installation
Clone the repository:
git clone <repository-url>
cd <repository-directory>
Install the required dependencies using pip.
Run the application:
python app.py
Using the App
- Upload an Image: Click on the "Upload an Image" button to select and upload an image containing text.
- Extract Text: Press the "Extract Text" button to process the image and extract any text found.
- Search Keywords: Enter keywords in the search box and click "Search" to highlight matching keywords in the extracted text.
Code Overview
The core functionality of the application is encapsulated in the following sections:
OCR and Text Extraction:
- The ocr_and_extract function processes the uploaded image, extracts text, and cleans the output to remove unnecessary labels.
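The cleanup step mentioned for ocr_and_extract might look roughly like the sketch below. It is only an illustration: the function name clean_model_output and the list of labels are hypothetical, and it assumes the "unnecessary labels" are chat role markers such as "system", "user", and "assistant" that can appear when a chat model's output is decoded. The actual app may strip different text.

```python
# Hypothetical role labels; the real app may remove different ones.
ROLE_LABELS = ("system", "user", "assistant")


def clean_model_output(raw: str) -> str:
    """Drop lines that contain only a role label, plus blank lines."""
    kept = []
    for line in raw.splitlines():
        stripped = line.strip()
        # Treat "assistant" and "assistant:" the same way.
        if stripped.lower().rstrip(":") in ROLE_LABELS:
            continue
        if stripped:
            kept.append(stripped)
    return "\n".join(kept)
```

Lines that merely contain a label word inside a longer sentence are left untouched, so ordinary text such as "the user manual" is not removed.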
Keyword Highlighting:
- The search_keywords function takes the extracted text and user-defined keywords, highlighting matches within the text for better visibility.
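As an illustration of keyword highlighting, here is a minimal sketch. It is not the app's search_keywords implementation: the name highlight_keywords is hypothetical, and it assumes that keywords are separated by whitespace, that matching is case-insensitive, and that matches are wrapped in HTML mark tags for display.

```python
import html
import re


def highlight_keywords(text: str, keywords: str) -> str:
    """Return HTML-escaped text with each keyword match wrapped in <mark>."""
    # Longer terms first, so "total" wins over "tot" at the same position.
    terms = sorted(keywords.split(), key=len, reverse=True)
    if not terms:
        return html.escape(text)

    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches and the match itself separately,
        # so keywords are never matched inside entities such as "&amp;".
        parts.append(html.escape(text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

For example, highlight_keywords("Invoice total: 42 USD", "total usd") wraps both "total" and "USD" while preserving the original casing of the text.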
ZeroGPU Integration
The application is powered by ZeroGPU, leveraging the NVIDIA A100 GPU. This ensures:
- Faster image processing and text extraction.
- Seamless handling of large-scale models like Qwen2VL.
- Optimal performance during high computational loads.
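On Hugging Face Spaces, ZeroGPU is typically requested by decorating the GPU-bound function with spaces.GPU from the spaces package. The sketch below shows that pattern under a few assumptions: run_inference is a hypothetical placeholder rather than a function from this app, and the fallback decorator exists only so the snippet also runs locally where the spaces package is not installed.

```python
try:
    import spaces  # Available on Hugging Face Spaces.

    gpu = spaces.GPU
except ImportError:
    def gpu(func):
        """Fallback for local runs: return the function unchanged."""
        return func


@gpu
def run_inference(prompt: str) -> str:
    # Placeholder body; a real app would call the model here.
    return f"processed: {prompt}"
```

Only the decorated function is scheduled on the GPU, so it usually makes sense to keep it limited to the model call itself.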
Error Handling
The application includes basic error handling to capture and display any issues encountered during image processing. Errors will be printed to the console, and a user-friendly message will be displayed in the interface.
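The behaviour described above could be sketched as a small wrapper like the one below. The helper name safe_call and the fallback message are hypothetical; the app's own handling may be structured differently.

```python
import traceback


def safe_call(func, *args, fallback="Something went wrong while processing the image."):
    """Run func(*args); on failure, log the traceback and return a friendly message."""
    try:
        return func(*args)
    except Exception:
        # Full details go to the console for debugging.
        traceback.print_exc()
        # The interface only shows a short, readable message.
        return fallback
```

A call such as safe_call(ocr_and_extract, image) would then return either the extracted text or the fallback message, so the interface always has something to display.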
References
- Byaldi for providing the RAGMultiModalModel.
- Hugging Face Transformers for state-of-the-art models.
- ZeroGPU for enabling efficient GPU computation with NVIDIA A100.