metadata

title: img-read
emoji: 📚
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Byaldi + Qwen2VL

Overview

The Byaldi + Qwen2VL app extracts text from images. It combines the RAGMultiModalModel from Byaldi, which indexes and searches images, with the Qwen2VL vision-language model, which generates the text response.

The app runs on ZeroGPU, which provides an NVIDIA A100 GPU for inference, so that large and complex images can still be processed quickly.

Features

Image Upload: Users can upload images from which text will be extracted.
Text Extraction: Uses Byaldi and Qwen2VL to extract text from the uploaded images.
Keyword Search: Allows users to search for specific keywords within the extracted text and highlights them.
High-Performance: Runs on ZeroGPU (NVIDIA A100) for accelerated computation and efficient model execution.
User-Friendly Interface: Built using Gradio for an interactive user experience.

Technologies Used

Gradio: For creating the web interface.
Byaldi RAGMultiModalModel: For indexing and searching images.
Qwen2VL: For generating responses based on visual and textual inputs.
ZeroGPU: For efficient model inference using NVIDIA A100.
PyTorch: For deep learning functionalities.
Pillow: For image handling.

Getting Started

Prerequisites

Python 3.8 or later

Required libraries:

pip install gradio byaldi transformers torch pillow

Installation

Clone the repository:

git clone <repository-url>
cd <repository-directory>

Install the required dependencies with the pip command listed under Prerequisites.
Run the application:
```
python app.py
```

Using the App

Upload an Image: Click on the "Upload an Image" button to select and upload an image containing text.
Extract Text: Press the "Extract Text" button to process the image and extract any text found.
Search Keywords: Enter keywords in the search box and click "Search" to highlight matching keywords in the extracted text.

Code Overview

The core functionality of the application is encapsulated in the following sections:

OCR and Text Extraction:
- The ocr_and_extract function processes the uploaded image, extracts text, and cleans the output to remove unnecessary labels.
Keyword Highlighting:
- The search_keywords function takes the extracted text and user-defined keywords, highlighting matches within the text for better visibility.
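
The keyword highlighting step can be sketched as follows. This is a minimal illustration, not the code in app.py: the signature, the splitting of keywords on whitespace or commas, and the use of `<mark>` tags for highlighting are all assumptions made for the example.

```python
import html
import re


def search_keywords(extracted_text: str, keywords: str) -> str:
    """Return HTML in which every keyword match is wrapped in <mark> tags.

    Hypothetical sketch: the real function in app.py may differ.
    """
    # Split the user's input on whitespace or commas and drop empty pieces.
    terms = [t for t in re.split(r"[,\s]+", keywords.strip()) if t]
    if not terms:
        return html.escape(extracted_text)

    # Longer terms go first so that "cats" is preferred over "cat".
    terms.sort(key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)

    # Match against the raw text, then HTML-escape each piece separately,
    # so keywords never match inside entities such as "&amp;".
    parts = []
    last = 0
    for match in pattern.finditer(extracted_text):
        parts.append(html.escape(extracted_text[last:match.start()]))
        parts.append(f"<mark>{html.escape(match.group(0))}</mark>")
        last = match.end()
    parts.append(html.escape(extracted_text[last:]))
    return "".join(parts)
```

The returned string is HTML, so in a Gradio interface it would be displayed in an HTML component rather than a plain textbox.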

ZeroGPU Integration

The application runs on ZeroGPU, which uses an NVIDIA A100 GPU. This provides:

Faster image processing and text extraction.
Support for large models such as Qwen2VL.
Stable performance under heavy computational load.
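
On Hugging Face Spaces, a function is typically marked for ZeroGPU execution with the `spaces.GPU` decorator. The sketch below shows that pattern with a local fallback; the fallback decorator and the placeholder function body are assumptions for illustration and are not taken from app.py.

```python
try:
    import spaces  # provided on Hugging Face Spaces with ZeroGPU hardware

    gpu = spaces.GPU
except ImportError:
    # Assumed local fallback: a no-op decorator, so the same code
    # also runs on machines where the spaces package is not installed.
    def gpu(func):
        return func


@gpu
def ocr_and_extract(image):
    # Placeholder body: the real function would run Byaldi and Qwen2VL here.
    return f"processed image of size {image.size}"
```

With this arrangement a GPU is requested only while the decorated function runs, and the rest of the app does not need one.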

Error Handling

The application includes basic error handling for issues encountered during image processing. Errors are printed to the console, and a user-friendly message is shown in the interface.
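
One way to get this behavior is a small wrapper around the extraction call. The helper name `safe_extract` and the message wording below are hypothetical; the sketch only illustrates logging the full error to the console while returning a short message to the interface.

```python
import traceback


def safe_extract(extract_fn, image):
    """Run extract_fn(image) and turn any failure into a readable message."""
    if image is None:
        return "Please upload an image first."
    try:
        return extract_fn(image)
    except Exception as exc:
        # Full details go to the console for debugging.
        traceback.print_exc()
        # The interface only shows a short, non-technical message.
        return f"Sorry, the image could not be processed ({type(exc).__name__})."
```

For example, `safe_extract(ocr_and_extract, image)` could be used as the button callback in place of calling `ocr_and_extract` directly.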

References

Byaldi for providing the RAGMultiModalModel.
Hugging Face Transformers for state-of-the-art models.
ZeroGPU for enabling efficient GPU computation with NVIDIA A100.