test-two / README.md
gauri-sharan's picture
Update README.md
08f4db4 verified

A newer version of the Gradio SDK is available: 5.16.0

Upgrade
metadata
title: img-read
emoji: πŸ“š
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Byaldi + Qwen2VL

Alt text

Overview

The Byaldi + Qwen2VL app is an innovative tool designed for extracting text from images using advanced OCR (Optical Character Recognition) techniques and natural language processing. This application leverages the RAGMultiModalModel from Byaldi and the Qwen2VL model for generating meaningful responses based on the extracted text.

This application also takes advantage of ZeroGPU to run efficiently on powerful hardware, specifically the NVIDIA A100 GPU, ensuring high-speed processing and accurate results even for large and complex image inputs.

Features

  • Image Upload: Users can upload images from which text will be extracted.
  • Text Extraction: Utilizes state-of-the-art models to accurately extract text from the uploaded images.
  • Keyword Search: Allows users to search for specific keywords within the extracted text and highlights them.
  • High-Performance: Runs on ZeroGPU (NVIDIA A100) for accelerated computation and efficient model execution.
  • User-Friendly Interface: Built using Gradio for an interactive user experience.

Technologies Used

  • Gradio: For creating the web interface.
  • Byaldi RAGMultiModalModel: For indexing and searching images.
  • Qwen2VL: For generating responses based on visual and textual inputs.
  • ZeroGPU: For efficient model inference using NVIDIA A100.
  • PyTorch: For deep learning functionalities.
  • Pillow: For image handling.

Getting Started

Prerequisites

  • Python 3.8 or later
  • Required libraries:
    pip install gradio byaldi transformers torch pillow
    

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd <repository-directory>
    
  2. Install the required dependencies using pip.

  3. Run the application:

    python app.py
    

Using the App

  1. Upload an Image: Click on the "Upload an Image" button to select and upload an image containing text.
  2. Extract Text: Press the "Extract Text" button to process the image and extract any text found.
  3. Search Keywords: Enter keywords in the search box and click "Search" to highlight matching keywords in the extracted text.

Code Overview

The core functionality of the application is encapsulated in the following sections:

  • OCR and Text Extraction:

    • The ocr_and_extract function processes the uploaded image, extracts text, and cleans the output to remove unnecessary labels.
  • Keyword Highlighting:

    • The search_keywords function takes the extracted text and user-defined keywords, highlighting matches within the text for better visibility.

ZeroGPU Integration

The application is powered by ZeroGPU, leveraging the NVIDIA A100 GPU. This ensures:

  • Faster image processing and text extraction.
  • Seamless handling of large-scale models like Qwen2VL.
  • Optimal performance during high computational loads.

Error Handling

The application includes basic error handling to capture and display any issues encountered during image processing. Errors will be printed to the console, and a user-friendly message will be displayed in the interface.

References