
metadata
title: OCR TextVision
emoji: πŸ‘
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
short_description: A web-based Optical Character Recognition (OCR) application

TextVision OCR Project

Overview

TextVision is a web-based Optical Character Recognition (OCR) application that extracts text from images, including images that mix Hindi and English text. It also offers keyword search, highlighting occurrences of a chosen word in the extracted text. The interface is simple and intuitive, and the application supports five languages: Hindi, English, Spanish, French, and Punjabi.

This project was developed as part of an assignment for a job application.

Live Demo: TextVision on Hugging Face Spaces (https://huggingface.co/spaces/Prabhjotschugh/OCR-TextVision)

Features

  • Image Upload: Upload an image and extract text using OCR.
  • Multi-language Support: Supports Hindi, English, Spanish, French, and Punjabi.
  • Keyword Search: Search for specific keywords in the extracted text, with results highlighted if the keyword is found.
  • User-friendly Interface: Built using Gradio, offering an intuitive and simple user experience.
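Tesseract selects recognition languages through a `lang` string of traineddata codes joined with `+` (for example `hin+eng` for mixed Hindi and English text). The sketch below shows one way the five supported languages could map onto that string. `LANG_CODES`, `build_lang_string`, and `extract_text` are hypothetical names, not taken from this project's `app.py`; only `pytesseract.image_to_string` and Pillow's `Image.open` are real library calls.

```python
from typing import Iterable

# Hypothetical mapping from UI language names to Tesseract traineddata codes.
LANG_CODES = {
    "English": "eng",
    "Hindi": "hin",
    "Spanish": "spa",
    "French": "fra",
    "Punjabi": "pan",
}


def build_lang_string(languages: Iterable[str]) -> str:
    """Join the codes for the selected languages with '+', e.g. 'hin+eng'."""
    codes = []
    for name in languages:
        try:
            code = LANG_CODES[name]
        except KeyError:
            raise ValueError(f"Unsupported language: {name!r}") from None
        if code not in codes:  # skip duplicates, keep the user's order
            codes.append(code)
    if not codes:
        raise ValueError("Select at least one language")
    return "+".join(codes)


def extract_text(image_path: str, languages: Iterable[str]) -> str:
    """Run OCR on an image file (requires Pillow, pytesseract and Tesseract)."""
    # Imported here so build_lang_string can be used without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang_string(languages))
```

Listing Hindi before English, as in `build_lang_string(["Hindi", "English"])`, yields `hin+eng`; each code in the string must have a matching `.traineddata` file installed, or Tesseract reports an error.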

Technology Stack

  • Python 3.9+
  • Gradio 3.50.2 for the web interface
  • PyTesseract 0.3.10 for OCR functionality
  • Pillow 10.0.1 for image processing
  • Tesseract OCR 5.3.1 as the OCR engine
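To check whether a local environment matches the Python package versions listed above, the standard library's `importlib.metadata` can report what is installed. This is an illustrative sketch: `installed_versions` is a hypothetical helper, and it covers only the Python packages. The Tesseract engine version can be checked separately with `tesseract --version`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names for the Python packages in the technology stack.
PACKAGES = ("gradio", "pytesseract", "Pillow")


def installed_versions(packages: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Return the installed version of each package, or None if it is missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```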

Setup and Installation (Windows)

Installation Steps

  1. Clone or Download the Repository:

    • Using Git:
      git clone https://huggingface.co/spaces/Prabhjotschugh/OCR-TextVision
      cd OCR-TextVision
      
    • Alternatively, download the ZIP from Hugging Face and extract it.
  2. Set up a Virtual Environment (recommended):

    python -m venv venv
    venv\Scripts\activate
    
  3. Install Required Python Packages:

    pip install -r requirements.txt
    
  4. Install Tesseract OCR:

    • Download the Windows Tesseract installer from UB Mannheim (https://github.com/UB-Mannheim/tesseract/wiki).
    • Install it and note the installation path (default: C:\Program Files\Tesseract-OCR).
    • Add Tesseract to your system PATH:
      • Search for "Environment Variables" in the Start menu.
      • Under "System variables", find "Path", click "Edit", and add the Tesseract installation path.
  5. Install Language Data for Tesseract:

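    • Download the language data files for Hindi (hin.traineddata), Spanish (spa.traineddata), French (fra.traineddata), and Punjabi (pan.traineddata) from the tesseract-ocr/tessdata repository on GitHub. English (eng.traineddata) is included with the standard installer.
    • Place them in the tessdata folder of your Tesseract installation directory (by default, C:\Program Files\Tesseract-OCR\tessdata).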
  6. Configure the Application:

    • Open app.py in a text editor.
    • Find the line:
      pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'
      
    • Replace it with:
      pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
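Instead of hard-coding the path, `app.py` could locate Tesseract at startup and confirm that the language data from step 5 is installed. The sketch below assumes this approach. `find_tesseract`, `missing_languages`, and `configure_pytesseract` are hypothetical helpers, not part of the project. It relies on `shutil.which` and on `pytesseract.get_languages`, which is available in the PyTesseract 0.3.10 release listed above.

```python
import os
import shutil
from typing import Iterable, List, Optional

DEFAULT_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
REQUIRED_LANGS = ("eng", "hin", "spa", "fra", "pan")


def find_tesseract(default_path: str = DEFAULT_WINDOWS_PATH) -> Optional[str]:
    """Return the Tesseract executable found on PATH, else the default install path."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    if os.path.isfile(default_path):
        return default_path
    return None


def missing_languages(available: Iterable[str],
                      required: Iterable[str] = REQUIRED_LANGS) -> List[str]:
    """List required traineddata codes that Tesseract does not report as installed."""
    have = set(available)
    return [code for code in required if code not in have]


def configure_pytesseract() -> None:
    """Point pytesseract at Tesseract and fail early if language data is missing."""
    import pytesseract  # imported lazily; the helpers above need only the stdlib

    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("Tesseract not found on PATH or at " + DEFAULT_WINDOWS_PATH)
    pytesseract.pytesseract.tesseract_cmd = cmd
    missing = missing_languages(pytesseract.get_languages(config=""))
    if missing:
        raise RuntimeError("Missing Tesseract language data: " + ", ".join(missing))
```

Checking PATH first means the same code would also work on Linux, where Tesseract is typically installed at `/usr/bin/tesseract`.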
      

Running the Application

  1. Open a command prompt in the project directory.

  2. Activate the virtual environment (if using one):

    venv\Scripts\activate
    
  3. Start the application:

    python app.py
    
  4. Open a web browser and navigate to http://localhost:7860 to access the web interface.

Usage Instructions

  1. Upload an Image:

    • Click on the image upload area or drag and drop an image file.
    • Supported formats: JPEG, PNG, and other common image formats.
  2. Extract Text:

    • After uploading the image, click "Extract Text."
    • The extracted text will be displayed in the output area.
  3. Keyword Search:

    • Check the "Do you want to search for a keyword?" box.
    • Enter a keyword, then click "Search Keyword."
    • The keyword, if found, will be highlighted in the extracted text.
  4. Clear Results:

    • Click "Clear" to reset the interface and upload a new image.
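The keyword search can be sketched as splitting the extracted text into matched and unmatched segments. The hypothetical `highlight_keyword` below performs a case-insensitive literal match and returns `(text, label)` tuples, the shape accepted by Gradio's `HighlightedText` component. Whether this project's `app.py` works the same way is an assumption.

```python
import re
from typing import List, Optional, Tuple

# A segment is a piece of text plus a label (None means "not highlighted").
Segment = Tuple[str, Optional[str]]


def highlight_keyword(text: str, keyword: str, label: str = "match") -> List[Segment]:
    """Split text into segments, labelling each case-insensitive keyword match."""
    keyword = keyword.strip()
    if not keyword:
        return [(text, None)] if text else []
    # re.escape treats the keyword literally, so '.' or '+' are not regex syntax.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    segments: List[Segment] = []
    last = 0
    for m in pattern.finditer(text):
        if m.start() > last:
            segments.append((text[last:m.start()], None))
        segments.append((m.group(0), label))
        last = m.end()
    if last < len(text):
        segments.append((text[last:], None))
    return segments
```

For example, `highlight_keyword("Hello world, hello", "hello")` returns `[("Hello", "match"), (" world, ", None), ("hello", "match")]`. If no segment carries a label, the keyword was not found. Devanagari and Gurmukhi have no letter case, so matches in Hindi or Punjabi text are exact.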

Example Outputs

[Figures: six example output screenshots (PNG)]