Nayana OCR (Alpha)

Nayana OCR is a model fine-tuned for document-level Optical Character Recognition (OCR) across 10 Indian languages (Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu) while maintaining strong OCR capabilities in English and Chinese.

This model is built upon the robust GOT OCR base and offers features like advanced multilingual OCR, enhanced document rendering, and seamless GPU utilization.

We are training an improved model on much more data. Follow us to stay updated.

For more information: Cognitivelab


Key Features

  • Multilingual OCR: Supports OCR for 10 Indian languages alongside English and Chinese.
  • Document-Level OCR: Designed for extracting text from complex document layouts.
  • Streamlined Deployment: Optimized for GPU usage with support for safetensors.
  • Customizable OCR Type: Switch between OCR modes and enable rendering.

Installation

To use Nayana OCR, ensure you have the following prerequisites installed:

  1. Python 3.8+
  2. PyTorch (with GPU support)
  3. Transformers library
  4. PEFT library

Install the required libraries using:

pip install torch transformers peft
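
Before loading the model, it can help to confirm that the prerequisites listed above are actually present. The following is a minimal sketch using only the Python standard library; the helper names `missing_packages` and `python_is_supported` are hypothetical and not part of Nayana OCR. It checks only whether the packages can be found, not whether PyTorch was built with GPU support.

```python
import importlib.util
import sys

# Package names taken from the pip command above.
REQUIRED_PACKAGES = ["torch", "transformers", "peft"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter is at least the given version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    print("Missing packages:", missing_packages(REQUIRED_PACKAGES))
```

An empty "Missing packages" list means all three libraries are importable.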

Usage Example

Here's a quick example of how to use Nayana OCR for extracting text from an image:

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
import torch

# Load tokenizer and model
# The tokenizer holds no weights, so it does not need a torch_dtype
tokenizer = AutoTokenizer.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR', 
    trust_remote_code=True, 
    low_cpu_mem_usage=True, 
    device_map='cuda', 
    use_safetensors=True, 
    pad_token_id=tokenizer.eos_token_id, 
    torch_dtype=torch.float16
)

# Prepare the model for inference
model = model.eval().cuda()

# Perform OCR on an image
image_file = 'hindi.png'
result = model.chat(
    tokenizer, 
    image_file, 
    ocr_type='ocr', 
    render=True, 
    stream_flag=True
)

print(result)
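
To process several images at once, the single-image call above can be wrapped in a small loop. This is a sketch under stated assumptions: `list_images`, `ocr_folder`, and the set of file suffixes are hypothetical helpers written for illustration, and the OCR call is passed in as a function so the loop itself does not depend on the model.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to match your files.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return image files in `folder` (non-recursive), sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder, ocr_fn):
    """Apply `ocr_fn` to every image path in `folder`; return {file name: result}."""
    results = {}
    for path in list_images(folder):
        results[path.name] = ocr_fn(str(path))
    return results


# Usage, with `model` and `tokenizer` loaded as in the example above
# (assumes model.chat returns the recognized text):
# texts = ocr_folder("scans", lambda p: model.chat(tokenizer, p, ocr_type="ocr"))
```

Passing the OCR call as a function keeps the file handling separate from the model, so the loop can be tried with a stand-in function before a GPU is available.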

Parameters

Parameter     Description                                             Default
ocr_type      Specify the type of OCR to use ('ocr').                 'ocr'
render        Enable rendering of the extracted text on the image.    True
stream_flag   Stream results for larger or multi-page documents.      True
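
One way to keep these options together is a small settings object whose defaults mirror the table above. This is only a sketch: `ChatOptions` is a hypothetical helper, not part of the model's code, and it assumes `model.chat` accepts the three parameters as keyword arguments, as in the usage example.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChatOptions:
    """Hypothetical container for the options listed in the Parameters table."""
    ocr_type: str = "ocr"
    render: bool = True
    stream_flag: bool = True

    def as_kwargs(self):
        """Return the options as a dict suitable for ** unpacking."""
        return asdict(self)


# Plain-text extraction with rendering and streaming turned off.
plain = ChatOptions(render=False, stream_flag=False)

# Usage, with `model` and `tokenizer` loaded as in the example above:
# result = model.chat(tokenizer, "hindi.png", **plain.as_kwargs())
```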

Base Model

This model is fine-tuned from the GOT OCR base model and builds on its vision-language capabilities for OCR.


License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.


Model Details

Format: Safetensors
Model size: 561M params
Tensor type: FP16