Nayana OCR (Alpha)

Nayana OCR is a model finetuned for document-level Optical Character Recognition (OCR) across 10 Indian languages:
Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, and Telugu.
It also retains the base model's OCR capabilities in English and Chinese.

This model is built on the GOT OCR base and offers multilingual OCR, document rendering, and GPU-accelerated inference.

We are training a better model with a lot more data. Follow us to stay updated.

For more information: Cognitivelab

Key Features

Multilingual OCR: Supports OCR for 10 Indian languages alongside English and Chinese.
Document-Level OCR: Designed for extracting text from complex document layouts.
Streamlined Deployment: Optimized for GPU usage with support for safetensors.
Customizable OCR Type: Switch between OCR modes and enable rendering.

Installation

To use Nayana OCR, ensure you have the following prerequisites installed:

Python 3.8+
PyTorch (with GPU support)
Transformers library
PEFT library

Install the required libraries using:

pip install torch transformers peft
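
Before loading the model, it can help to confirm that the prerequisites listed above are actually importable. The following is a minimal sketch using only the Python standard library. The helper names `missing_packages` and `python_ok` are hypothetical and not part of Nayana OCR or any of its dependencies.

```python
import importlib.util
import sys

# Top-level import names of the libraries listed in the prerequisites.
REQUIRED = ("torch", "transformers", "peft")


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.8+ is required, found", sys.version.split()[0])
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

This only checks that the packages can be found. It does not verify that PyTorch was installed with GPU support, which you can check separately with `torch.cuda.is_available()`.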

Usage Example

Here's a quick example of how to use Nayana OCR for extracting text from an image:

from transformers import AutoModel, AutoTokenizer
from peft import PeftModel
import torch

# Load the tokenizer, then the model (the tokenizer holds no weights, so it needs no torch_dtype)
tokenizer = AutoTokenizer.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR', 
    trust_remote_code=True, 
    low_cpu_mem_usage=True, 
    device_map='cuda', 
    use_safetensors=True, 
    pad_token_id=tokenizer.eos_token_id, 
    torch_dtype=torch.float16
)

# Switch to inference mode (device_map='cuda' above already placed the model on the GPU)
model = model.eval()

# Perform OCR on an image
image_file = 'hindi.png'
result = model.chat(
    tokenizer, 
    image_file, 
    ocr_type='ocr', 
    render=True, 
    stream_flag=True
)

print(result)
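
To process several images in one go, the single-image call above can be wrapped in a loop. The sketch below is one possible way to do it and is not part of the Nayana OCR API: `ocr_folder` is a hypothetical helper. It assumes that `model.chat` returns the recognized text as a string, and that passing `render=False` and `stream_flag=False` is accepted so the text is collected rather than streamed.

```python
from pathlib import Path


def ocr_folder(model, tokenizer, folder, pattern="*.png", **chat_kwargs):
    """Run model.chat on every image in `folder` matching `pattern`.

    Hypothetical helper: returns a dict mapping each file name to whatever
    model.chat returned for it (assumed here to be the recognized text).
    """
    # Assumed options: no rendering and no streaming, so results are returned.
    options = {"ocr_type": "ocr", "render": False, "stream_flag": False}
    options.update(chat_kwargs)  # caller-supplied options take precedence

    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = model.chat(tokenizer, str(path), **options)
    return results
```

With the `model` and `tokenizer` loaded as shown earlier, `ocr_folder(model, tokenizer, 'scans')` would return one entry per PNG file in a `scans` directory (the directory name is only an example).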

Parameters

| Parameter | Description | Default |
|---|---|---|
| `ocr_type` | Type of OCR to run (`'ocr'`). | `'ocr'` |
| `render` | Enable rendering of the extracted text on the image. | `True` |
| `stream_flag` | Stream results for larger or multi-page documents. | `True` |

Base Model

This model is finetuned from the GOT OCR base and builds on its vision-language capabilities for OCR.

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.