---
pipeline_tag: image-text-to-text
library_name: transformers
language:
- multilingual
tags:
- got
- vision-language
- ocr2.0
- custom_code
license: apache-2.0
---

# Nayana OCR (Alpha)

Nayana OCR is a model fine-tuned for document-level Optical Character Recognition (OCR) across **10 Indian languages**: **Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu**, while retaining OCR capability in **English** and **Chinese**. It is built on the **GOT OCR** base model and offers multilingual OCR, document rendering, and GPU inference.

We are training an improved model on substantially more data. Follow us for updates and more information: [Cognitivelab](https://cognitivelab.in)

---

## Key Features

- **Multilingual OCR**: Supports OCR for 10 Indian languages alongside English and Chinese.
- **Document-Level OCR**: Designed for extracting text from complex document layouts.
- **Streamlined Deployment**: Optimized for GPU usage with support for safetensors.
- **Customizable OCR Type**: Switch between OCR modes and enable rendering.

---

## Installation

To use Nayana OCR, ensure you have the following prerequisites installed:

1. Python 3.8+
2. PyTorch (with GPU support)
3. Transformers library
4. PEFT library

Install the required libraries using:

```bash
pip install torch transformers peft
```

---

## Usage Example

Here's a quick example of how to use Nayana OCR to extract text from an image:

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load the tokenizer (trust_remote_code is required for the custom model code)
tokenizer = AutoTokenizer.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True
)

# Load the model in half precision directly onto the GPU
model = AutoModel.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map='cuda',
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype=torch.float16
)

# Prepare the model for inference
model = model.eval().cuda()

# Perform OCR on an image
image_file = 'hindi.png'
result = model.chat(
    tokenizer,
    image_file,
    ocr_type='ocr',
    render=True,
    stream_flag=True
)
print(result)
```

---

## Parameters

| Parameter     | Description                                          | Default |
|---------------|------------------------------------------------------|---------|
| `ocr_type`    | Specify the type of OCR to use (`'ocr'`).            | `'ocr'` |
| `render`      | Enable rendering of the extracted text on the image. | `True`  |
| `stream_flag` | Stream results for larger or multi-page documents.   | `True`  |

---

## Base Model

This model is fine-tuned from the **GOT OCR** base and builds on its vision-language capabilities for OCR.

---

## License

This project is licensed under the **Apache 2.0 License**. See the [LICENSE](LICENSE) file for details.

---
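
## Batch Processing Sketch

The usage example above handles a single image. The following is a minimal sketch, not part of the Nayana OCR API, of how the same call might be applied to a folder of images. The helper names `find_images` and `ocr_folder`, the `IMAGE_EXTENSIONS` set, and the `scans` folder are hypothetical, and the sketch assumes that `model.chat` returns the recognized text as a string. The OCR call is passed in as a plain function so the helper itself does not depend on the model.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def find_images(folder: str) -> List[Path]:
    """Return the image files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def ocr_folder(folder: str, ocr_fn: Callable[[str], str]) -> Dict[str, str]:
    """Apply `ocr_fn` to every image in `folder`; map file name -> text.

    Failures are recorded as "ERROR: ..." strings so that one bad file
    does not abort the whole batch.
    """
    results: Dict[str, str] = {}
    for path in find_images(folder):
        try:
            results[path.name] = ocr_fn(str(path))
        except Exception as exc:  # keep going on per-file errors
            results[path.name] = f"ERROR: {exc}"
    return results


# Hypothetical usage with the `model` and `tokenizer` loaded earlier:
# texts = ocr_folder("scans", lambda p: model.chat(tokenizer, p, ocr_type="ocr"))
```

Wrapping the model call in a function keeps the loop easy to test with a stub and makes it simple to change the `chat` arguments in one place.

---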