Commit 72cc6f1 by AdithyaSK
1 Parent(s): a7b14cd

Update README.md

Files changed (1)
  1. README.md +94 -8
README.md CHANGED
@@ -11,23 +11,109 @@ tags:
  license: apache-2.0
  ---


- Nayana_base_combined_v1

  ```python
  from transformers import AutoModel, AutoTokenizer
- from peft import PeftModel, PeftConfig, AutoPeftModelForCausalLM
- from transformers import AutoModelForCausalLM
  import torch

- tokenizer = AutoTokenizer.from_pretrained('v1v1d/Nayana_base_combined_lora_64', trust_remote_code=True , torch_dtype=torch.float16)
- model = AutoModel.from_pretrained('v1v1d/Nayana_base_combined_lora_64', trust_remote_code=True, low_cpu_mem_usage=True, device_map='cuda', use_safetensors=True, pad_token_id=tokenizer.eos_token_id , torch_dtype=torch.float16)


  model = model.eval().cuda()

  image_file = 'hindi.png'
- res = model.chat(tokenizer, image_file, ocr_type='ocr' , render=True, stream_flag = True)

- print(res)
- ```

  license: apache-2.0
  ---

+ # Nayana OCR (Alpha)

+ Nayana OCR is a state-of-the-art model fine-tuned for document-level Optical Character Recognition (OCR) across **10 Indian languages**:
+ **Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu**
+ while maintaining exceptional OCR capabilities in **English** and **Chinese**.
+
+ This model is built upon the robust **GOT OCR** base and offers features like advanced multilingual OCR, enhanced document rendering, and seamless GPU utilization.
+
+ We are training an improved model on considerably more data; follow us for updates.
+
+ For more information, see [Cognitivelab](https://cognitivelab.in).
+
+ ---
+
+ ## Key Features
+
+ - **Multilingual OCR**: Supports OCR for 10 Indian languages alongside English and Chinese.
+ - **Document-Level OCR**: Designed for extracting text from complex document layouts.
+ - **Streamlined Deployment**: Optimized for GPU usage with support for safetensors.
+ - **Customizable OCR Type**: Switch between OCR modes and enable rendering.
+
+ ---
+
+ ## Installation
+
+ To use Nayana OCR, ensure you have the following prerequisites installed:
+
+ 1. Python 3.8+
+ 2. PyTorch (with GPU support)
+ 3. Transformers library
+ 4. PEFT library
+
+ Install the required libraries using:
+
+ ```bash
+ pip install torch transformers peft
+ ```
+
+ ---
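
Before loading the model, it can help to confirm that the prerequisites listed above are importable. The following is a minimal sketch that uses only the Python standard library; the helper name `missing_packages` is hypothetical and is not part of Nayana OCR or of any library listed here.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Python 3.8+ is the minimum version stated in the prerequisites.
    if sys.version_info < (3, 8):
        print("Python 3.8 or newer is required")

    # Package names taken from the pip command in the Installation section.
    missing = missing_packages(["torch", "transformers", "peft"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable")
```

This check does not verify GPU support; once PyTorch is installed, `torch.cuda.is_available()` reports whether a CUDA device is usable.
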
+
+ ## Usage Example
+
+ Here's a quick example of how to use Nayana OCR for extracting text from an image:

  ```python
  from transformers import AutoModel, AutoTokenizer
+ from peft import PeftModel
  import torch

+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained(
+     'Nayana-cognitivelab/Nayana_base_OCR',
+     trust_remote_code=True,
+     torch_dtype=torch.float16
+ )

+ model = AutoModel.from_pretrained(
+     'Nayana-cognitivelab/Nayana_base_OCR',
+     trust_remote_code=True,
+     low_cpu_mem_usage=True,
+     device_map='cuda',
+     use_safetensors=True,
+     pad_token_id=tokenizer.eos_token_id,
+     torch_dtype=torch.float16
+ )

+ # Prepare the model for inference
  model = model.eval().cuda()

+ # Perform OCR on an image
  image_file = 'hindi.png'
+ result = model.chat(
+     tokenizer,
+     image_file,
+     ocr_type='ocr',
+     render=True,
+     stream_flag=True
+ )
+
+ print(result)
+ ```
+
+ ---
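
To run the same call over several images, one option is to wrap it in a small loop. The sketch below is an illustration under stated assumptions: `ocr_directory`, `IMAGE_SUFFIXES`, and the `chat` argument are hypothetical names, and `chat` stands for any callable that maps an image path to text, such as a thin wrapper around the `model.chat` call from the usage example.

```python
from pathlib import Path

# Extensions treated as images here; this set is an illustrative assumption.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def ocr_directory(chat, directory):
    """Apply `chat` to every image file in `directory`.

    Returns a dict mapping each file name to the text returned by `chat`,
    with files visited in sorted order.
    """
    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = chat(str(path))
    return results
```

With the model and tokenizer loaded as in the usage example, and assuming `model.chat` returns the recognized text as that example suggests, this could be called as `ocr_directory(lambda p: model.chat(tokenizer, p, ocr_type='ocr'), 'scans/')`, where `scans/` is a placeholder directory.
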
+
+ ## Parameters
+
+ | Parameter     | Description                                          | Default |
+ |---------------|------------------------------------------------------|---------|
+ | `ocr_type`    | Specify the type of OCR to use (`'ocr'`).            | `'ocr'` |
+ | `render`      | Enable rendering of the extracted text on the image. | `True`  |
+ | `stream_flag` | Stream results for larger or multi-page documents.   | `True`  |
+
+ ---
+
+ ## Base Model
+
+ This model is fine-tuned from the **GOT OCR** base, leveraging its vision-language capabilities to deliver unparalleled OCR performance.
+
+ ---
+
+ ## License
+
+ This project is licensed under the **Apache 2.0 License**. See the [LICENSE](LICENSE) file for details.
+
+ ---