Update README.md
README.md
CHANGED
@@ -11,23 +11,109 @@ tags:
license: apache-2.0
---

# Nayana OCR (Alpha)

Nayana OCR is a model fine-tuned for document-level Optical Character Recognition (OCR) across **10 Indian languages**:
**Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Odia, Punjabi, Tamil, Telugu**
while retaining OCR capabilities in **English** and **Chinese**.

This model is built on the **GOT OCR** base and offers multilingual OCR, document rendering, and GPU inference.

We are training a better model with much more data. Follow us for updates.

For more information: [Cognitivelab](https://cognitivelab.in)

---

## Key Features

- **Multilingual OCR**: Supports OCR for 10 Indian languages alongside English and Chinese.
- **Document-Level OCR**: Designed for extracting text from complex document layouts.
- **Streamlined Deployment**: Optimized for GPU usage with support for safetensors.
- **Customizable OCR Type**: Switch between OCR modes and enable rendering.

---

## Installation

To use Nayana OCR, ensure you have the following prerequisites installed:

1. Python 3.8+
2. PyTorch (with GPU support)
3. Transformers library
4. PEFT library

Install the required libraries using:

```bash
pip install torch transformers peft
```
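
Before loading the model, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch, not part of Nayana OCR itself: the helper names `missing_modules` and `python_is_supported` are hypothetical, and the import names are assumed to match the pip package names.

```python
import importlib.util
import sys

# Import names for the libraries listed above
# (assumption: they match the pip package names).
REQUIRED_MODULES = ("torch", "transformers", "peft")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info, minimum=(3, 8)):
    """Return True if the interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    print("Missing modules:", missing_modules() or "none")
```

Once PyTorch is installed, `torch.cuda.is_available()` reports whether a GPU is visible to it.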

---

## Usage Example

Here's a quick example of how to use Nayana OCR to extract text from an image:

```python
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel  # not referenced directly below; requires the peft package
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True
)

model = AutoModel.from_pretrained(
    'Nayana-cognitivelab/Nayana_base_OCR',
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    device_map='cuda',
    use_safetensors=True,
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype=torch.float16  # load the weights in half precision
)

# Prepare the model for inference
model = model.eval().cuda()

# Perform OCR on an image
image_file = 'hindi.png'
result = model.chat(
    tokenizer,
    image_file,
    ocr_type='ocr',
    render=True,
    stream_flag=True
)

print(result)
```
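
To process several images, the call in the example above can be wrapped in a small loop. This is only a sketch: `ocr_images` is a hypothetical helper, not part of the model's API, and it assumes `model.chat` returns the recognized text, as the `print(result)` call in the example suggests.

```python
from pathlib import Path


def ocr_images(model, tokenizer, image_paths, **chat_kwargs):
    """Run `model.chat` on each image and return {path: recognized text}.

    `model` and `tokenizer` are the objects loaded in the usage example.
    Extra keyword arguments (e.g. ocr_type='ocr') are forwarded unchanged.
    """
    results = {}
    for path in image_paths:
        path = Path(path)
        # Fail early with a clear message instead of inside the model code.
        if not path.is_file():
            raise FileNotFoundError(f"Image not found: {path}")
        results[str(path)] = model.chat(tokenizer, str(path), **chat_kwargs)
    return results
```

With the model loaded, a call such as `ocr_images(model, tokenizer, ['hindi.png'], ocr_type='ocr')` would return a dictionary keyed by file path.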

---

## Parameters

| Parameter     | Description                                          | Default |
|---------------|------------------------------------------------------|---------|
| `ocr_type`    | Specify the type of OCR to use (`'ocr'`).            | `'ocr'` |
| `render`      | Enable rendering of the extracted text on the image. | `True`  |
| `stream_flag` | Stream results for larger or multi-page documents.   | `True`  |
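
One way to keep these options together is a small settings object whose defaults mirror the table above. This is a sketch under that assumption: `ChatOptions` is a hypothetical helper, not something the model provides, and its defaults are copied from the table rather than read from the model code.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChatOptions:
    """Hypothetical container for the `model.chat` keyword arguments in the table."""

    ocr_type: str = "ocr"     # OCR mode
    render: bool = True       # render the extracted text on the image
    stream_flag: bool = True  # stream results

    def as_kwargs(self):
        """Return the options as a dict suitable for `model.chat(..., **kwargs)`."""
        return asdict(self)


# Example: turn off rendering and streaming, keep the OCR mode.
plain = ChatOptions(render=False, stream_flag=False)
```

The options can then be passed as `model.chat(tokenizer, image_file, **plain.as_kwargs())`.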

---

## Base Model

This model is fine-tuned from the **GOT OCR** base and builds on its vision-language capabilities for OCR.

---

## License

This project is licensed under the **Apache 2.0 License**. See the [LICENSE](LICENSE) file for details.

---