---
language:
- ko
tags:
- ocr
widget:
- src: https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/g.jpg
  example_title: word1
- src: https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/khs.jpg
  example_title: word2
- src: https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/m.jpg
  example_title: word3
pipeline_tag: image-to-text
---

# Korean TrOCR model

## Training datasets
AI Hub
- [다양한 형태의 한글 문자 OCR (Various forms of Hangul text OCR)](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=91)
- [공공행정문서 OCR (Public administrative document OCR)](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=88)

## Model structure
- encoder: [trocr-base-stage1's encoder](https://huggingface.co/microsoft/trocr-base-stage1)
- decoder: [KR-BERT-char16424](https://huggingface.co/snunlp/KR-BERT-char16424)

## How to use

```python
from io import BytesIO

import requests
import unicodedata
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel, AutoTokenizer

# Load the image processor, OCR model, and Korean tokenizer
processor = TrOCRProcessor.from_pretrained("ddobokki/ko-trocr")
model = VisionEncoderDecoderModel.from_pretrained("ddobokki/ko-trocr")
tokenizer = AutoTokenizer.from_pretrained("ddobokki/ko-trocr")

# Fetch an example image
url = "https://raw.githubusercontent.com/ddobokki/ocr_img_example/master/g.jpg"
response = requests.get(url)
img = Image.open(BytesIO(response.content))

# Preprocess the image and generate output token ids
pixel_values = processor(img, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values, max_length=64)
generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Recompose any decomposed Hangul jamo into complete syllables
generated_text = unicodedata.normalize("NFC", generated_text)
print(generated_text)
```
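
The final `unicodedata.normalize("NFC", ...)` call matters because decoded output can contain Hangul in decomposed (NFD) form, where each syllable is split into individual jamo that render and compare differently from the precomposed syllables most Korean text uses. A minimal, self-contained illustration of what the normalization does:

```python
import unicodedata

# The syllable "한" (U+D55C) split into its three jamo under NFD
decomposed = unicodedata.normalize("NFD", "\ud55c")
print([hex(ord(c)) for c in decomposed])  # ['0x1112', '0x1161', '0x11ab']

# NFC recomposes the jamo back into the single precomposed syllable
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == "\ud55c", len(decomposed), len(recomposed))  # True 3 1
```

Applying NFC makes the model output safe to compare against reference strings that use precomposed syllables.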