---
license: apache-2.0
language:
  - ko
pipeline_tag: image-to-text
tags:
  - trocr
  - vision-encoder-decoder
---

# trocr-small-korean

## Model Details

TrOCR is an encoder-decoder model composed of an image Transformer encoder and a text Transformer decoder. The image encoder was initialized from DeiT weights, and the text decoder was initialized from RoBERTa weights trained in-house.

This work was trained on Cloud TPUs provided through Google's TPU Research Cloud (TRC).

## How to Get Started with the Model

```python
import torch

from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")

# Dummy input: replace with a preprocessed (1, 3, 384, 384) image tensor
pixel_values = torch.rand(1, 3, 384, 384)
generated_ids = model.generate(pixel_values)
```
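
To run on a real image and decode the output to text, something like the following should work. This is a minimal sketch assuming the repository ships processor and tokenizer configuration files; it has not been verified against this checkpoint.

```python
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

# Assumption: the repo provides processor/tokenizer configs loadable via TrOCRProcessor
processor = TrOCRProcessor.from_pretrained("team-lucid/trocr-small-korean")
model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")

image = Image.open("example.png").convert("RGB")  # any image containing Korean text
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```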

## Training Details
### Training Data

The model was trained on 6M images synthesized with [synthtiger](https://github.com/clovaai/synthtiger).
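
As a rough illustration of how such synthetic data could be loaded for training, the sketch below assumes synthtiger's usual output layout of a `gt.txt` file with a tab-separated image path and transcription per line; the file name and format are assumptions to check against your synthtiger configuration.

```python
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset


class SynthtigerDataset(Dataset):
    """Reads (image, text) pairs from a synthtiger output directory (assumed layout)."""

    def __init__(self, root):
        self.root = Path(root)
        # Each line is assumed to be "<relative image path>\t<label>"
        lines = (self.root / "gt.txt").read_text(encoding="utf-8").splitlines()
        self.samples = [line.split("\t", 1) for line in lines if line.strip()]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, text = self.samples[idx]
        image = Image.open(self.root / path).convert("RGB")
        return image, text
```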

### Training Hyperparameters

| Hyperparameter      |   Small |
|:--------------------|--------:|
| Warmup Steps        |   4,000 |
| Learning Rate       |    1e-4 |
| Batch Size          |     512 |
| Weight Decay        |    0.01 |
| Max Steps           | 500,000 |
| Learning Rate Decay |     0.1 |
| Adam \\(\beta_1\\)  |     0.9 |
| Adam \\(\beta_2\\)  |    0.98 |
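
As a rough sketch of how these values map onto an optimizer and schedule: the exact schedule shape used in training is an assumption here, with linear warmup followed by a linear decay to 0.1× the peak learning rate.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR
from transformers import VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("team-lucid/trocr-small-korean")

warmup_steps, max_steps, final_lr_ratio = 4_000, 500_000, 0.1


def lr_lambda(step: int) -> float:
    # Linear warmup to the peak LR, then linear decay to final_lr_ratio of the peak
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return 1.0 - (1.0 - final_lr_ratio) * min(1.0, progress)


optimizer = torch.optim.AdamW(
    model.parameters(), lr=1e-4, betas=(0.9, 0.98), weight_decay=0.01
)
scheduler = LambdaLR(optimizer, lr_lambda)
```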