---
language: ko

license: apache-2.0
---

# pko-t5-small

pko-t5 is a [t5 v1.1 model](https://github.com/google-research/text-to-text-transfer-transformer/blob/84f8bcc14b5f2c03de51bd3587609ba8f6bbd1cd/released_checkpoints.md) trained exclusively on Korean data.

To tokenize Korean, we used BBPE, which has no OOV, instead of sentencepiece, and trained on Korean data (Namuwiki, Wikipedia, the Modu Corpus, etc.) with unsupervised learning only, using T5's span corruption task.

When using pko-t5, please fine-tune it on your actual downstream task.

## Usage
The model is accessible through the transformers API. When using the tokenizer, please use `T5TokenizerFast`, not `T5Tokenizer`. The model itself can be used as a regular `T5ForConditionalGeneration`.

### Example
```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained('paust/pko-t5-small')
model = T5ForConditionalGeneration.from_pretrained('paust/pko-t5-small')

# Encode inputs and labels as PyTorch tensors; labels must be passed by keyword
input_ids = tokenizer(["nsmc sentence: λ‹Ήμ‹ μ˜ 이름은 λ¬΄μ—‡μΈκ°€μš”?"], return_tensors="pt").input_ids
labels = tokenizer(["T5 μž…λ‹ˆλ‹€."], return_tensors="pt").input_ids
outputs = model(input_ids=input_ids, labels=labels)

print(f"loss={outputs.loss} logits={outputs.logits}")
```

## KLUE evaluation (dev)

|  | Model | ynat (macro F1) | sts (pearsonr/F1) | nli (acc) | ner (entity-level F1) | re (micro F1) | dp (LAS) | mrc (EM/F1) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Baseline | **87.30** | **93.20/86.13** | **89.50** | 86.06 | 71.06 | 87.93 | 75.26/- |
| FT | pko-t5-small (77M) | 86.21 | 77.99/77.01 | 69.20 | 82.60 | 62.95 | 93.15 | 43.81/46.58 |
| FT | pko-t5-base (250M) | 87.29 | 90.25/83.43 | 79.73 | 87.80 | 72.94 | 97.28 | 61.53/64.74 |
| FT | pko-t5-large (800M) | 87.12 | 92.05/85.24 | 84.96 | **88.18** | 72.26 | 97.60 | 68.01/71.44 |
| MT | pko-t5-small | 85.85 | 79.12/77.81 | 66.80 | 81.53 | 67.93 | 91.38 | 44.97/48.07 |
| MT | pko-t5-base | 86.86 | 87.61/81.42 | 75.46 | 86.85 | 71.85 | 96.32 | 61.95/65.06 |
| MT | pko-t5-large | 87.25 | 91.05/84.58 | 82.16 | 87.63 | **74.78** | **97.33** | **69.18/71.92** |

- FT: μ‹±κΈ€νƒœμŠ€ν¬ νŒŒμΈνŠœλ‹ / MT: λ©€ν‹°νƒœμŠ€ν¬ νŒŒμΈνŠœλ‹
- [Baseline](https://arxiv.org/abs/2105.09680): KLUE λ…Όλ¬Έμ—μ„œ μ†Œκ°œλœ dev set 에 λŒ€ν•œ SOTA 점수

μœ„μ˜ klue ν‰κ°€λŠ” input_ids 의 max lengthλ₯Ό 1300 으둜 ν•˜μ—¬ ν•™μŠ΅ν–ˆμŠ΅λ‹ˆλ‹€. μ΄λ ‡κ²Œ ν•˜λ©΄ encoding 된 context κ°€ train=98.4% / dev=100% 둜 컀버가 λ©λ‹ˆλ‹€.

### Additional MRC experiments

1. Training with max length 512, sliding a window over the context
2. Training with the title included in addition to the context

| Model | (1) EM / F1 | (2) EM / F1 |
| --- | --- | --- |
| small | 42.20/45.03 | 46.85/50.46 |
| base | 57.06/60.20 | 63.12/67.38 |
| large | 61.53/64.94 | 70.15/74.20 |

title 을 ν¬ν•¨ν–ˆμ„ λ•Œ μ„±λŠ₯이 μ’€ 더 μ’‹μ§€λ§Œ 그에 따라 sequence length κ°€ λŠ˜μ–΄λ‚¬μŠ΅λ‹ˆλ‹€.