---
language: ko
license: apache-2.0
---
# pko-t5-small
pko-t5 is a [t5 v1.1 model](https://github.com/google-research/text-to-text-transfer-transformer/blob/84f8bcc14b5f2c03de51bd3587609ba8f6bbd1cd/released_checkpoints.md) trained exclusively on Korean data.

To tokenize Korean, it uses BBPE, which has no OOV problem, instead of sentencepiece. Training was unsupervised only, applying T5's span corruption task to Korean corpora (Namuwiki, Wikipedia, the Modu Corpus, etc.).
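For intuition, span corruption replaces contiguous spans of the input with sentinel tokens and trains the model to reconstruct the dropped spans. A minimal sketch of the input/target format (the sentence is made up for illustration and is not from the training data):

```python
# Illustrative T5 span corruption pair (hypothetical sentence meaning
# "a T5 model trained on Korean-only data").
original = "한국어 전용 데이터로 학습한 T5 모델입니다"

# Encoder input: masked spans are replaced with sentinel tokens.
encoder_input = "한국어 <extra_id_0> 학습한 T5 <extra_id_1>"

# Decoder target: each sentinel followed by the span it replaced,
# terminated by a final sentinel.
decoder_target = "<extra_id_0> 전용 데이터로 <extra_id_1> 모델입니다 <extra_id_2>"
```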
When using pko-t5, please fine-tune it on your target task.
## Usage
The model is accessible through the transformers API. For the tokenizer, use `T5TokenizerFast`, not `T5Tokenizer`. The model itself can be used as a plain `T5ForConditionalGeneration`.
### Example
```python
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained('paust/pko-t5-small')
model = T5ForConditionalGeneration.from_pretrained('paust/pko-t5-small')

# "What is your name?" with a task prefix
input_ids = tokenizer(
    ["nsmc sentence: 당신의 이름은 무엇인가요?"], return_tensors="pt"
).input_ids
# "I am T5."
labels = tokenizer(["T5 입니다."], return_tensors="pt").input_ids

# Pass labels by keyword: the second positional argument is attention_mask.
outputs = model(input_ids=input_ids, labels=labels)
print(f"loss={outputs.loss} logits={outputs.logits}")
```
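For inference rather than loss computation, the standard `generate` API applies. A minimal sketch reusing the objects above (note that the raw pretrained checkpoint has only seen span corruption, so expect sentinel-style output until the model is fine-tuned):

```python
# Greedy generation sketch; meaningful output requires task fine-tuning.
gen_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(gen_ids, skip_special_tokens=True))
```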
## KLUE evaluation (dev)
| Setting | Model | ynat (macro F1) | sts (pearsonr/F1) | nli (acc) | ner (entity-level F1) | re (micro F1) | dp (LAS) | mrc (EM/F1) |
| --- | --- |-----------------| --- | --- | --- | --- | --- | --- |
| | Baseline | **87.30** | **93.20/86.13** | **89.50** | 86.06 | 71.06 | 87.93 | 75.26/- |
| FT | pko-t5-small (77M) | 86.21 | 77.99/77.01 | 69.20 | 82.60 | 62.95 | 93.15 | 43.81/46.58 |
| FT | pko-t5-base (250M) | 87.29 | 90.25/83.43 | 79.73 | 87.80 | 72.94 | 97.28 | 61.53/64.74 |
| FT | pko-t5-large (800M) | 87.12 | 92.05/85.24 | 84.96 | **88.18** | 72.26 | 97.60 | 68.01/71.44 |
| MT | pko-t5-small | 85.85 | 79.12/77.81 | 66.8 | 81.53 | 67.93 | 91.38 | 44.97/48.07 |
| MT | pko-t5-base | 86.86 | 87.61/81.42 | 75.46 | 86.85 | 71.85 | 96.32 | 61.95/65.06 |
| MT | pko-t5-large | 87.25 | 91.05/84.58 | 82.16 | 87.63 | **74.78** | **97.33** | **69.18/71.92** |
- FT: single-task fine-tuning / MT: multi-task fine-tuning (a sketch of the multi-task input format follows below)
- [Baseline](https://arxiv.org/abs/2105.09680): SOTA scores on the dev set reported in the KLUE paper
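T5-style multi-task fine-tuning casts every task as text-to-text and mixes the tasks, distinguishing them by a prefix on the input. The prefixes below are hypothetical (this card does not list the exact ones used) and only illustrate the format, mirroring the `nsmc sentence:` style from the Usage example:

```python
# Hypothetical task-prefix formatting for multi-task (MT) fine-tuning.
# The actual prefixes used for pko-t5 are not stated in this card.
examples = [
    ("klue ynat title: <news headline>", "<topic label>"),
    ("klue nli premise: <premise> hypothesis: <hypothesis>", "entailment"),
    ("klue sts sentence1: <s1> sentence2: <s2>", "3.2"),
]
for source, target in examples:
    # Every task becomes a plain (input text, output text) pair.
    print(source, "->", target)
```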
For the KLUE evaluation above, training used a max input_ids length of 1300. With this setting, the encoded context is fully covered for train=98.4% / dev=100% of examples.
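For reference, the coverage figure can be computed along these lines (a sketch; `train_contexts` and `dev_contexts` are hypothetical lists of the raw input strings):

```python
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained('paust/pko-t5-small')

def coverage(texts, max_length=1300):
    """Fraction of examples whose encoded input fits within max_length."""
    n_fit = sum(len(tokenizer(t).input_ids) <= max_length for t in texts)
    return n_fit / len(texts)

# e.g. coverage(train_contexts) -> ~0.984, coverage(dev_contexts) -> 1.0
```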
### Additional MRC experiments
1. Trained with max length 512, sliding a window over the context (see the tokenization sketch at the end of this section)
2. Trained with the title included in addition to the context
| Model | (1) EM / F1 | (2) EM / F1 |
| --- | --- | --- |
| small | 42.20/45.03 | 46.85/50.46 |
| base | 57.06/60.20 | 63.12/67.38 |
| large | 61.53/64.94 | 70.15/74.20 |
Including the title improves performance somewhat, but the sequence length grows accordingly.
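The sliding-window setup in (1) can be expressed with the tokenizer's overflow support. A minimal sketch with assumed values (the actual stride and input formatting are not stated in this card):

```python
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained('paust/pko-t5-small')

question = "..."  # hypothetical KLUE-MRC question
context = "..."   # hypothetical long context

features = tokenizer(
    question,
    context,
    max_length=512,
    truncation="only_second",       # truncate/slide only the context
    stride=128,                     # assumed overlap between windows
    return_overflowing_tokens=True,
)
# Each entry of features["input_ids"] is one 512-token window over the context.
```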