---
license: apache-2.0
language: ko
tags:
- korean
- lassl
mask_token: "<mask>"
widget:
- text: 대한민국의 수도는 <mask> 입니다.
---
# LASSL roberta-ko-small
## How to use
```python
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("lassl/roberta-ko-small")
tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")
```
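The masked-word example shown in the widget can also be reproduced with the `fill-mask` pipeline. This is a minimal sketch; the exact predictions depend on the checkpoint.

```python
from transformers import pipeline

# Minimal sketch: reproduce the widget example with the fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="lassl/roberta-ko-small")

# "The capital of South Korea is <mask>." — the model fills in the masked token.
for prediction in fill_mask("대한민국의 수도는 <mask> 입니다."):
    print(prediction["token_str"], prediction["score"])
```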
## Evaluation
`roberta-ko-small` was pretrained on Korean corpora with the [LASSL](https://github.com/lassl/lassl) framework. The downstream performance below was evaluated on 2021/12/15.
| nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
| ---- | -------- | -------- | --------- | ---- | -------- |
| 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
## Corpora
This model was trained on 6,860,062 examples (3,512,351,744 tokens) extracted from the corpora listed below. For details about the training setup, see `config.json` (a sketch for fetching it follows the listing).
```bash
corpora/
├── [707M] kowiki_latest.txt
├── [ 26M] modu_dialogue_v1.2.txt
├── [1.3G] modu_news_v1.1.txt
├── [9.7G] modu_news_v2.0.txt
├── [ 15M] modu_np_v1.1.txt
├── [1008M] modu_spoken_v1.2.txt
├── [6.5G] modu_written_v1.0.txt
└── [413M] petition.txt
```
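As a rough sketch (assuming the standard `huggingface_hub` client), `config.json` can be fetched from the Hub and inspected like this:

```python
import json

from huggingface_hub import hf_hub_download

# Sketch: download config.json from the Hub and print it to inspect training settings.
config_path = hf_hub_download(repo_id="lassl/roberta-ko-small", filename="config.json")
with open(config_path) as f:
    print(json.dumps(json.load(f), indent=2))
```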