---
license: apache-2.0
language: ko
tags:
  - korean
  - lassl
mask_token: <mask>
widget:
  - text: 대한민국의 수도는 <mask> 입니다.  # "The capital of the Republic of Korea is <mask>."
---

LASSL roberta-ko-small

How to use

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("lassl/roberta-ko-small")
tokenizer = AutoTokenizer.from_pretrained("lassl/roberta-ko-small")

Evaluation

roberta-ko-small was pretrained on Korean text with the LASSL framework. The results below were measured on 2021/12/15.

| nsmc | klue_nli | klue_sts | korquadv1 | klue_mrc | avg |
|---:|---:|---:|---:|---:|---:|
| 87.8846 | 66.3086 | 83.8353 | 83.1780 | 42.4585 | 72.7330 |
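The avg column matches a plain unweighted mean of the five task scores. The sketch below checks that arithmetic. It assumes every task counts equally. The card does not say how avg is computed, and it does not name the metric used for each task.

```python
# Scores copied from the evaluation table above.
scores = {
    "nsmc": 87.8846,
    "klue_nli": 66.3086,
    "klue_sts": 83.8353,
    "korquadv1": 83.1780,
    "klue_mrc": 42.4585,
}

# Assumption: avg is the unweighted mean over the five tasks,
# rounded to four decimals like the table.
avg = round(sum(scores.values()) / len(scores), 4)
print(f"avg = {avg:.4f}")  # avg = 72.7330
```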

Corpora

This model was trained on 6,860,062 examples containing 3,512,351,744 tokens in total, extracted from the corpora listed below. For details of the training setup, see config.json.
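The example and token counts divide exactly, giving 512 tokens per example. That suggests the training text was packed into fixed-length 512-token sequences, but this is only an inference; the card does not say so. The short check below does the division.

```python
examples = 6_860_062
tokens = 3_512_351_744

# divmod returns the quotient and remainder; a zero remainder means
# every example has the same length (an inference, not stated in the card).
per_example, remainder = divmod(tokens, examples)
print(per_example, remainder)  # 512 0
```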

corpora/
├── [707M]  kowiki_latest.txt
├── [ 26M]  modu_dialogue_v1.2.txt
├── [1.3G]  modu_news_v1.1.txt
├── [9.7G]  modu_news_v2.0.txt
├── [ 15M]  modu_np_v1.1.txt
├── [1008M]  modu_spoken_v1.2.txt
├── [6.5G]  modu_written_v1.0.txt
└── [413M]  petition.txt
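The corpus listing looks like `tree -h` output. Assuming its sizes use binary, 1024-based units, the raw text adds up to roughly 19.6G. The sketch below sums the rounded sizes as printed, so the total is only approximate. The helper `to_bytes` is hypothetical and written for this example only.

```python
# Sizes copied from the corpora listing above.
# Assumption: binary (1024-based) units, as printed by `tree -h`.
sizes = {
    "kowiki_latest.txt": "707M",
    "modu_dialogue_v1.2.txt": "26M",
    "modu_news_v1.1.txt": "1.3G",
    "modu_news_v2.0.txt": "9.7G",
    "modu_np_v1.1.txt": "15M",
    "modu_spoken_v1.2.txt": "1008M",
    "modu_written_v1.0.txt": "6.5G",
    "petition.txt": "413M",
}

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> float:
    """Convert a tree-style size such as '1.3G' to an approximate byte count."""
    return float(size[:-1]) * UNITS[size[-1]]


total = sum(to_bytes(s) for s in sizes.values())
print(f"total ~ {total / 1024**3:.1f}G")  # total ~ 19.6G
```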