---
license: apache-2.0
language: ko
tags:
- fill-mask
- korean
- lassl
mask_token: "[MASK]"
widget:
- text: 대한민국의 수도는 [MASK] 입니다.
---
# LASSL bert-ko-small
## How to use
```python
from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained("lassl/bert-ko-small")
tokenizer = AutoTokenizer.from_pretrained("lassl/bert-ko-small")
```
## Evaluation
Evaluation results will be released soon.
## Corpora
This model was trained on 702,437 examples containing 3,596,465,664 tokens in total, extracted from the corpora listed below. For training details, see `config.json`.
```text
corpora/
├── [707M] kowiki_latest.txt
├── [ 26M] modu_dialogue_v1.2.txt
├── [1.3G] modu_news_v1.1.txt
├── [9.7G] modu_news_v2.0.txt
├── [ 15M] modu_np_v1.1.txt
├── [1008M] modu_spoken_v1.2.txt
├── [6.5G] modu_written_v1.0.txt
└── [413M] petition.txt
```
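The card does not give a total raw corpus size, but you can estimate one by summing the listed file sizes. The sketch below does this under one assumption: the bracketed sizes look like `tree -h` output, which uses binary units (1G = 1024M). `LISTING` and `total_size_mib` are illustrative names, not part of LASSL.

```python
import re

# Sizes copied from the corpus listing above.
LISTING = """
[707M] kowiki_latest.txt
[ 26M] modu_dialogue_v1.2.txt
[1.3G] modu_news_v1.1.txt
[9.7G] modu_news_v2.0.txt
[ 15M] modu_np_v1.1.txt
[1008M] modu_spoken_v1.2.txt
[6.5G] modu_written_v1.0.txt
[413M] petition.txt
"""

# Assumption: binary units, as printed by `tree -h` (1G = 1024M).
UNIT_TO_MIB = {"M": 1, "G": 1024}


def total_size_mib(listing: str) -> float:
    """Sum every bracketed size such as "[ 26M]" or "[1.3G]" in MiB."""
    total = 0.0
    for value, unit in re.findall(r"\[\s*([\d.]+)([MG])\]", listing):
        total += float(value) * UNIT_TO_MIB[unit]
    return total


total = total_size_mib(LISTING)
print(f"{total:.0f} MiB (about {total / 1024:.1f} GiB)")  # 20089 MiB (about 19.6 GiB)
```

Each listed size is already rounded, so the roughly 19.6 GiB total is only approximate.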