---

license: apache-2.0
language: ko
tags:
  - fill-mask
  - korean
  - lassl
mask_token: "[MASK]"
widget:
  - text: 대한민국의 수도는 [MASK] 입니다.
---


# LASSL bert-ko-base
## How to use
```python
from transformers import AutoModel, AutoTokenizer

# Load the pretrained encoder and its matching tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("lassl/bert-ko-base")
tokenizer = AutoTokenizer.from_pretrained("lassl/bert-ko-base")
```
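
Since this card is tagged `fill-mask`, the model can presumably also be queried through the `transformers` fill-mask pipeline. The following is a minimal sketch under that assumption (it has not been verified that the checkpoint ships a masked-language-model head). The helper names `top_predictions` and `main` are hypothetical; the example sentence is the one from this card's widget.

```python
def top_predictions(fill_mask, text, k=3):
    """Return (token, score) pairs for the k most likely fillers of [MASK].

    `fill_mask` is any callable that behaves like a transformers fill-mask
    pipeline: called with a text containing a single mask token and `top_k`,
    it returns a list of dicts with "token_str" and "score" keys.
    """
    results = fill_mask(text, top_k=k)
    return [(r["token_str"], r["score"]) for r in results]


def main():
    # Imported lazily so the helper above can be used without transformers.
    from transformers import pipeline

    # Assumption: the checkpoint includes masked-LM weights.
    fill_mask = pipeline("fill-mask", model="lassl/bert-ko-base")
    for token, score in top_predictions(fill_mask, "대한민국의 수도는 [MASK] 입니다."):
        print(token, score)
```

Calling `main()` downloads the model and prints the candidate tokens with their scores.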

## Evaluation
Evaluation results will be released soon.

## Corpora
This model was trained on 702,437 examples containing 3,596,465,664 tokens in total. The examples were extracted from the corpora listed below. For details on the training setup, see `config.json`.

```bash
corpora/
├── [707M]  kowiki_latest.txt
├── [ 26M]  modu_dialogue_v1.2.txt
├── [1.3G]  modu_news_v1.1.txt
├── [9.7G]  modu_news_v2.0.txt
├── [ 15M]  modu_np_v1.1.txt
├── [1008M]  modu_spoken_v1.2.txt
├── [6.5G]  modu_written_v1.0.txt
└── [413M]  petition.txt
```
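
To look at the settings recorded in `config.json`, a small helper like the one below may be enough. It is a sketch that assumes a local copy of the file; the helper name `summarize_config` is hypothetical, and the listed keys are the standard BERT configuration fields, which may or may not all be present in this checkpoint's file.

```python
import json

# Standard BERT configuration keys. Assumption: this checkpoint's
# config.json follows the usual layout; missing keys are reported as None.
DEFAULT_KEYS = (
    "vocab_size",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "max_position_embeddings",
)


def summarize_config(path, keys=DEFAULT_KEYS):
    """Read a config.json file and return the selected fields as a dict."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return {key: config.get(key) for key in keys}
```

With the file downloaded next to the script, `summarize_config("config.json")` returns a dictionary of the selected fields.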