---
license: apache-2.0
language: ko
tags:
- fill-mask
- korean
- lassl
mask_token: "[MASK]"
widget:
- text: 대한민국의 수도는 [MASK] 입니다.
---

# LASSL bert-ko-base

## How to use

The snippet below loads the model and tokenizer from the Hub; a fill-mask pipeline sketch appears after the corpora listing at the end of this card.

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("lassl/bert-ko-base")
tokenizer = AutoTokenizer.from_pretrained("lassl/bert-ko-base")
```

## Evaluation

Evaluation results will be released soon.

## Corpora

This model was pretrained on 702,437 examples (3,596,465,664 tokens in total), extracted from the corpora listed below. For details on the training configuration, see `config.json`.

```bash
corpora/
├── [707M]  kowiki_latest.txt
├── [ 26M]  modu_dialogue_v1.2.txt
├── [1.3G]  modu_news_v1.1.txt
├── [9.7G]  modu_news_v2.0.txt
├── [ 15M]  modu_np_v1.1.txt
├── [1008M] modu_spoken_v1.2.txt
├── [6.5G]  modu_written_v1.0.txt
└── [413M]  petition.txt
```
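
As a complement to the loading snippet in *How to use*, here is a minimal fill-mask sketch. It assumes the standard `transformers` `pipeline` API and reuses the widget sentence from the card metadata ("The capital of South Korea is [MASK]."); the exact predictions shown by your run may differ.

```python
from transformers import pipeline

# Build a fill-mask pipeline backed by lassl/bert-ko-base.
fill_mask = pipeline("fill-mask", model="lassl/bert-ko-base")

# Same sentence as the widget example above.
for prediction in fill_mask("대한민국의 수도는 [MASK] 입니다."):
    # Each prediction is a dict with the filled token and its score.
    print(prediction["token_str"], prediction["score"])
```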