---
language: ko
---
# BERT base model for Korean
## Update
- 2021.11.17: Added native support for the BERT tokenizer (it now works with `AutoTokenizer` and `pipeline`; see the usage sketch at the end of this card).
---
* Trained on 70GB of Korean text with a 42,000-token lower-cased subword vocabulary.
* Model performance figures and other Korean language models are available on [GitHub](https://github.com/kiyoungkim1/LM-kor).
```python
from transformers import pipeline

pipe = pipeline('text-generation', model='beomi/kykim-gpt3-kor-small_based_on_gpt2')
print(pipe("안녕하세요! 오늘은"))
# [{'generated_text': '안녕하세요! 오늘은 제가 요즘 사용하고 있는 클렌징워터를 소개해드리려고 해요! 바로 이 제품!! 바로 이'}]
```
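
The pipeline example above generates text with a Korean GPT model from the same LM-kor project; the prompt "안녕하세요! 오늘은" means "Hello! Today is…", and the sampled continuation reads like a product-review sentence. Below is a minimal sketch of loading the BERT model itself with `AutoTokenizer`, as referenced in the update note. The model id `kykim/bert-kor-base` and the presence of a masked-language-modeling head are assumptions not stated in this card; check the linked GitHub repository for the published model names.

```python
from transformers import AutoTokenizer, AutoModel, pipeline

# Assumed model id (not stated in this card); verify against the LM-kor repository.
model_id = "kykim/bert-kor-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# The card states a 42,000 lower-cased subword vocabulary.
print(tokenizer.vocab_size)

# Encode a sample sentence and run it through the encoder.
inputs = tokenizer("안녕하세요! 오늘은 날씨가 좋네요.", return_tensors="pt")  # "Hello! The weather is nice today."
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)

# Masked-token prediction through the pipeline API, assuming the checkpoint ships an MLM head.
fill = pipeline("fill-mask", model=model_id)
print(fill(f"대한민국의 수도는 {fill.tokenizer.mask_token}입니다."))  # "The capital of South Korea is [MASK]."
```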