---
language:
- en
---
# BERT base model for Korean
## Update
- 2021.11.17: Added native support for the BERT tokenizer (works with `AutoTokenizer` and `pipeline`)
---
* Trained on a 70 GB Korean text dataset with a 42,000-token lower-cased subword vocabulary
* Check the model performance and other language models for Korean in [github](https://github.com/kiyoungkim1/LM-kor)
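With the native tokenizer support noted above, the model should be loadable through the standard `Auto*` classes. A minimal sketch follows; the `kykim/bert-kor-base` model id is an assumption inferred from the linked LM-kor repository, so verify it on the Hub before use:

```python
from transformers import AutoModel, AutoTokenizer

# Model id assumed from the LM-kor repo linked above -- confirm on the Hub.
tokenizer = AutoTokenizer.from_pretrained("kykim/bert-kor-base")
model = AutoModel.from_pretrained("kykim/bert-kor-base")

# Encode a Korean sentence ("Sharing a Korean language model.") and
# run it through the encoder.
inputs = tokenizer("한국어 모델을 공유합니다.", return_tensors="pt")
outputs = model(**inputs)

# BERT-base produces 768-dimensional hidden states per token.
print(outputs.last_hidden_state.shape)
```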
```python
from transformers import pipeline
pipe = pipeline('text-generation', model='beomi/kykim-gpt3-kor-small_based_on_gpt2')
print(pipe("안녕하세요! 오늘은"))  # prompt: "Hello! Today"
# [{'generated_text': '안녕하세요! 오늘은 제가 요즘 사용하고 있는 클렌징워터를 소개해드리려고 해요! 바로 이 제품!! 바로 이'}]
# (English: "Hello! Today I'd like to introduce the cleansing water I've been using lately! This product!! This very")
```