---
language:
- ko
---
|
|
|
# BERT base model for Korean
|
|
|
## Update
|
|
|
- 2021.11.17: Added native support for the BERT tokenizer (works with `AutoTokenizer` and `pipeline`)
|
|
|
---
|
|
|
* Trained on 70 GB of Korean text with a 42,000-token lower-cased subword vocabulary
|
* Check this model's performance and other Korean language models in the [github](https://github.com/kiyoungkim1/LM-kor) repository
|
|
|
```python
from transformers import pipeline

pipe = pipeline('text-generation', model='beomi/kykim-gpt3-kor-small_based_on_gpt2')

print(pipe("안녕하세요! 오늘은"))  # "Hello! Today"
# [{'generated_text': '안녕하세요! 오늘은 제가 요즘 사용하고 있는 클렌징워터를 소개해드리려고 해요! 바로 이 제품!! 바로 이'}]
# ("Hello! Today I'd like to introduce the cleansing water I've been using lately! This very product!! This very")
```
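
The update above notes that the BERT tokenizer now works with `AutoTokenizer`. A minimal sketch of loading the BERT checkpoint that way; the model id `kykim/bert-kor-base` is taken from the linked LM-kor repository and is an assumption here:

```python
from transformers import AutoTokenizer, AutoModel

# Model id assumed from the linked LM-kor repository
model_name = "kykim/bert-kor-base"

# No tokenizer class needs to be specified, per the native-support update
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# The vocabulary is lower-cased, so lower-case any Latin text in the input
inputs = tokenizer("한국어 bert 모델입니다.", return_tensors="pt")  # "This is a Korean BERT model."
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```

The hidden states from `last_hidden_state` can then be fed to a downstream head (classification, token tagging, etc.).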