---
language:
- en
---

# BERT base model for Korean

## Update

- Update at 2021.11.17: Added native support for the BERT tokenizer (it now works with AutoTokenizer and pipeline).

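A minimal sketch of that native support, assuming a checkpoint id of `kykim/bert-kor-base` (an assumption; substitute the repository this card belongs to):

```python
from transformers import AutoTokenizer, AutoModel, pipeline

# NOTE: the checkpoint id below is an assumption; use this card's repository id.
model_id = "kykim/bert-kor-base"

# The BERT tokenizer now loads directly through the Auto* classes.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

print(tokenizer.tokenize("안녕하세요! 오늘은"))  # lower-cased subword pieces

# The same checkpoint also works with the pipeline API, e.g. for sentence embeddings:
extractor = pipeline("feature-extraction", model=model_id, tokenizer=model_id)
features = extractor("한국어 문장입니다.")  # "This is a Korean sentence."
print(len(features[0]), len(features[0][0]))  # number of tokens x hidden size
```
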
---

* A 70 GB Korean text dataset and a vocabulary of 42,000 lower-cased subwords were used for pretraining.
* Check this model's performance, along with other Korean language models, on [GitHub](https://github.com/kiyoungkim1/LM-kor).

```python
from transformers import pipeline

# Text generation with the pipeline API; the prompt means "Hello! Today,"
pipe = pipeline('text-generation', model='beomi/kykim-gpt3-kor-small_based_on_gpt2')
print(pipe("안녕하세요! 오늘은"))
# [{'generated_text': '안녕하세요! 오늘은 제가 요즘 사용하고 있는 클렌징워터를 소개해드리려고 해요! 바로 이 제품!! 바로 이'}]
```