---
license: cc-by-nc-nd-4.0
language: ko
tags:
- KakaoBrain
- KoGPT
- GPT
- GPT3
---
# KakaoBrain project KoGPT
KakaoBrain's Pre-Trained Language Models.
* KakaoBrain project KoGPT (Korean Generative Pre-trained Transformer)
* [https://github.com/kakaobrain/kogpt](https://github.com/kakaobrain/kogpt)
* [https://huggingface.co/kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)
## Model Descriptions
### KoGPT6B-ryan1.5b
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b\]](https://huggingface.co/kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b)
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b-float16\]](https://huggingface.co/kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b-float16)
| Hyperparameter | Value |
|:---------------------|--------------:|
| \\(n_{parameters}\\) | 6,166,502,400 |
| \\(n_{layers}\\) | 28 |
| \\(d_{model}\\) | 4,096 |
| \\(d_{ff}\\) | 16,384 |
| \\(n_{heads}\\) | 16 |
| \\(d_{head}\\) | 256 |
| \\(n_{ctx}\\) | 2,048 |
| \\(n_{vocab}\\) | 64,512 |
| Positional Encoding | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) |
| RoPE Dimensions | 64 |
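
The values in the table are mutually consistent, which a little arithmetic can confirm. The sketch below assumes a GPT-J-style layer layout (bias-free attention projections, an MLP with biases, one LayerNorm per block, a final LayerNorm, and an untied output head with a bias). This README does not state that layout, so treat it as an assumption. Under it, the count reproduces the parameter total listed above.

```python
# Hyperparameters copied from the table above.
n_layers, d_model, d_ff = 28, 4096, 16384
n_heads, d_head, n_vocab = 16, 256, 64512

# The head size follows from the model width and the number of heads.
assert d_model == n_heads * d_head

# ASSUMPTION: GPT-J-style block layout (not stated in this README).
attention = 4 * d_model * d_model          # q, k, v, out projections, no biases
mlp = 2 * d_model * d_ff + d_ff + d_model  # two linear layers with biases
layer_norm = 2 * d_model                   # scale and shift
per_layer = attention + mlp + layer_norm

embedding = n_vocab * d_model              # input token embedding
lm_head = n_vocab * d_model + n_vocab      # untied output projection with bias
total = n_layers * per_layer + embedding + layer_norm + lm_head

print(f"{total:,}")  # 6,166,502,400
```

Rotary position embeddings have no learned parameters, so no positional term appears in the sum.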
## Hardware requirements
### KoGPT6B-ryan1.5b
#### GPU
The recommended minimum GPU hardware for running KoGPT is as follows.
* `32GB GPU RAM` is the required minimum memory size
### KoGPT6B-ryan1.5b-float16
#### GPU
The recommended minimum GPU hardware for running KoGPT is as follows.
* half-precision requires NVIDIA GPUs based on the Volta, Turing, or Ampere architecture
* `16GB GPU RAM` is the required minimum memory size
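
These memory figures are roughly what the weights alone imply. As a back-of-the-envelope sketch, the helper below (`weight_memory_gib` is a hypothetical name, not part of the `kogpt` package) multiplies the parameter count from the model description by the bytes per parameter. Activations and the CUDA context need extra headroom, which this estimate ignores.

```python
N_PARAMETERS = 6_166_502_400  # from the hyperparameter table


def weight_memory_gib(n_parameters: int, bytes_per_parameter: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return n_parameters * bytes_per_parameter / 2**30


fp32 = weight_memory_gib(N_PARAMETERS, 4)  # KoGPT6B-ryan1.5b
fp16 = weight_memory_gib(N_PARAMETERS, 2)  # KoGPT6B-ryan1.5b-float16
print(f"float32: {fp32:.1f} GiB, float16: {fp16:.1f} GiB")
```

Both estimates fall below the 32GB and 16GB recommendations, which leaves room for activations during generation.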
## Usage
### prompt
```bash
python -m kogpt --help
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}]
                       [--device {cpu,cuda}] [-d]

KakaoBrain Korean(hangul) Generative Pre-Training Model

optional arguments:
  -h, --help            show this help message and exit
  --model MODEL         huggingface repo (default:kakaobrain/kogpt)
  --revision {KoGPT6B-ryan1.5b}
  --device {cpu,cuda}   (default:cuda)
  -d, --debug
```
```bash
python -m kogpt
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
temperature(0.8)>
max_length(128)> 64
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
prompt>
...
```
### python
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
    bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]'
)
model = AutoModelForCausalLM.from_pretrained(
    'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16',  # or float32 version: revision='KoGPT6B-ryan1.5b'
    pad_token_id=tokenizer.eos_token_id,
    torch_dtype='auto', low_cpu_mem_usage=True
).to(device='cuda', non_blocking=True)
_ = model.eval()  # switch to inference mode (disables dropout)

prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
with torch.no_grad():
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)
    # max_length counts the prompt tokens plus the generated tokens
    gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64)
    generated = tokenizer.batch_decode(gen_tokens)[0]

# Sampling is random, so the continuation varies between runs. Example output:
# 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
print(generated)
```
## Experiments
### In-context Few-Shots
| Models | #params | NSMC (Acc.) | YNAT (F1) | KLUE-STS (F1) |
|:--------------|--------:|------------:|----------:|--------------:|
| HyperCLOVA[1] | 1.3B | 83.9 | 58.7 | 60.9 |
| HyperCLOVA[1] | 6.9B | 83.8 | 67.5 | 59.3 |
| HyperCLOVA[1] | 13.0B | 87.9 | 67.9 | 60.0 |
| HyperCLOVA[1] | 39.0B | 88.0 | 71.4 | 61.6 |
| HyperCLOVA[1] | 82.0B | **88.2** | 72.7 | **65.1** |
| **Ours** | 6.0B | 87.8 | **78.0** | 64.3 |
### Finetuning / P-Tuning
Issues with our downstream evaluation have been [reported](https://github.com/kakaobrain/kogpt/issues/17).
The previously published performance evaluation table has been removed because it could not be considered a fair comparison: the algorithms being compared differed, and the performance measurement method could not be confirmed.
Refer to the issue linked above for the original performance evaluation table and the troubleshooting results.
## Limitations
KakaoBrain `KoGPT` was trained on `ryan dataset`, a dataset known to contain profanity, lewd content, politically charged content, and other harsh language.
Therefore, `KoGPT` can generate socially unacceptable text. As with all language models, it is difficult to predict in advance how `KoGPT` will respond to particular prompts, and it may produce offensive content without warning.
Primarily Korean: `KoGPT` is trained primarily on Korean text and is best suited to classifying, searching, summarizing, or generating such text.
By default, `KoGPT` performs worse on inputs that differ from the data distribution it was trained on, including non-Korean text and dialects of Korean that are not well represented in the training data.
[comment]: <> (If abnormal or socially unacceptable text is generated during testing, please send a "prompt" and the "generated text" to [kogpt-report@kakaobrain.com]&#40;mailto:kogpt-report@kakaobrain.com&#41;. )
## Citation
If you use this library or model in any project or research, please cite our code:
```
@misc{kakaobrain2021kogpt,
title = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer},
author = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek},
year = {2021},
howpublished = {\url{https://github.com/kakaobrain/kogpt}},
}
```
## Contact
KoGPT is released as open source in the hope that it will be helpful to research institutes and startups for research purposes. We look forward to hearing from anyone who wishes to cooperate with us.
[contact@kakaobrain.com](mailto:contact@kakaobrain.com)
## License
The `source code` of KakaoBrain `KoGPT` is licensed under the [Apache 2.0](LICENSE.apache-2.0) License.
The `pretrained weights` of KakaoBrain `KoGPT` are licensed under the [CC-BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) License.
If you use the model, code, or pretrained weights, please comply with the license terms. The full license texts are available in the [Apache 2.0](LICENSE.apache-2.0) and [LICENSE.cc-by-nc-nd-4.0](LICENSE.cc-by-nc-nd-4.0) files.
## References
[1] [HyperCLOVA](https://arxiv.org/abs/2109.04650): Kim, Boseop, et al. "What changes can large-scale language models bring? intensive study on hyperclova: Billions-scale korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).