hyunwoongko committed · Commit 8f9904d · Parent(s): 632aab2

Update README.md
- causal-lm
license: apache-2.0
datasets:
- Large-scale Korean dataset created by TUNiB.

---

# GPT-NeoX-Ko-1.3B

## Model Description
GPT-NeoX-Ko is a Korean autoregressive language model made by the EleutherAI multilingual team. We collected about 1.2TB of Korean data for this work, in collaboration with [TUNiB](https://tunib.ai/). In addition, we used the GPT-NeoX framework for model training and added several Korean tasks to LM-Evaluation-Harness for model evaluation.

| Hyperparameter       | Value                                                                                                                                   |
|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| \\(n_{parameters}\\) | 1,331,810,304                                                                                                                           |
| \\(n_{layers}\\)     | 24                                                                                                                                      |
| \\(d_{model}\\)      | 2048                                                                                                                                    |
| \\(d_{ff}\\)         | 8192                                                                                                                                    |
| \\(n_{heads}\\)      | 16                                                                                                                                      |
| \\(d_{head}\\)       | 128                                                                                                                                     |
| \\(n_{ctx}\\)        | 2048                                                                                                                                    |
| \\(n_{vocab}\\)      | 30,000 / 30,080                                                                                                                         |
| Positional Encoding  | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864)                                                                    |
| RoPE Dimensions      | [64](https://github.com/kingoflolz/mesh-transformer-jax/blob/f2aa66e0925de6593dcbb70e72399b97b4130482/mesh_transformer/layers.py#L223) |

The model consists of 24 transformer layers with a model dimension of 2048 and a feedforward dimension of 8192. The model dimension is split into 16 heads, each with a dimension of 128. Rotary Position Embedding (RoPE) is applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 30,000.
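
As a quick sanity check (not part of the original card), the short sketch below reads these values back from the hosted configuration; the attribute names assume the standard GPT-NeoX configuration class in `transformers`.

```python
# Minimal sketch: read the architecture hyperparameters from the hosted config.
# Attribute names assume the standard GPT-NeoX config class in `transformers`.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("EleutherAI/gpt-neox-ko-1.3b")
print(config.num_hidden_layers)        # n_layers, expected 24
print(config.hidden_size)              # d_model, expected 2048
print(config.intermediate_size)        # d_ff, expected 8192
print(config.num_attention_heads)      # n_heads, expected 16
print(config.max_position_embeddings)  # n_ctx, expected 2048
print(config.vocab_size)               # n_vocab
```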

## Training data

GPT-NeoX-Ko was trained on a 1.2TB Korean dataset, a large-scale curated dataset created by [TUNiB](https://tunib.ai/).

## Training procedure

GPT-NeoX-Ko was trained for 213 billion tokens over 102,000 steps on 256 A100 GPUs. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
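
To make the objective concrete, here is a hedged illustration (not the original training code) of the same next-token cross-entropy loss as exposed by `transformers`: passing the input ids as labels makes the model shift the targets by one position and return the mean loss of predicting each next token.

```python
# Illustration of the autoregressive objective, not the original training script:
# with labels=input_ids, the model shifts the targets internally and returns the
# mean cross-entropy of predicting each next token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-ko-1.3b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-ko-1.3b")

inputs = tokenizer("한국어 언어 모델의 예시 문장입니다.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # mean next-token cross-entropy for this sequence
```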

## How to use

This model can be easily loaded using the `AutoModelForCausalLM` functionality:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-ko-1.3b")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neox-ko-1.3b")
```
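
Continuing from the snippet above, text can then be generated with the standard `generate` API. The prompt and decoding settings below are illustrative assumptions, not recommendations from the authors:

```python
# Hedged usage sketch: sampling a continuation with the standard generate() API.
# The prompt and decoding parameters are illustrative only.
input_ids = tokenizer("인공지능은", return_tensors="pt").input_ids
output_ids = model.generate(
    input_ids,
    max_new_tokens=64,  # length of the generated continuation
    do_sample=True,     # sample instead of greedy decoding
    top_p=0.95,
    temperature=0.8,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```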

## Privacy considerations and Limitations

GPT-NeoX-Ko learns an inner representation of the Korean language that can be used to extract features useful for downstream tasks. The model is best at what it was pretrained for, however, which is generating text from a prompt.

### Privacy considerations

General training algorithms for pretrained language models risk memorizing personal information contained in the training data. To mitigate this, we added the following tokens to the vocabulary and replaced much of the personal information with these tokens during data preprocessing (a sketch of this replacement step follows the list):
* `<|acc|>` : bank account number
* `<|rrn|>` : resident registration number
* `<|tell|>` : phone number
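
The snippet below is a minimal, hypothetical sketch of such a preprocessing step; the helper name and the regular expressions are illustrative assumptions, not the patterns used in the actual pipeline.

```python
# Hypothetical illustration of the anonymization step described above.
# The regexes are simplified examples, not the actual preprocessing patterns.
import re

def mask_personal_info(text: str) -> str:
    # Order matters: the more specific patterns run before the looser account pattern.
    text = re.sub(r"\b\d{6}-\d{7}\b", "<|rrn|>", text)                  # resident registration numbers
    text = re.sub(r"\b01[016789]-?\d{3,4}-?\d{4}\b", "<|tell|>", text)  # mobile phone numbers
    text = re.sub(r"\b\d{2,4}-?\d{2,4}-?\d{4,8}\b", "<|acc|>", text)    # loose bank account pattern
    return text

print(mask_personal_info("문의는 010-1234-5678 로 연락주세요."))
# -> 문의는 <|tell|> 로 연락주세요.
```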

### Limitations and Biases

The core functionality of GPT-NeoX-Ko is taking a string of text and predicting the next token. While language models are widely used for tasks other than this, there are a lot of unknowns with this work. When prompting GPT-NeoX-Ko, it is important to remember that the statistically most likely next token is often not the token that produces the most "accurate" text. Never depend upon GPT-NeoX-Ko to produce factually accurate output. Depending upon the use case, GPT-NeoX-Ko may produce socially unacceptable text.

As with all language models, it is hard to predict in advance how GPT-NeoX-Ko will respond to particular prompts, and offensive content may occur without warning. We recommend having a human curate or filter the outputs before releasing them, both to censor undesirable content and to improve the quality of the results.

## Evaluation results

We used the [KOBEST dataset](https://arxiv.org/abs/2204.04541), which consists of five Korean downstream tasks, for model evaluation.
We added the corresponding tasks to [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and utilized the prompt templates described in the paper.
The following tables show the evaluation results with various numbers of few-shot examples. You can reproduce these results using the [multilingual-ko branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/multilingual-ko), as sketched below.
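
As a rough starting point, the sketch below launches such an evaluation through the harness's Python API. The `hf-causal` backend name and the `kobest_*` task identifiers are assumptions based on recent versions of lm-evaluation-harness; the exact interface and task names in the multilingual-ko branch may differ.

```python
# Hedged sketch of reproducing a KOBEST evaluation with lm-evaluation-harness.
# The backend and task names are assumptions from recent harness versions; the
# multilingual-ko branch may use different identifiers.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=EleutherAI/gpt-neox-ko-1.3b",
    tasks=["kobest_boolq", "kobest_copa", "kobest_wic", "kobest_hellaswag", "kobest_sentineg"],
    num_fewshot=10,
)
print(results["results"])
```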

- the number of few-shot examples = 1

| Model | parameters | boolq | copa | wic | hellaswag | sentineg | average |
|-------|------------|-------|------|-----|-----------|----------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) † | 1.2B | | | | | | |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) * | 6.0B | | | | | | |
| [EleutherAI/gpt-neox-ko-1.3b](https://huggingface.co/EleutherAI/gpt-neox-ko-1.3b) (ours) | 1.3B | 0.659 | 0.6993 | 0.6292 | 0.3884 | 0.8427 | 0.64372 |

- the number of few-shot examples = 5

| Model | parameters | boolq | copa | wic | hellaswag | sentineg | average |
|-------|------------|-------|------|-----|-----------|----------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) † | 1.2B | | | | | | |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) * | 6.0B | | | | | | |
| [EleutherAI/gpt-neox-ko-1.3b](https://huggingface.co/EleutherAI/gpt-neox-ko-1.3b) (ours) | 1.3B | 0.6309 | 0.7053 | 0.656 | 0.3984 | 0.7979 | 0.6337 |

- the number of few-shot examples = 10

| Model | parameters | boolq | copa | wic | hellaswag | sentineg | average |
|-------|------------|-------|------|-----|-----------|----------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) † | 1.2B | **0.6663** | 0.6222 | 0.656 | 0.4011 | 0.3534 | 0.5398 |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) * | 6.0B | 0.3241 | 0.719 | 0.1356 | **0.4616** | 0.8056 | 0.48936 |
| [EleutherAI/gpt-neox-ko-1.3b](https://huggingface.co/EleutherAI/gpt-neox-ko-1.3b) (ours) | 1.3B | 0.5174 | **0.7072** | **0.6567** | 0.417 | **0.8444** | **0.5468** |

- the number of few-shot examples = 50

| Model | parameters | boolq | copa | wic | hellaswag | sentineg | average |
|-------|------------|-------|------|-----|-----------|----------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) † | 1.2B | | | | | | |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) * | 6.0B | | | | | | |
| [EleutherAI/gpt-neox-ko-1.3b](https://huggingface.co/EleutherAI/gpt-neox-ko-1.3b) (ours) | 1.3B | 0.49 | 0.7097 | 0.5834 | 0.4416 | 0.7382 | 0.59258 |

- the number of few-shot examples = 100

| Model | parameters | boolq | copa | wic | hellaswag | sentineg | average |
|-------|------------|-------|------|-----|-----------|----------|---------|
| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5) † | 1.2B | | | | | | |
| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt) * | 6.0B | | | | | | |
| [EleutherAI/gpt-neox-ko-1.3b](https://huggingface.co/EleutherAI/gpt-neox-ko-1.3b) (ours) | 1.3B | 0.4867 | 0.7207 | 0.5877 | 0.5877 | 0.7407 | 0.59234 |

<p><strong>†</strong> The model card of this model provides evaluation results for the KOBEST dataset, but when we evaluated it with the prompts described in the paper, we could not obtain similar results. We then checked the KOBEST paper and found that the numbers reported in that model card are close to its fine-tuning results. Because we evaluate with prompt-based generation, without fine-tuning the model, our results may differ from those in that model card.</p>

<p><strong>*</strong> Since this model does not provide evaluation results for the KOBEST dataset, we evaluated it using lm-evaluation-harness ourselves. You can reproduce this result using the source code included in the multilingual-ko branch of lm-evaluation-harness.</p>

## Citation and Related Information

### BibTeX entry

If you find our work useful, please consider citing:

```bibtex
@misc{gpt-neox-ko,
  title = {{GPT-NeoX-Ko: Open-Source Korean Autoregressive Language Model}},
  author = {Ko, Hyunwoong and Yang, Kichang and Ryu, Minho and Kim, Taekyun and Yang, Seungmu and Hyun, Jiwoong and Park, Sungho and Ryu, Myunghyun and Keum, Bitna and Oh, Saechan and Kim, Soohwan and Park, Kyubyong},
  url = {https://www.github.com/eleutherai/multilingual},
  month = {9},
  year = {2022},
}
```

### Acknowledgements

This project would not have been possible without the compute generously provided by [Stability.ai](https://stability.ai); we thank them for providing a large amount of GPU resources for this work. We also thank [TUNiB](https://tunib.ai) for providing a large-scale Korean dataset for this work.