EleutherAI/polyglot-ko-3.8b

@@ -77,6 +77,8 @@ We evaluate Polyglot-Ko-3.8B on [KOBEST dataset](https://arxiv.org/abs/2204.0454
 The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following scripts. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n` refers to the number of few-shot examples.
 ```console
 python main.py \
    --model gpt2 \
@@ -90,31 +92,76 @@ python main.py \
 ### COPA (F1)
-| Model                                                                                        | params | n=0 | n=5 | n=10 | n=50 |
 |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
 | [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6696 | 0.6477 | 0.6419  | 0.6514  |
 | [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.7345 | 0.7287 | 0.7277  | 0.7479  |
 | [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.6723 | 0.6731 | 0.6769  | 0.7119  |
 | [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.7196 | 0.7193 | 0.7204  | 0.7206  |
-| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)** | **3.8B**   | **0.7595** | **0.7608** | **0.7638**  | **0.7788**  |
 | [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.7745 | 0.7676 | 0.7775  | 0.7887  |
 | [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.7937 | 0.8108 | 0.8037  | 0.8369  |
-<img src="https://user-images.githubusercontent.com/19511788/233820235-6f617932-3b18-4534-be14-8df9e80b8a06.jpg" width="1000px">
 ### HellaSwag (F1)
-| Model                                                                                          | params |n=0 | n=5 | n=10 | n=50 |
-|------------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
-| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)            | 1.2B   | 0.5243 | 0.5272 | 0.5166  | 0.5352  |
-| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                    | 6.0B   | 0.5590 | 0.5833 | 0.5828  | 0.5907  |
-| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                                | 7.5B   | 0.5665 | 0.5689 | 0.5565  | 0.5622  |
-| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)              | 1.3B   | 0.5247 | 0.5260 | 0.5278  | 0.5427  |
-| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)**   | **3.8B** | **0.5707** | **0.5830** | **0.5670**  | **0.5787**  |
-| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)              | 5.8B   | 0.5976 | 0.5998 | 0.5979  | 0.6208  |
-| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)            | 12.8B  | 0.5954 | 0.6306 | 0.6098  | 0.6118  |
-<img src="https://user-images.githubusercontent.com/19511788/233820233-0127983e-4b37-48ce-89e5-51509ed9b1f2.jpg" width="1000px">
 ## Limitations and Biases

 The following tables show the results for different numbers of few-shot examples. You can reproduce these results using the [polyglot branch of lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/polyglot) and the following scripts. For a fair comparison, all models were run under the same conditions and with the same prompts. In the tables, `n` refers to the number of few-shot examples.
+For the WiC dataset, all models perform at roughly chance level.
 ```console
 python main.py \
    --model gpt2 \
 ### COPA (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
 |----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
 | [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6696 | 0.6477 | 0.6419  | 0.6514  |
 | [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.7345 | 0.7287 | 0.7277  | 0.7479  |
 | [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.6723 | 0.6731 | 0.6769  | 0.7119  |
 | [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.7196 | 0.7193 | 0.7204  | 0.7206  |
+| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)** | **3.8B** | **0.7595** | **0.7608** | **0.7638** | **0.7788** |
 | [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.7745 | 0.7676 | 0.7775  | 0.7887  |
 | [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.7937 | 0.8108 | 0.8037  | 0.8369  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/d5b49364-aed5-4467-bae2-5a322c8e2ceb" width="800px">
 ### HellaSwag (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.5243 | 0.5272 | 0.5166  | 0.5352  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.5590 | 0.5833 | 0.5828  | 0.5907  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.5665 | 0.5689 | 0.5565  | 0.5622  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.5247 | 0.5260 | 0.5278  | 0.5427  |
+| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)** | **3.8B** | **0.5707** | **0.5830** | **0.5670** | **0.5787** |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.5976 | 0.5998 | 0.5979  | 0.6208  |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.5954 | 0.6306 | 0.6098  | 0.6118  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/5acb60ac-161a-4ab3-a296-db4442e08b7f" width="800px">
+### BoolQ (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.3356 | 0.4014 | 0.3640  | 0.3560  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.4514 | 0.5981 | 0.5499  | 0.5202  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.4464 | 0.3324 | 0.3324  | 0.3324  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.3552 | 0.4751 | 0.4109  | 0.4038  |
+| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)** | **3.8B** | **0.4320** | **0.5263** | **0.4930** | **0.4038** |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.4356 | 0.5698 | 0.5187  | 0.5236  |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.4818 | 0.6041 | 0.6289  | 0.6448  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/b74c23c0-01f3-4b68-9e10-a48e9aa052ab" width="800px">
+### SentiNeg (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.6065 | 0.6878 | 0.7280  | 0.8413  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.3747 | 0.8942 | 0.9294  | 0.9698  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.3578 | 0.4471 | 0.3964  | 0.5271  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.6790 | 0.6257 | 0.5514  | 0.7851  |
+| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)** | **3.8B** | **0.4858** | **0.7950** | **0.7320** | **0.7851** |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.3394 | 0.8841 | 0.8808  | 0.9521  |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.9117 | 0.9015 | 0.9345  | 0.9723  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/95b56b19-d349-4b70-9ff9-94a5560f89ee" width="800px">
+### WiC (F1)
+| Model                                                                                        | params | n=0    | n=5    | n=10    | n=50    |
+|----------------------------------------------------------------------------------------------|--------|--------|--------|---------|---------|
+| [skt/ko-gpt-trinity-1.2B-v0.5](https://huggingface.co/skt/ko-gpt-trinity-1.2B-v0.5)          | 1.2B   | 0.3290 | 0.4313 | 0.4001  | 0.3621  |
+| [kakaobrain/kogpt](https://huggingface.co/kakaobrain/kogpt)                                  | 6.0B   | 0.3526 | 0.4775 | 0.4358  | 0.4061  |
+| [facebook/xglm-7.5B](https://huggingface.co/facebook/xglm-7.5B)                              | 7.5B   | 0.3280 | 0.4903 | 0.4945  | 0.3656  |
+| [EleutherAI/polyglot-ko-1.3b](https://huggingface.co/EleutherAI/polyglot-ko-1.3b)            | 1.3B   | 0.3297 | 0.4850 | 0.4650  | 0.3290  |
+| **[EleutherAI/polyglot-ko-3.8b](https://huggingface.co/EleutherAI/polyglot-ko-3.8b) (this)** | **3.8B** | **0.3390** | **0.4944** | **0.4203** | **0.3835** |
+| [EleutherAI/polyglot-ko-5.8b](https://huggingface.co/EleutherAI/polyglot-ko-5.8b)            | 5.8B   | 0.3913 | 0.4688 | 0.4189  | 0.3910  |
+| [EleutherAI/polyglot-ko-12.8b](https://huggingface.co/EleutherAI/polyglot-ko-12.8b)          | 12.8B  | 0.3985 | 0.3683 | 0.3307  | 0.3273  |
+<img src="https://github.com/EleutherAI/polyglot/assets/19511788/4de4a4c3-d7ac-4e04-8b0c-0d533fe88294" width="800px">
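As a reading aid for the F1 columns above, the following minimal sketch shows what uninformative predictions score. It assumes the reported F1 is macro-averaged over two balanced classes; the card does not state the averaging, so treat that as an assumption. The helper names `f1_for_class` and `macro_f1` and the toy labels are hypothetical and are not part of lm-evaluation-harness.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, computed from true/false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy balanced binary labels (hypothetical data, not KOBEST).
y_true = [0, 1] * 500

# A model that always answers with the same label.
always_one = [1] * len(y_true)

# A model that is right exactly half the time on each class.
half_right = [0, 0, 1, 1] * 250

constant_score = macro_f1(y_true, always_one)
chance_score = macro_f1(y_true, half_right)
print(f"always one label: {constant_score:.4f}")
print(f"right half the time: {chance_score:.4f}")
```

Under these assumptions, a single-label predictor lands near 0.33 and a coin-flip predictor near 0.5, which may help when comparing the lowest entries in the BoolQ and WiC tables.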
 ## Limitations and Biases