kyujinpy committed
Commit bb2523b
1 Parent(s): 357fade

Upload README.md

Files changed (1)
  1. README.md +28 -17
README.md CHANGED
@@ -53,26 +53,16 @@ dtype: float16
 
 # **Model Benchmark**
 
- ## Open Ko leaderboard
- - Follow up as [Ko-link](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).
-
+ ## Open Ko-LLM leaderboard & lm-evaluation-harness(zero-shot)
+ - Follow up as [Ko-link](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).
 | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
 | --- | --- | --- | --- | --- | --- | --- |
 | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN |
+ | [Megastudy/M-SOLAR-10.7B-v1.1-beta](https://huggingface.co/Megastudy/M-SOLAR-10.7B-v1.1-beta) | 55.25 | 51.71 | 60.86 | 54.24 | 47.12 | 62.34 |
 | [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |
 | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 52.40 | 47.18 | 59.54 | 52.04 | 41.84 | 61.39 |
 
- - Follow up as [En-link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
- | [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct) | **74.40** | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
- | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 66.04 | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
- | [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | 66.04 | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
- | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
-
- ## lm-evaluation-harness(zero-shot)
- - Follow up as [beomi/LM-Harness](https://github.com/Beomi/ko-lm-evaluation-harness)
+ - Follow up as [beomi/LM-Harness](https://github.com/Beomi/ko-lm-evaluation-harness)
 ```
 gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
 | Task |Version| Metric |Value | |Stderr|
@@ -87,6 +77,19 @@ gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_des
 |kobest_sentineg | 0|acc |0.7078|± |0.0229|
 | | |macro_f1|0.7071|± |0.0229|
 
+ gpt2 (pretrained=Megastudy/M-SOLAR-10.7B-v1.1-beta), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
+ | Task |Version| Metric |Value | |Stderr|
+ |----------------|------:|--------|-----:|---|-----:|
+ |kobest_boolq | 0|acc |0.7137|± |0.0121|
+ | | |macro_f1|0.6878|± |0.0128|
+ |kobest_copa | 0|acc |0.7060|± |0.0144|
+ | | |macro_f1|0.7054|± |0.0145|
+ |kobest_hellaswag| 0|acc |0.4620|± |0.0223|
+ | | |acc_norm|0.5360|± |0.0223|
+ | | |macro_f1|0.4595|± |0.0223|
+ |kobest_sentineg | 0|acc |0.7431|± |0.0220|
+ | | |macro_f1|0.7295|± |0.0230|
+
 gpt2 (pretrained=jjourney1125/M-SOLAR-10.7B-v1.0), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
 | Task |Version| Metric |Value | |Stderr|
 |----------------|------:|--------|-----:|---|-----:|
@@ -112,12 +115,20 @@ gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description:
 | | |macro_f1|0.4296|± |0.0221|
 |kobest_sentineg | 0|acc |0.7506|± |0.0217|
 | | |macro_f1|0.7505|± |0.0217|
+ ```
 
+ ## Open EN-LLM leaderboard & lm-evaluation-harness(zero-shot)
+ - Follow up as [En-link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+ | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+ | --- | --- | --- | --- | --- | --- | --- | --- |
+ | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
+ | [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct) | **74.40** | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
+ | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 66.04 | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
+ | [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | 66.04 | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
+ | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
 
- ```
-
 - Follow up as [Eleuther/LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness)
- ```
+ ```yaml
 (will update)
 ```
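The KoBEST logs in both versions above come from beomi's Korean fork of the EleutherAI harness, run zero-shot. Below is a minimal reproduction sketch, assuming the fork keeps the upstream v0.3-style Python API (`evaluator.simple_evaluate` and `evaluator.make_table`); the fork's own entry scripts may differ, so check its README first.

```python
# Sketch: reproduce the zero-shot KoBEST rows logged above, assuming the
# v0.3-style API of EleutherAI's lm-evaluation-harness, which
# beomi/ko-lm-evaluation-harness is forked from.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="gpt2",  # HF causal-LM adapter; hence "gpt2 (pretrained=...)" in the logs
    model_args="pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test",
    tasks=["kobest_boolq", "kobest_copa", "kobest_hellaswag", "kobest_sentineg"],
    num_fewshot=0,    # zero-shot, matching "num_fewshot: 0" in the logs
    batch_size=None,  # matching "batch_size: None"
    limit=None,       # matching "limit: None"
)
# Prints the same Task/Version/Metric/Value/Stderr table as the logs.
print(evaluator.make_table(results))
```

Swapping `pretrained=` to `Megastudy/M-SOLAR-10.7B-v1.1-beta` or `jjourney1125/M-SOLAR-10.7B-v1.0` should regenerate the other blocks in the diff.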
 
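The English-side block is still a `(will update)` placeholder. For reference, here is a sketch of the matching Open LLM Leaderboard runs with the upstream EleutherAI harness; the few-shot counts are the leaderboard's published settings, while the task identifiers assume a v0.3-era harness and are not necessarily the leaderboard's exact pipeline (MMLU is the `hendrycksTest-*` subtask family at 5-shot, omitted here for brevity).

```python
# Sketch: English-side runs matching the Open EN-LLM leaderboard columns
# (ARC, HellaSwag, TruthfulQA, Winogrande, GSM8K). Assumes a v0.3-era
# EleutherAI lm-evaluation-harness; few-shot counts follow the Open LLM
# Leaderboard's published settings.
from lm_eval import evaluator

LEADERBOARD_TASKS = [
    ("arc_challenge", 25),  # ARC, 25-shot
    ("hellaswag", 10),      # HellaSwag, 10-shot
    ("truthfulqa_mc", 0),   # TruthfulQA (multiple choice), 0-shot
    ("winogrande", 5),      # Winogrande, 5-shot
    ("gsm8k", 5),           # GSM8K, 5-shot
]

for task, shots in LEADERBOARD_TASKS:
    results = evaluator.simple_evaluate(
        model="gpt2",
        model_args="pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test",
        tasks=[task],
        num_fewshot=shots,
    )
    print(evaluator.make_table(results))
```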