kyujinpy committed
Commit bb2523b
1 Parent(s): 357fade

Upload README.md

Files changed (1)
  1. README.md +28 -17
README.md CHANGED
@@ -53,26 +53,16 @@ dtype: float16
 
 # **Model Benchmark**
 
- ## Open Ko leaderboard
- - Follow up as [Ko-link](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).
-
+ ## Open Ko-LLM leaderboard & lm-evaluation-harness(zero-shot)
+ - Follow up as [Ko-link](https://huggingface.co/spaces/upstage/open-ko-llm-leaderboard).
 | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Ko-CommonGenV2 |
 | --- | --- | --- | --- | --- | --- | --- |
 | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN |
+ | [Megastudy/M-SOLAR-10.7B-v1.1-beta](https://huggingface.co/Megastudy/M-SOLAR-10.7B-v1.1-beta) | 55.25 | 51.71 | 60.86 | 54.24 | 47.12 | 62.34 |
 | [jjourney1125/M-SOLAR-10.7B-v1.0](https://huggingface.co/jjourney1125/M-SOLAR-10.7B-v1.0) | 55.15 | 49.57 | 60.12 | 54.60 | 49.23 | 62.22 |
 | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 52.40 | 47.18 | 59.54 | 52.04 | 41.84 | 61.39 |
 
- - Follow up as [En-link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
- | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
- | [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct) | **74.40** | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
- | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 66.04 | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
- | [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | 66.04 | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
- | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
-
- ## lm-evaluation-harness(zero-shot)
- - Follow up as [beomi/LM-Harness](https://github.com/Beomi/ko-lm-evaluation-harness)
+ - Follow up as [beomi/LM-Harness](https://github.com/Beomi/ko-lm-evaluation-harness)
 ```
 gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
 | Task |Version| Metric |Value | |Stderr|
@@ -87,6 +77,19 @@ gpt2 (pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test), limit: None, provide_des
 |kobest_sentineg | 0|acc |0.7078|± |0.0229|
 | | |macro_f1|0.7071|± |0.0229|
 
+ gpt2 (pretrained=Megastudy/M-SOLAR-10.7B-v1.1-beta), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
+ | Task |Version| Metric |Value | |Stderr|
+ |----------------|------:|--------|-----:|---|-----:|
+ |kobest_boolq | 0|acc |0.7137|± |0.0121|
+ | | |macro_f1|0.6878|± |0.0128|
+ |kobest_copa | 0|acc |0.7060|± |0.0144|
+ | | |macro_f1|0.7054|± |0.0145|
+ |kobest_hellaswag| 0|acc |0.4620|± |0.0223|
+ | | |acc_norm|0.5360|± |0.0223|
+ | | |macro_f1|0.4595|± |0.0223|
+ |kobest_sentineg | 0|acc |0.7431|± |0.0220|
+ | | |macro_f1|0.7295|± |0.0230|
+
 gpt2 (pretrained=jjourney1125/M-SOLAR-10.7B-v1.0), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
 | Task |Version| Metric |Value | |Stderr|
 |----------------|------:|--------|-----:|---|-----:|
@@ -112,12 +115,20 @@ gpt2 (pretrained=yanolja/KoSOLAR-10.7B-v0.1), limit: None, provide_description:
 | | |macro_f1|0.4296|± |0.0221|
 |kobest_sentineg | 0|acc |0.7506|± |0.0217|
 | | |macro_f1|0.7505|± |0.0217|
+ ```
 
+ ## Open EN-LLM leaderboard & lm-evaluation-harness(zero-shot)
+ - Follow up as [En-link](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard).
+ | Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
+ | --- | --- | --- | --- | --- | --- | --- | --- |
+ | PracticeLLM/Twice-KoSOLAR-16.1B-test | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
+ | [kyujinpy/Sakura-SOLAR-Instruct](https://huggingface.co/kyujinpy/Sakura-SOLAR-Instruct) | **74.40** | 70.99 | 88.42 | 66.33 | 71.79 | 83.66 | 65.20 |
+ | [seungduk/KoSOLAR-10.7B-v0.1](https://huggingface.co/seungduk/KoSOLAR-10.7B-v0.1) | 66.04 | 62.03 | 84.54 | 65.56 | 45.03 | 83.58 | 55.50 |
+ | [upstage/SOLAR-10.7B-v1.0](https://huggingface.co/upstage/SOLAR-10.7B-v1.0) | 66.04 | 61.95 | 84.60 | 65.48 | 45.04 | 83.66 | 55.50 |
+ | [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 60.97 | 59.98 | 83.31 | 64.16 | 42.15 | 78.37 | 37.83 |
 
- ```
-
 - Follow up as [Eleuther/LM-Harness](https://github.com/EleutherAI/lm-evaluation-harness)
- ```
+ ```yaml
 (will update)
 ```
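The KoBEST logs in both versions above come from beomi's Korean fork of the EleutherAI harness, run zero-shot. Below is a minimal reproduction sketch, assuming the fork keeps the upstream v0.3-style Python API (`evaluator.simple_evaluate` and `evaluator.make_table`); the fork's own entry scripts may differ, so check its README first.

```python
# Sketch: reproduce the zero-shot KoBEST rows logged above, assuming the
# v0.3-style API of EleutherAI's lm-evaluation-harness, which
# beomi/ko-lm-evaluation-harness is forked from.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="gpt2",  # HF causal-LM adapter; hence "gpt2 (pretrained=...)" in the logs
    model_args="pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test",
    tasks=["kobest_boolq", "kobest_copa", "kobest_hellaswag", "kobest_sentineg"],
    num_fewshot=0,    # zero-shot, matching "num_fewshot: 0" in the logs
    batch_size=None,  # matching "batch_size: None"
    limit=None,       # matching "limit: None"
)
# Prints the same Task/Version/Metric/Value/Stderr table as the logs.
print(evaluator.make_table(results))
```

Swapping `pretrained=` to `Megastudy/M-SOLAR-10.7B-v1.1-beta` or `jjourney1125/M-SOLAR-10.7B-v1.0` should regenerate the other blocks in the diff.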
 
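The English-side block is still a `(will update)` placeholder. For reference, here is a sketch of the matching Open LLM Leaderboard runs with the upstream EleutherAI harness; the few-shot counts are the leaderboard's published settings, while the task identifiers assume a v0.3-era harness and are not necessarily the leaderboard's exact pipeline (MMLU is the `hendrycksTest-*` subtask family at 5-shot, omitted here for brevity).

```python
# Sketch: English-side runs matching the Open EN-LLM leaderboard columns
# (ARC, HellaSwag, TruthfulQA, Winogrande, GSM8K). Assumes a v0.3-era
# EleutherAI lm-evaluation-harness; few-shot counts follow the Open LLM
# Leaderboard's published settings.
from lm_eval import evaluator

LEADERBOARD_TASKS = [
    ("arc_challenge", 25),  # ARC, 25-shot
    ("hellaswag", 10),      # HellaSwag, 10-shot
    ("truthfulqa_mc", 0),   # TruthfulQA (multiple choice), 0-shot
    ("winogrande", 5),      # Winogrande, 5-shot
    ("gsm8k", 5),           # GSM8K, 5-shot
]

for task, shots in LEADERBOARD_TASKS:
    results = evaluator.simple_evaluate(
        model="gpt2",
        model_args="pretrained=PracticeLLM/Twice-KoSOLAR-16.1B-test",
        tasks=[task],
        num_fewshot=shots,
    )
    print(evaluator.make_table(results))
```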