leaderboard-pr-bot committed on
Commit
7419763
1 Parent(s): 189b42b

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -111,6 +111,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 45.55
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 35.28
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 4.83
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 10.96
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.48
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 39.03
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-v2
+      name: Open LLM Leaderboard
 ---
 
 ## Model description
@@ -232,3 +324,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |81.61|
 |GSM8k (5-shot) |58.91|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_adamo1139__Yi-34B-200K-AEZAKMI-v2)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |23.69|
+|IFEval (0-Shot)    |45.55|
+|BBH (3-Shot)       |35.28|
+|MATH Lvl 5 (4-Shot)| 4.83|
+|GPQA (0-shot)      |10.96|
+|MuSR (0-shot)      | 6.48|
+|MMLU-PRO (5-shot)  |39.03|
+
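
As a quick sanity check on the results table added by this PR, the sketch below recomputes the `Avg.` row from the six benchmark scores. It assumes the average is a plain unweighted mean of those six values; the PR does not state how the leaderboard aggregates, so treat that as an assumption. The `scores` dictionary and `mean_score` helper are hypothetical names used only for illustration.

```python
# Scores copied from the results table in this PR.
scores = {
    "IFEval (0-Shot)": 45.55,
    "BBH (3-Shot)": 35.28,
    "MATH Lvl 5 (4-Shot)": 4.83,
    "GPQA (0-shot)": 10.96,
    "MuSR (0-shot)": 6.48,
    "MMLU-PRO (5-shot)": 39.03,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the PR)."""
    values = list(values)
    return sum(values) / len(values)


# Round to two decimals to match the precision used in the table.
average = round(mean_score(scores.values()), 2)
print(f"Avg. = {average}")
```

Under that assumption the recomputed value agrees with the 23.69 reported in the table, which suggests no benchmark is weighted differently from the others.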