Weyaxi and leaderboard-pr-bot committed
Commit 4802d78
1 parent: 7eecd98

Adding Evaluation Results (#6)

- Adding Evaluation Results (43d6b134e9808aa9b7bfbeec1e2a69b1963dd6d4)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +114 -15
README.md CHANGED
@@ -1,4 +1,6 @@
  ---
  license: other
  tags:
  - axolotl
@@ -64,8 +66,7 @@ model-index:
  value: 64.68
  name: normalized accuracy
  source:
- url: >-
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -81,8 +82,7 @@ model-index:
  value: 83.75
  name: normalized accuracy
  source:
- url: >-
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -99,8 +99,7 @@ model-index:
  value: 62.31
  name: accuracy
  source:
- url: >-
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -116,8 +115,7 @@ model-index:
  - type: mc2
  value: 55.15
  source:
- url: >-
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -134,8 +132,7 @@ model-index:
  value: 76.24
  name: accuracy
  source:
- url: >-
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -152,11 +149,100 @@ model-index:
  value: 57.62
  name: accuracy
  source:
- url: >-
- https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
- language:
- - en
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/U0zyXVGj-O8a7KP3BvPue.png)
@@ -373,4 +459,17 @@ Thanks to all open source AI community.

  If you would like to support me:

- [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
@@ -1,4 +1,6 @@
  ---
+ language:
+ - en
  license: other
  tags:
  - axolotl
@@ -64,8 +66,7 @@ model-index:
  value: 64.68
  name: normalized accuracy
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -81,8 +82,7 @@ model-index:
  value: 83.75
  name: normalized accuracy
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -99,8 +99,7 @@ model-index:
  value: 62.31
  name: accuracy
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -116,8 +115,7 @@ model-index:
  - type: mc2
  value: 55.15
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -134,8 +132,7 @@ model-index:
  value: 76.24
  name: accuracy
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -152,11 +149,100 @@ model-index:
  value: 57.62
  name: accuracy
  source:
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: IFEval (0-Shot)
+ type: HuggingFaceH4/ifeval
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: inst_level_strict_acc and prompt_level_strict_acc
+ value: 47.08
+ name: strict accuracy
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: BBH (3-Shot)
+ type: BBH
+ args:
+ num_few_shot: 3
+ metrics:
+ - type: acc_norm
+ value: 14.3
+ name: normalized accuracy
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MATH Lvl 5 (4-Shot)
+ type: hendrycks/competition_math
+ args:
+ num_few_shot: 4
+ metrics:
+ - type: exact_match
+ value: 1.74
+ name: exact match
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: GPQA (0-shot)
+ type: Idavidrein/gpqa
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: acc_norm
+ value: 4.25
+ name: acc_norm
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MuSR (0-shot)
+ type: TAUR-Lab/MuSR
+ args:
+ num_few_shot: 0
+ metrics:
+ - type: acc_norm
+ value: 19.02
+ name: acc_norm
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+ name: Open LLM Leaderboard
+ - task:
+ type: text-generation
+ name: Text Generation
+ dataset:
+ name: MMLU-PRO (5-shot)
+ type: TIGER-Lab/MMLU-Pro
+ config: main
+ split: test
+ args:
+ num_few_shot: 5
+ metrics:
+ - type: acc
+ value: 13.99
+ name: accuracy
+ source:
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
  name: Open LLM Leaderboard
  ---

  ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6468ce47e134d050a58aa89c/U0zyXVGj-O8a7KP3BvPue.png)
@@ -373,4 +459,17 @@ Thanks to all open source AI community.

  If you would like to support me:

+ [☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v4-7B)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |16.73|
+ |IFEval (0-Shot)    |47.08|
+ |BBH (3-Shot)       |14.30|
+ |MATH Lvl 5 (4-Shot)| 1.74|
+ |GPQA (0-shot)      | 4.25|
+ |MuSR (0-shot)      |19.02|
+ |MMLU-PRO (5-shot)  |13.99|
+
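The `Avg.` row in the added results table is consistent with an unweighted arithmetic mean of the six benchmark scores (the same values written to `model-index` above). A minimal Python check, assuming the leaderboard average is a plain mean rounded to two decimal places:

```python
# Per-benchmark scores as listed in the results table added by this commit.
scores = {
    "IFEval (0-Shot)": 47.08,
    "BBH (3-Shot)": 14.30,
    "MATH Lvl 5 (4-Shot)": 1.74,
    "GPQA (0-shot)": 4.25,
    "MuSR (0-shot)": 19.02,
    "MMLU-PRO (5-shot)": 13.99,
}

# Assumption: "Avg." is the unweighted mean of the six scores, rounded to 2 decimals.
average = round(sum(scores.values()) / len(scores), 2)
print(f"Avg. = {average:.2f}")  # prints "Avg. = 16.73"
```

The six scores sum to 100.38, and 100.38 / 6 = 16.73, which matches the reported `Avg.` value.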