Commit 7c1655c
1 Parent(s): 95366b9

Adding Evaluation Results (#14)

- Adding Evaluation Results (f4a4f638e512b6e77b092a1dfa7cd2584f542954)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +106 -1
README.md CHANGED
@@ -16,7 +16,6 @@ tags:
 base_model: meta-llama/Meta-Llama-3-70B-Instruct
 datasets:
 - Intel/orca_dpo_pairs
-model_name: Llama-3-70B-Instruct-DPO-v0.2
 pipeline_tag: text-generation
 license_name: llama3
 license_link: LICENSE
@@ -126,6 +125,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 82.08
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 48.57
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 22.96
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 12.19
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 15.3
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 46.74
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
+      name: Open LLM Leaderboard
 ---
 
 <img src="./llama-3-merges.webp" alt="Llama-3 DPO Logo" width="500" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -234,3 +325,17 @@ outputs = pipeline(
 )
 print(outputs[0]["generated_text"][len(prompt):])
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_MaziyarPanahi__Llama-3-70B-Instruct-DPO-v0.2)
+
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |37.98|
+|IFEval (0-Shot)    |82.08|
+|BBH (3-Shot)       |48.57|
+|MATH Lvl 5 (4-Shot)|22.96|
+|GPQA (0-shot)      |12.19|
+|MuSR (0-shot)      |15.30|
+|MMLU-PRO (5-shot)  |46.74|
+
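
The `Avg.` row in the results table can be cross-checked against the six benchmark scores. The sketch below is a minimal sanity check rather than the leaderboard's own computation; the variable names are illustrative, and it assumes `Avg.` is a plain unweighted mean of the six scores.

```python
# Scores copied from the results table added in this commit.
scores = {
    "IFEval (0-Shot)": 82.08,
    "BBH (3-Shot)": 48.57,
    "MATH Lvl 5 (4-Shot)": 22.96,
    "GPQA (0-shot)": 12.19,
    "MuSR (0-shot)": 15.30,
    "MMLU-PRO (5-shot)": 46.74,
}
reported_avg = 37.98  # the "Avg." row of the same table

# Assumption: the average is an unweighted mean over the six benchmarks.
mean = sum(scores.values()) / len(scores)
gap = abs(mean - reported_avg)

print(f"mean of rounded scores: {mean:.2f}")
print(f"gap to reported Avg.:   {gap:.4f}")
```

The mean of the rounded scores comes out at 37.97, within 0.01 of the reported 37.98. A gap of that size is what one would expect if the reported average was computed from unrounded scores, though the commit itself does not say how it was derived.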