Files changed (1)
  1. README.md +106 -0
README.md CHANGED
@@ -117,6 +117,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/Daredevil-8B
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 45.48
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/Daredevil-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 31.63
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/Daredevil-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 8.99
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/Daredevil-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.72
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/Daredevil-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 7.53
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/Daredevil-8B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 31.45
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=mlabonne/Daredevil-8B
+      name: Open LLM Leaderboard
 ---
 
 # Daredevil-8B
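
The `model-index` block added in the hunk above is machine-readable YAML front matter, so the new scores can be consumed programmatically. A minimal sketch, assuming PyYAML and a local `README.md` laid out as in this diff; the helper function and the benchmark-name filter are illustrative, not part of the card:

```python
import yaml  # pip install pyyaml

def load_front_matter(path: str = "README.md") -> dict:
    """Parse the YAML front matter between the card's leading '---' fences."""
    text = open(path, encoding="utf-8").read()
    _, front_matter, _ = text.split("---", 2)
    return yaml.safe_load(front_matter)

# The six benchmarks added in this diff (illustrative filter; the card's
# model-index also carries the older leaderboard entries).
NEW_BENCHMARKS = {
    "IFEval (0-Shot)", "BBH (3-Shot)", "MATH Lvl 5 (4-Shot)",
    "GPQA (0-shot)", "MuSR (0-shot)", "MMLU-PRO (5-shot)",
}

results = load_front_matter()["model-index"][0]["results"]
values = [
    metric["value"]
    for result in results
    if result["dataset"]["name"] in NEW_BENCHMARKS
    for metric in result["metrics"]
]

# "Avg." in the results table below is the plain mean of the six scores:
# (45.48 + 31.63 + 8.99 + 7.72 + 7.53 + 31.45) / 6 ≈ 22.13
print(f"Avg. {sum(values) / len(values):.2f}")  # -> Avg. 22.13
```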
 
@@ -248,3 +340,17 @@ pipeline = transformers.pipeline(
 outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
 print(outputs[0]["generated_text"])
 ```
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mlabonne__Daredevil-8B).
+
+|Metric             |Value|
+|-------------------|----:|
+|Avg.               |22.13|
+|IFEval (0-Shot)    |45.48|
+|BBH (3-Shot)       |31.63|
+|MATH Lvl 5 (4-Shot)| 8.99|
+|GPQA (0-shot)      | 7.72|
+|MuSR (0-shot)      | 7.53|
+|MMLU-PRO (5-shot)  |31.45|
+
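
The context lines of the second hunk show only the tail of the card's usage snippet. For reference, a self-contained version, assuming the standard `transformers` text-generation pipeline API; the example message, dtype, and device settings are illustrative, while the sampling parameters are taken from the context lines above:

```python
import torch
import transformers
from transformers import AutoTokenizer

model = "mlabonne/Daredevil-8B"
tokenizer = AutoTokenizer.from_pretrained(model)

# Build a prompt with the model's chat template (illustrative message).
messages = [{"role": "user", "content": "What is a large language model?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sampling parameters as shown in the diff's context lines.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```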