leaderboard-pr-bot committed
Commit 6a849b6
1 Parent(s): 85629ec

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
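Every entry the bot appends to the card's `model-index` follows the same shape. As a minimal sketch (the dict literal stands in for the parsed YAML, so no PyYAML dependency is assumed), here is the IFEval entry from this PR after parsing, with a check of its top-level structure:

```python
# One model-index result entry, as it would look once the YAML is parsed.
# The dict literal below stands in for yaml.safe_load on the card metadata.
entry = {
    "task": {"type": "text-generation", "name": "Text Generation"},
    "dataset": {
        "name": "IFEval (0-Shot)",
        "type": "HuggingFaceH4/ifeval",
        "args": {"num_few_shot": 0},
    },
    "metrics": [
        {
            "type": "inst_level_strict_acc and prompt_level_strict_acc",
            "value": 23.57,
            "name": "strict accuracy",
        }
    ],
    "source": {
        "url": "https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf",
        "name": "Open LLM Leaderboard",
    },
}

# Every leaderboard result entry carries these four top-level keys.
assert set(entry) == {"task", "dataset", "metrics", "source"}
print(entry["metrics"][0]["value"])  # 23.57
```

The remaining five entries in the diff differ only in dataset, few-shot count, and metric; the MMLU-PRO entry additionally pins `config: main` and `split: test`.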

Files changed (1)
README.md: +106 −0
README.md CHANGED
@@ -115,6 +115,98 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 23.57
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 22.45
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 0.0
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.25
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 1.97
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 12.9
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Josephgflowers/Cinder-Phi-2-V1-F16-gguf
+      name: Open LLM Leaderboard
 ---
 
 I am really enjoying this version of Cinder. More information coming. The training data is similar to OpenHermes 2.5, with some added math, STEM, and reasoning data, mostly from OpenOrca, as well as Cinder-character-specific data: a mix of RAG-generated Q&A on world knowledge, STEM topics, and Cinder character data. I supplemented the Cinder character with an abbreviated Samantha dataset edited for Cinder, and removed many of the negative responses.
@@ -140,3 +232,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |74.66|
 |GSM8k (5-shot)      |47.23|
 
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Josephgflowers__Cinder-Phi-2-V1-F16-gguf)
+
+| Metric             |Value|
+|--------------------|----:|
+|Avg.                |10.86|
+|IFEval (0-Shot)     |23.57|
+|BBH (3-Shot)        |22.45|
+|MATH Lvl 5 (4-Shot) | 0.00|
+|GPQA (0-shot)       | 4.25|
+|MuSR (0-shot)       | 1.97|
+|MMLU-PRO (5-shot)   |12.90|
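As a quick sanity check on the summary table (a minimal sketch, not part of the PR itself): the reported Avg. is the plain arithmetic mean of the six benchmark scores.

```python
# The six benchmark scores added in this PR, taken from the summary table.
scores = {
    "IFEval (0-Shot)": 23.57,
    "BBH (3-Shot)": 22.45,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 4.25,
    "MuSR (0-shot)": 1.97,
    "MMLU-PRO (5-shot)": 12.90,
}

# Unweighted mean across benchmarks, rounded to two decimals
# to match the table's precision.
avg = round(sum(scores.values()) / len(scores), 2)
print(avg)  # 10.86
```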