LeroyDyer leaderboard-pr-bot committed on
Commit 3154a54
1 Parent(s): 5601fc5

Adding Evaluation Results (#1)


- Adding Evaluation Results (81ca2a38367419328ac4a3ebe39c37933d8c5740)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +117 -0
README.md CHANGED
@@ -67,6 +67,109 @@ Variant:
  - LeroyDyer/TruthfulQA_LLM
  - LeroyDyer/HellaSwag_LLM
  - LeroyDyer/Mixtral_AI_DeepMedicalMind
+ model-index:
+ - name: Mixtral_AI_CyberTron_DeepMind_III_UFT
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 61.86
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 83.15
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 61.95
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 49.41
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 77.98
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 51.86
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT
+       name: Open LLM Leaderboard
  ---
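The `model-index` block added above is the metadata the Hub and the leaderboard read to display these scores. As a convenience, here is a minimal sketch (not part of this commit) that downloads the updated `README.md` and walks that block; it assumes `huggingface_hub` and `PyYAML` are installed, and only the repo id is taken from this PR.

```python
# Minimal sketch (not part of this commit): read the model-index block added
# above from the model card's YAML front matter and print each benchmark score.
# Assumes huggingface_hub and PyYAML are installed.
import yaml
from huggingface_hub import hf_hub_download

readme_path = hf_hub_download(
    repo_id="LeroyDyer/Mixtral_AI_CyberTron_DeepMind_III_UFT",
    filename="README.md",
)

with open(readme_path, encoding="utf-8") as f:
    text = f.read()

# The card metadata sits between the first two '---' fences at the top of the file.
front_matter = text.split("---")[1]
card_data = yaml.safe_load(front_matter)

for entry in card_data["model-index"]:
    print(entry["name"])
    for result in entry["results"]:
        dataset = result["dataset"]["name"]
        for metric in result["metrics"]:
            print(f"  {dataset}: {metric['type']} = {metric['value']}")
```

Each `results` entry pairs one benchmark configuration (dataset, split, few-shot count) with its metric and a link back to the leaderboard, mirroring the six tasks listed above.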
@@ -163,3 +266,17 @@ as we are in this development period we are focused on BRAIN currently .......
  This mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

  [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LeroyDyer__Mixtral_AI_CyberTron_DeepMind_III_UFT)
+
+ | Metric |Value|
+ |---------------------------------|----:|
+ |Avg. |64.37|
+ |AI2 Reasoning Challenge (25-Shot)|61.86|
+ |HellaSwag (10-Shot) |83.15|
+ |MMLU (5-Shot) |61.95|
+ |TruthfulQA (0-shot) |49.41|
+ |Winogrande (5-shot) |77.98|
+ |GSM8k (5-shot) |51.86|
+
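The `Avg.` row in the table added above is consistent with a plain arithmetic mean of the six benchmark scores. A short sketch (not part of this commit) to sanity-check it:

```python
# Minimal sketch (not part of this commit): recompute the leaderboard average
# from the six per-benchmark scores reported in the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.86,
    "HellaSwag (10-Shot)": 83.15,
    "MMLU (5-Shot)": 61.95,
    "TruthfulQA (0-shot)": 49.41,
    "Winogrande (5-shot)": 77.98,
    "GSM8k (5-shot)": 51.86,
}

average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 64.37 -- matches the Avg. row
```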