leaderboard-pr-bot committed on
Commit 8d9d8d9 (1 parent: f3c4929)

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
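The PR records each score as an entry in the card's YAML `model-index` metadata: a task, a dataset (with its few-shot count under `args`), a list of metrics, and a source link. As an illustrative sketch only, the Python below mirrors three of those entries as the nested dicts and lists a YAML loader would produce (the `source` keys are omitted for brevity) and flattens them into one row per metric. `MODEL_INDEX` and `flatten_results` are hypothetical names; this is not the bot's code or a `huggingface_hub` API.

```python
from typing import Any, Optional

# Hypothetical, trimmed mirror of the card's `model-index` metadata: three of the
# six results, shaped like the output of a YAML loader (`source` keys omitted).
MODEL_INDEX: list[dict[str, Any]] = [
    {
        "name": "Einstein-v4-7B",
        "results": [
            {
                "task": {"type": "text-generation", "name": "Text Generation"},
                "dataset": {
                    "name": "AI2 Reasoning Challenge (25-Shot)",
                    "type": "ai2_arc",
                    "config": "ARC-Challenge",
                    "split": "test",
                    "args": {"num_few_shot": 25},
                },
                "metrics": [
                    {"type": "acc_norm", "value": 64.68, "name": "normalized accuracy"}
                ],
            },
            {
                "task": {"type": "text-generation", "name": "Text Generation"},
                "dataset": {
                    "name": "TruthfulQA (0-shot)",
                    "type": "truthful_qa",
                    "config": "multiple_choice",
                    "split": "validation",
                    "args": {"num_few_shot": 0},
                },
                # As in the card, this metric has no human-readable `name`.
                "metrics": [{"type": "mc2", "value": 55.15}],
            },
            {
                "task": {"type": "text-generation", "name": "Text Generation"},
                "dataset": {
                    "name": "GSM8k (5-shot)",
                    "type": "gsm8k",
                    "config": "main",
                    "split": "test",
                    "args": {"num_few_shot": 5},
                },
                "metrics": [{"type": "acc", "value": 57.62, "name": "accuracy"}],
            },
        ],
    }
]


def flatten_results(
    model_index: list[dict[str, Any]],
) -> list[tuple[str, str, Optional[int], str, float]]:
    """Return one (model, dataset type, few-shot count, metric type, value) row per metric."""
    rows = []
    for model in model_index:
        for result in model["results"]:
            dataset = result["dataset"]
            # `args` may be absent from a dataset entry, so fall back to None.
            shots = dataset.get("args", {}).get("num_few_shot")
            for metric in result["metrics"]:
                rows.append(
                    (model["name"], dataset["type"], shots, metric["type"], metric["value"])
                )
    return rows


for row in flatten_results(MODEL_INDEX):
    print(row)
```

Flattening this way keeps the few-shot setting next to each score, which is what distinguishes, for example, the 25-shot ARC result from a 0-shot run on the same dataset.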

Files changed (1)
  1. README.md +118 -2
README.md CHANGED
@@ -1,6 +1,5 @@
 ---
 license: other
-base_model: mistralai/Mistral-7B-v0.1
 tags:
 - axolotl
 - generated_from_trainer
@@ -15,6 +14,7 @@ tags:
 - chemistry
 - biology
 - math
+base_model: mistralai/Mistral-7B-v0.1
 datasets:
 - allenai/ai2_arc
 - camel-ai/physics
@@ -46,6 +46,109 @@ datasets:
 - Open-Orca/SlimOrca
 - migtissera/Synthia-v1.3
 - TIGER-Lab/ScienceEval
+model-index:
+- name: Einstein-v4-7B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 64.68
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 83.75
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 62.31
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 55.15
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 76.24
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 57.62
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Weyaxi/Einstein-v4-7B
+      name: Open LLM Leaderboard
 ---
 # 🔬 Einstein-v4-7B
 
@@ -204,4 +307,17 @@ Thanks to all open source AI community.
 
 If you would like to support me:
 
-[☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+[☕ Buy Me a Coffee](https://www.buymeacoffee.com/weyaxi)
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Weyaxi__Einstein-v4-7B)
+
+| Metric |Value|
+|---------------------------------|----:|
+|Avg. |66.62|
+|AI2 Reasoning Challenge (25-Shot)|64.68|
+|HellaSwag (10-Shot) |83.75|
+|MMLU (5-Shot) |62.31|
+|TruthfulQA (0-shot) |55.15|
+|Winogrande (5-shot) |76.24|
+|GSM8k (5-shot) |57.62|
+