leaderboard-pr-bot committed on
Commit
b8912a2
1 Parent(s): 3eb0cff

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +138 -22
README.md CHANGED
@@ -1,30 +1,132 @@
  ---
  license: mit
  widget:
- - text: >
-     <|system|>
-
-     You are a helpful assistant</s>
-
-     <|user|>
-
-     Tell me about yourself, what is your name?.</s>
-
-     <|assistant|>
+ - text: '<|system|>
 
+     You are a helpful assistant</s>
+
+     <|user|>
+
+     Tell me about yourself, what is your name?.</s>
+
+     <|assistant|>
+
+     '
  widget2:
- - text: >
-     <|system|>
-
-     You are a helpful assistant</s>
-
-     <|user|>
-
-     How about another amazing adventure on The Cinder Show!</s>
-
-     <|assistant|>
-
-
+ - text: '<|system|>
+
+     You are a helpful assistant</s>
+
+     <|user|>
+
+     How about another amazing adventure on The Cinder Show!</s>
+
+     <|assistant|>
+
+     '
+ model-index:
+ - name: TinyLlama-3T-Cinder-v1.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 34.04
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/TinyLlama-3T-Cinder-v1.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 50.4
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/TinyLlama-3T-Cinder-v1.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 25.75
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/TinyLlama-3T-Cinder-v1.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 37.57
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/TinyLlama-3T-Cinder-v1.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 56.43
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/TinyLlama-3T-Cinder-v1.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Josephgflowers/TinyLlama-3T-Cinder-v1.1
+       name: Open LLM Leaderboard
  ---
  Model Card for Cinder
  Model Name: Cinder
@@ -74,3 +176,17 @@ I encourage collaboration and contributions to expand Cinder's educational and c
  If you have any suggestions or requests please leave them in the newly created discord channel.
  https://discord.gg/5ebjDrnZ
 
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Josephgflowers__TinyLlama-3T-Cinder-v1.1)
+
+ |             Metric              |Value|
+ |---------------------------------|----:|
+ |Avg.                             |34.03|
+ |AI2 Reasoning Challenge (25-Shot)|34.04|
+ |HellaSwag (10-Shot)              |50.40|
+ |MMLU (5-Shot)                    |25.75|
+ |TruthfulQA (0-shot)              |37.57|
+ |Winogrande (5-shot)              |56.43|
+ |GSM8k (5-shot)                   | 0.00|
+
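
The `Avg.` row in the table above appears to be the unweighted mean of the six benchmark scores. The sketch below checks that arithmetic; the scores are copied from the table, while the helper name `leaderboard_average` and the two-decimal rounding are assumptions made for illustration, not something the PR states.

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 34.04,
    "HellaSwag (10-Shot)": 50.40,
    "MMLU (5-Shot)": 25.75,
    "TruthfulQA (0-shot)": 37.57,
    "Winogrande (5-shot)": 56.43,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals.

    Hypothetical helper: the rounding is assumed to match the 'Avg.' row.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(average)  # 34.03, matching the 'Avg.' row
```

Note that the GSM8k score of 0.00 is included in the mean, which pulls the average well below the other five scores taken alone.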