Commit
a39a670
1 Parent(s): 738e353

Adding Evaluation Results (#1)

Browse files

- Adding Evaluation Results (f28cf641514cfeebc3145d8209ee940e9f5a1802)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +232 -61
README.md CHANGED
@@ -1,76 +1,233 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
- - en
 
5
  tags:
6
- - text-generation
7
  base_model: BEE-spoke-data/smol_llama-101M-GQA
8
  datasets:
9
- - Open-Orca/SlimOrca-Dedup
10
- - VMware/open-instruct
11
- - LDJnr/Capybara
12
- - cognitivecomputations/ultrachat-uncensored
13
- - starfishmedical/webGPT_x_dolly
14
- - THUDM/webglm-qa
15
  widget:
16
- - text: |-
17
- <|im_start|>system
18
- You are a helpful assistant who gives creative responses.<|im_end|>
19
- <|im_start|>user
20
- Write the background story of a game about wizards and llamas in a sci-fi world.<|im_end|>
21
- <|im_start|>assistant
22
- - text: |-
23
- <|im_start|>system
24
- A friendly chat between a user and an assistant.<|im_end|>
25
- <|im_start|>user
26
- Got a question for you!<|im_end|>
27
- <|im_start|>assistant
28
- Sure! What's it?<|im_end|>
29
- <|im_start|>user
30
- I need to build a simple website. Where should I start learning about web development?<|im_end|>
31
- <|im_start|>assistant
32
- - text: |-
33
- <|im_start|>system
34
- You are a helpful assistant who provides concise answers to the user's questions.<|im_end|>
35
- <|im_start|>user
36
- How to become more healthy?<|im_end|>
37
- <|im_start|>assistant
38
- - text: |-
39
- <|im_start|>system
40
- You are a helpful assistant, who always answers with empathy.<|im_end|>
41
- <|im_start|>user
42
- List the pros and cons of social media.<|im_end|>
43
- <|im_start|>assistant
44
- - text: |-
45
- <|im_start|>system
46
- You are a helpful assistant, who always answers with empathy.<|im_end|>
47
- <|im_start|>user
48
- Hello!<|im_end|>
49
- <|im_start|>assistant
50
- Hi! How can I help you today?<|im_end|>
51
- <|im_start|>user
52
- Take a look at the info below.
53
-
54
- - The tape inside the VHS cassettes is very delicate and can be easily ruined, making them unplayable and unrepairable. The reason the tape deteriorates is that the magnetic charge needed for them to work is not permanent, and the magnetic particles end up losing their charge in a process known as remanence decay. These particles could also become demagnetised via being stored too close to a magnetic source.
55
- - One of the most significant issues with VHS tapes is that they have moving parts, meaning that there are more occasions when something can go wrong, damaging your footage or preventing it from playing back. The tape itself is a prominent cause of this, and tape slippage can occur. Tapes slippage can be caused when the tape loses its tension, or it has become warped. These problems can occur in storage due to high temperatures or frequent changes in humidity.
56
- - VHS tapes deteriorate over time from infrequent or overuse. Neglect means mold and dirt, while overuse can lead to scratches and technical difficulties. This is why old VHS tapes inevitably experience malfunctions after a long period of time. Usually anywhere between 10 to 25+ years.
57
- - Some VHS tapes like newer mini DVs and Digital 8 tapes can suffer from digital corruption, meaning that the footage becomes lost and cannot be recovered. These tapes were the steppingstone from VHS to the digital age when capturing footage straight to digital became the norm. Unfortunately,they are susceptible to digital corruption, which causes video pixilation and/or loss of audio.<|im_end|>
58
- <|im_start|>assistant
59
- Alright!<|im_end|>
60
- <|im_start|>user
61
- Now I'm going to write my question, and if the info above is useful, you can use them in your response.
62
- Ready?<|im_end|>
63
- <|im_start|>assistant
64
- Ready for your question!<|im_end|>
65
- <|im_start|>user
66
- Why do VHS tapes deteriorate over time?<|im_end|>
67
- <|im_start|>assistant
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
68
  inference:
69
  parameters:
70
  max_new_tokens: 250
71
  penalty_alpha: 0.5
72
  top_k: 4
73
  repetition_penalty: 1.105
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
74
  ---
75
 
76
  # A Llama Chat Model of 101M Parameters
@@ -105,3 +262,17 @@ penalty_alpha: 0.5
105
  top_k: 4
106
  repetition_penalty: 1.105
107
  ```
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
 
2
  language:
3
+ - en
4
+ license: apache-2.0
5
  tags:
6
+ - text-generation
7
  base_model: BEE-spoke-data/smol_llama-101M-GQA
8
  datasets:
9
+ - Open-Orca/SlimOrca-Dedup
10
+ - VMware/open-instruct
11
+ - LDJnr/Capybara
12
+ - cognitivecomputations/ultrachat-uncensored
13
+ - starfishmedical/webGPT_x_dolly
14
+ - THUDM/webglm-qa
15
  widget:
16
+ - text: '<|im_start|>system
17
+
18
+ You are a helpful assistant who gives creative responses.<|im_end|>
19
+
20
+ <|im_start|>user
21
+
22
+ Write the background story of a game about wizards and llamas in a sci-fi world.<|im_end|>
23
+
24
+ <|im_start|>assistant'
25
+ - text: '<|im_start|>system
26
+
27
+ A friendly chat between a user and an assistant.<|im_end|>
28
+
29
+ <|im_start|>user
30
+
31
+ Got a question for you!<|im_end|>
32
+
33
+ <|im_start|>assistant
34
+
35
+ Sure! What''s it?<|im_end|>
36
+
37
+ <|im_start|>user
38
+
39
+ I need to build a simple website. Where should I start learning about web development?<|im_end|>
40
+
41
+ <|im_start|>assistant'
42
+ - text: '<|im_start|>system
43
+
44
+ You are a helpful assistant who provides concise answers to the user''s questions.<|im_end|>
45
+
46
+ <|im_start|>user
47
+
48
+ How to become more healthy?<|im_end|>
49
+
50
+ <|im_start|>assistant'
51
+ - text: '<|im_start|>system
52
+
53
+ You are a helpful assistant, who always answers with empathy.<|im_end|>
54
+
55
+ <|im_start|>user
56
+
57
+ List the pros and cons of social media.<|im_end|>
58
+
59
+ <|im_start|>assistant'
60
+ - text: '<|im_start|>system
61
+
62
+ You are a helpful assistant, who always answers with empathy.<|im_end|>
63
+
64
+ <|im_start|>user
65
+
66
+ Hello!<|im_end|>
67
+
68
+ <|im_start|>assistant
69
+
70
+ Hi! How can I help you today?<|im_end|>
71
+
72
+ <|im_start|>user
73
+
74
+ Take a look at the info below.
75
+
76
+
77
+ - The tape inside the VHS cassettes is very delicate and can be easily ruined,
78
+ making them unplayable and unrepairable. The reason the tape deteriorates is that
79
+ the magnetic charge needed for them to work is not permanent, and the magnetic
80
+ particles end up losing their charge in a process known as remanence decay. These
81
+ particles could also become demagnetised via being stored too close to a magnetic
82
+ source.
83
+
84
+ - One of the most significant issues with VHS tapes is that they have moving parts,
85
+ meaning that there are more occasions when something can go wrong, damaging your
86
+ footage or preventing it from playing back. The tape itself is a prominent cause
87
+ of this, and tape slippage can occur. Tapes slippage can be caused when the tape
88
+ loses its tension, or it has become warped. These problems can occur in storage
89
+ due to high temperatures or frequent changes in humidity.
90
+
91
+ - VHS tapes deteriorate over time from infrequent or overuse. Neglect means mold
92
+ and dirt, while overuse can lead to scratches and technical difficulties. This
93
+ is why old VHS tapes inevitably experience malfunctions after a long period of
94
+ time. Usually anywhere between 10 to 25+ years.
95
+
96
+ - Some VHS tapes like newer mini DVs and Digital 8 tapes can suffer from digital
97
+ corruption, meaning that the footage becomes lost and cannot be recovered. These
98
+ tapes were the steppingstone from VHS to the digital age when capturing footage
99
+ straight to digital became the norm. Unfortunately,they are susceptible to digital
100
+ corruption, which causes video pixilation and/or loss of audio.<|im_end|>
101
+
102
+ <|im_start|>assistant
103
+
104
+ Alright!<|im_end|>
105
+
106
+ <|im_start|>user
107
+
108
+ Now I''m going to write my question, and if the info above is useful, you can
109
+ use them in your response.
110
+
111
+ Ready?<|im_end|>
112
+
113
+ <|im_start|>assistant
114
+
115
+ Ready for your question!<|im_end|>
116
+
117
+ <|im_start|>user
118
+
119
+ Why do VHS tapes deteriorate over time?<|im_end|>
120
+
121
+ <|im_start|>assistant'
122
  inference:
123
  parameters:
124
  max_new_tokens: 250
125
  penalty_alpha: 0.5
126
  top_k: 4
127
  repetition_penalty: 1.105
128
+ model-index:
129
+ - name: Smol-Llama-101M-Chat-v1
130
+ results:
131
+ - task:
132
+ type: text-generation
133
+ name: Text Generation
134
+ dataset:
135
+ name: AI2 Reasoning Challenge (25-Shot)
136
+ type: ai2_arc
137
+ config: ARC-Challenge
138
+ split: test
139
+ args:
140
+ num_few_shot: 25
141
+ metrics:
142
+ - type: acc_norm
143
+ value: 22.87
144
+ name: normalized accuracy
145
+ source:
146
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Smol-Llama-101M-Chat-v1
147
+ name: Open LLM Leaderboard
148
+ - task:
149
+ type: text-generation
150
+ name: Text Generation
151
+ dataset:
152
+ name: HellaSwag (10-Shot)
153
+ type: hellaswag
154
+ split: validation
155
+ args:
156
+ num_few_shot: 10
157
+ metrics:
158
+ - type: acc_norm
159
+ value: 28.69
160
+ name: normalized accuracy
161
+ source:
162
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Smol-Llama-101M-Chat-v1
163
+ name: Open LLM Leaderboard
164
+ - task:
165
+ type: text-generation
166
+ name: Text Generation
167
+ dataset:
168
+ name: MMLU (5-Shot)
169
+ type: cais/mmlu
170
+ config: all
171
+ split: test
172
+ args:
173
+ num_few_shot: 5
174
+ metrics:
175
+ - type: acc
176
+ value: 24.93
177
+ name: accuracy
178
+ source:
179
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Smol-Llama-101M-Chat-v1
180
+ name: Open LLM Leaderboard
181
+ - task:
182
+ type: text-generation
183
+ name: Text Generation
184
+ dataset:
185
+ name: TruthfulQA (0-shot)
186
+ type: truthful_qa
187
+ config: multiple_choice
188
+ split: validation
189
+ args:
190
+ num_few_shot: 0
191
+ metrics:
192
+ - type: mc2
193
+ value: 45.76
194
+ source:
195
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Smol-Llama-101M-Chat-v1
196
+ name: Open LLM Leaderboard
197
+ - task:
198
+ type: text-generation
199
+ name: Text Generation
200
+ dataset:
201
+ name: Winogrande (5-shot)
202
+ type: winogrande
203
+ config: winogrande_xl
204
+ split: validation
205
+ args:
206
+ num_few_shot: 5
207
+ metrics:
208
+ - type: acc
209
+ value: 50.04
210
+ name: accuracy
211
+ source:
212
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Smol-Llama-101M-Chat-v1
213
+ name: Open LLM Leaderboard
214
+ - task:
215
+ type: text-generation
216
+ name: Text Generation
217
+ dataset:
218
+ name: GSM8k (5-shot)
219
+ type: gsm8k
220
+ config: main
221
+ split: test
222
+ args:
223
+ num_few_shot: 5
224
+ metrics:
225
+ - type: acc
226
+ value: 0.08
227
+ name: accuracy
228
+ source:
229
+ url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Smol-Llama-101M-Chat-v1
230
+ name: Open LLM Leaderboard
231
  ---
232
 
233
  # A Llama Chat Model of 101M Parameters
 
262
  top_k: 4
263
  repetition_penalty: 1.105
264
  ```
265
+
266
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
267
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Smol-Llama-101M-Chat-v1)
268
+
269
+ | Metric |Value|
270
+ |---------------------------------|----:|
271
+ |Avg. |28.73|
272
+ |AI2 Reasoning Challenge (25-Shot)|22.87|
273
+ |HellaSwag (10-Shot) |28.69|
274
+ |MMLU (5-Shot) |24.93|
275
+ |TruthfulQA (0-shot) |45.76|
276
+ |Winogrande (5-shot) |50.04|
277
+ |GSM8k (5-shot) | 0.08|
278
+