leaderboard-pr-bot committed on
Commit
236dc63
1 Parent(s): 7f36397

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +190 -68
README.md CHANGED
@@ -1,152 +1,158 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
  - en
 
5
  datasets:
6
  - togethercomputer/RedPajama-Data-1T
7
  - togethercomputer/RedPajama-Data-Instruct
8
  widget:
9
- - text: |-
10
- Label the sentences as either 'positive', 'negative', 'mixed', or 'neutral':
11
-
12
- Sentence: I can say that there isn't anything I would change.
13
- Label: positive
14
-
15
- Sentence: I'm not sure about this.
16
- Label: neutral
17
-
18
- Sentence: I liked some parts but I didn't like other parts.
19
- Label: mixed
20
-
21
- Sentence: I think the background image could have been better.
22
- Label: negative
23
-
24
- Sentence: I really like it.
25
- Label:
26
  example_title: Sentiment Analysis
27
- - text: |-
28
- Please answer the following question:
29
 
30
  Question: What is the capital of Canada?
 
31
  Answer: Ottawa
32
 
 
33
  Question: What is the currency of Switzerland?
 
34
  Answer: Swiss franc
35
 
 
36
  Question: In which country is Wisconsin located?
37
- Answer:
 
38
  example_title: Question Answering
39
- - text: >-
40
- Given a news article, classify its topic.
41
 
42
  Possible labels: 1. World 2. Sports 3. Business 4. Sci/Tech
43
 
44
 
45
- Article: A nearby star thought to harbor comets and asteroids now appears to
46
- be home to planets, too.
47
 
48
  Label: Sci/Tech
49
 
50
 
51
- Article: Soaring crude prices plus worries about the economy and the outlook
52
- for earnings are expected to hang over the stock market next week during the
53
- depth of the summer doldrums.
54
 
55
  Label: Business
56
 
57
 
58
- Article: Murtagh a stickler for success Northeastern field hockey coach
59
- Cheryl Murtagh doesn't want the glare of the spotlight that shines on her to
60
- detract from a team that has been the America East champion for the past
61
- three years and has been to the NCAA tournament 13 times.
62
 
63
- Label::
64
  example_title: Topic Classification
65
- - text: |-
66
- Paraphrase the given sentence into a different sentence.
67
 
68
  Input: Can you recommend some upscale restaurants in New York?
 
69
  Output: What upscale restaurants do you recommend in New York?
70
 
 
71
  Input: What are the famous places we should not miss in Paris?
 
72
  Output: Recommend some of the best places to visit in Paris?
73
 
 
74
  Input: Could you recommend some hotels that have cheap price in Zurich?
75
- Output:
 
76
  example_title: Paraphrasing
77
- - text: >-
78
- Given a review from Amazon's food products, the task is to generate a short
79
  summary of the given review in the input.
80
 
81
 
82
- Input: I have bought several of the Vitality canned dog food products and
83
- have found them all to be of good quality. The product looks more like a
84
- stew than a processed meat and it smells better. My Labrador is finicky and
85
- she appreciates this product better than most.
86
 
87
  Output: Good Quality Dog Food
88
 
89
 
90
- Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were
91
- actually small sized unsalted. Not sure if this was an error or if the
92
- vendor intended to represent the product as 'Jumbo'.
93
 
94
  Output: Not as Advertised
95
 
96
 
97
- Input: My toddler loves this game to a point where he asks for it. That's a
98
- big thing for me. Secondly, no glitching unlike one of their competitors
99
- (PlayShifu). Any tech I don’t have to reach out to support for help is a
100
- good tech for me. I even enjoy some of the games and activities in this.
101
- Overall, this is a product that shows that the developers took their time
102
- and made sure people would not be asking for refund. I’ve become bias
103
- regarding this product and honestly I look forward to buying more of this
104
- company’s stuff. Please keep up the great work.
105
 
106
- Output:
107
  example_title: Text Summarization
108
- - text: |-
109
- Identify which sense of a word is meant in a given context.
110
 
111
  Context: The river overflowed the bank.
 
112
  Word: bank
 
113
  Sense: river bank
114
 
 
115
  Context: A mouse takes much more room than a trackball.
 
116
  Word: mouse
 
117
  Sense: computer mouse
118
 
 
119
  Context: The bank will not be accepting cash on Saturdays.
 
120
  Word: bank
 
121
  Sense: commercial (finance) banks
122
 
 
123
  Context: Bill killed the project
 
124
  Word: kill
125
- Sense:
 
126
  example_title: Word Sense Disambiguation
127
- - text: >-
128
- Given a pair of sentences, choose whether the two sentences agree
129
- (entailment)/disagree (contradiction) with each other.
130
 
131
  Possible labels: 1. entailment 2. contradiction
132
 
133
 
134
- Sentence 1: The skier was on the edge of the ramp. Sentence 2: The skier was
135
- dressed in winter clothes.
136
 
137
  Label: entailment
138
 
139
 
140
- Sentence 1: The boy skated down the staircase railing. Sentence 2: The boy
141
- is a newbie skater.
142
 
143
  Label: contradiction
144
 
145
 
146
- Sentence 1: Two middle-aged people stand by a golf hole. Sentence 2: A
147
- couple riding in a golf cart.
148
 
149
- Label:
150
  example_title: Natural Language Inference
151
  inference:
152
  parameters:
@@ -154,6 +160,109 @@ inference:
154
  top_p: 0.7
155
  top_k: 50
156
  max_new_tokens: 128
157
  ---
158
 
159
  # RedPajama-INCITE-7B-Instruct
@@ -341,4 +450,17 @@ Please refer to [togethercomputer/RedPajama-Data-1T](https://huggingface.co/data
341
 
342
  ## Community
343
 
344
- Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)
1
  ---
 
2
  language:
3
  - en
4
+ license: apache-2.0
5
  datasets:
6
  - togethercomputer/RedPajama-Data-1T
7
  - togethercomputer/RedPajama-Data-Instruct
8
  widget:
9
+ - text: "Label the sentences as either 'positive', 'negative', 'mixed', or 'neutral':\
10
+ \ \n\nSentence: I can say that there isn't anything I would change.\nLabel: positive\n\
11
+ \nSentence: I'm not sure about this.\nLabel: neutral\n\nSentence: I liked some\
12
+ \ parts but I didn't like other parts.\nLabel: mixed\n\nSentence: I think the\
13
+ \ background image could have been better.\nLabel: negative\n\nSentence: I really\
14
+ \ like it.\nLabel:"
15
  example_title: Sentiment Analysis
16
+ - text: 'Please answer the following question:
17
+
18
 
19
  Question: What is the capital of Canada?
20
+
21
  Answer: Ottawa
22
 
23
+
24
  Question: What is the currency of Switzerland?
25
+
26
  Answer: Swiss franc
27
 
28
+
29
  Question: In which country is Wisconsin located?
30
+
31
+ Answer:'
32
  example_title: Question Answering
33
+ - text: 'Given a news article, classify its topic.
 
34
 
35
  Possible labels: 1. World 2. Sports 3. Business 4. Sci/Tech
36
 
37
 
38
+ Article: A nearby star thought to harbor comets and asteroids now appears to be
39
+ home to planets, too.
40
 
41
  Label: Sci/Tech
42
 
43
 
44
+ Article: Soaring crude prices plus worries about the economy and the outlook for
45
+ earnings are expected to hang over the stock market next week during the depth
46
+ of the summer doldrums.
47
 
48
  Label: Business
49
 
50
 
51
+ Article: Murtagh a stickler for success Northeastern field hockey coach Cheryl
52
+ Murtagh doesn''t want the glare of the spotlight that shines on her to detract
53
+ from a team that has been the America East champion for the past three years and
54
+ has been to the NCAA tournament 13 times.
55
 
56
+ Label::'
57
  example_title: Topic Classification
58
+ - text: 'Paraphrase the given sentence into a different sentence.
59
+
60
 
61
  Input: Can you recommend some upscale restaurants in New York?
62
+
63
  Output: What upscale restaurants do you recommend in New York?
64
 
65
+
66
  Input: What are the famous places we should not miss in Paris?
67
+
68
  Output: Recommend some of the best places to visit in Paris?
69
 
70
+
71
  Input: Could you recommend some hotels that have cheap price in Zurich?
72
+
73
+ Output:'
74
  example_title: Paraphrasing
75
+ - text: 'Given a review from Amazon''s food products, the task is to generate a short
 
76
  summary of the given review in the input.
77
 
78
 
79
+ Input: I have bought several of the Vitality canned dog food products and have
80
+ found them all to be of good quality. The product looks more like a stew than
81
+ a processed meat and it smells better. My Labrador is finicky and she appreciates
82
+ this product better than most.
83
 
84
  Output: Good Quality Dog Food
85
 
86
 
87
+ Input: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually
88
+ small sized unsalted. Not sure if this was an error or if the vendor intended
89
+ to represent the product as ''Jumbo''.
90
 
91
  Output: Not as Advertised
92
 
93
 
94
+ Input: My toddler loves this game to a point where he asks for it. That''s a big
95
+ thing for me. Secondly, no glitching unlike one of their competitors (PlayShifu).
96
+ Any tech I don’t have to reach out to support for help is a good tech for me.
97
+ I even enjoy some of the games and activities in this. Overall, this is a product
98
+ that shows that the developers took their time and made sure people would not
99
+ be asking for refund. I’ve become bias regarding this product and honestly I look
100
+ forward to buying more of this company’s stuff. Please keep up the great work.
 
101
 
102
+ Output:'
103
  example_title: Text Summarization
104
+ - text: 'Identify which sense of a word is meant in a given context.
105
+
106
 
107
  Context: The river overflowed the bank.
108
+
109
  Word: bank
110
+
111
  Sense: river bank
112
 
113
+
114
  Context: A mouse takes much more room than a trackball.
115
+
116
  Word: mouse
117
+
118
  Sense: computer mouse
119
 
120
+
121
  Context: The bank will not be accepting cash on Saturdays.
122
+
123
  Word: bank
124
+
125
  Sense: commercial (finance) banks
126
 
127
+
128
  Context: Bill killed the project
129
+
130
  Word: kill
131
+
132
+ Sense:'
133
  example_title: Word Sense Disambiguation
134
+ - text: 'Given a pair of sentences, choose whether the two sentences agree (entailment)/disagree
135
+ (contradiction) with each other.
 
136
 
137
  Possible labels: 1. entailment 2. contradiction
138
 
139
 
140
+ Sentence 1: The skier was on the edge of the ramp. Sentence 2: The skier was dressed
141
+ in winter clothes.
142
 
143
  Label: entailment
144
 
145
 
146
+ Sentence 1: The boy skated down the staircase railing. Sentence 2: The boy is
147
+ a newbie skater.
148
 
149
  Label: contradiction
150
 
151
 
152
+ Sentence 1: Two middle-aged people stand by a golf hole. Sentence 2: A couple
153
+ riding in a golf cart.
154
 
155
+ Label:'
156
  example_title: Natural Language Inference
157
  inference:
158
  parameters:
 
160
  top_p: 0.7
161
  top_k: 50
162
  max_new_tokens: 128
+ model-index:
+ - name: RedPajama-INCITE-Instruct-7B-v0.1
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 44.11
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 72.02
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 37.62
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 33.96
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 64.96
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 1.59
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1
+       name: Open LLM Leaderboard
  ---

  # RedPajama-INCITE-7B-Instruct

  ## Community

+ Join us on [Together Discord](https://discord.gg/6ZVDU8tTD4)
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_togethercomputer__RedPajama-INCITE-Instruct-7B-v0.1)
+
+ | Metric                            | Value |
+ |-----------------------------------|------:|
+ | Avg.                              | 42.38 |
+ | AI2 Reasoning Challenge (25-Shot) | 44.11 |
+ | HellaSwag (10-Shot)               | 72.02 |
+ | MMLU (5-Shot)                     | 37.62 |
+ | TruthfulQA (0-shot)               | 33.96 |
+ | Winogrande (5-shot)               | 64.96 |
+ | GSM8k (5-shot)                    |  1.59 |
+
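
The PR does not say how the `Avg.` row is derived. A minimal sketch, assuming it is the unweighted arithmetic mean of the six benchmark scores in the table, reproduces the reported figure. The function name `leaderboard_average` is hypothetical and is not part of any leaderboard tooling.

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 44.11,
    "HellaSwag (10-Shot)": 72.02,
    "MMLU (5-Shot)": 37.62,
    "TruthfulQA (0-shot)": 33.96,
    "Winogrande (5-shot)": 64.96,
    "GSM8k (5-shot)": 1.59,
}


def leaderboard_average(values):
    """Unweighted mean of the given scores, rounded to two decimals.

    Assumption: every benchmark carries equal weight.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = leaderboard_average(scores.values())
print(f"Avg. {average:.2f}")
```

Under that assumption the six scores sum to 254.26, and dividing by 6 gives 42.38 after rounding, which matches the `Avg.` row.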