pszemraj leaderboard-pr-bot committed
Commit 7e772af
1 Parent(s): accdf0e

Adding Evaluation Results (#2)

- Adding Evaluation Results (8ca1f7fae60d754e1f4c1df005ae096109b3f2bb)


Co-authored-by: Open LLM Leaderboard PR Bot <leaderboard-pr-bot@users.noreply.huggingface.co>

Files changed (1)
  1. README.md +117 -0
README.md CHANGED
@@ -74,6 +74,109 @@ parameters:
   repetition_penalty: 3.5
   length_penalty: 0.9
 base_model: EleutherAI/gpt-neo-1.3B
+model-index:
+- name: gpt-neo-1.3B-emailgen
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 29.95
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=postbot/gpt-neo-1.3B-emailgen
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 47.95
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=postbot/gpt-neo-1.3B-emailgen
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 24.11
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=postbot/gpt-neo-1.3B-emailgen
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 42.55
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=postbot/gpt-neo-1.3B-emailgen
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 56.27
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=postbot/gpt-neo-1.3B-emailgen
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.0
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=postbot/gpt-neo-1.3B-emailgen
+      name: Open LLM Leaderboard
 ---


@@ -125,3 +228,17 @@ The following hyperparameters were used during training:
 - Transformers 4.22.2
 - Pytorch 1.10.0+cu113
 - Tokenizers 0.12.1
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_postbot__gpt-neo-1.3B-emailgen)
+
+| Metric                            | Value |
+|-----------------------------------|------:|
+| Avg.                              | 33.47 |
+| AI2 Reasoning Challenge (25-Shot) | 29.95 |
+| HellaSwag (10-Shot)               | 47.95 |
+| MMLU (5-Shot)                     | 24.11 |
+| TruthfulQA (0-shot)               | 42.55 |
+| Winogrande (5-shot)               | 56.27 |
+| GSM8k (5-shot)                    |  0.00 |
+
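
The `model-index` block added by this commit follows the standard Hugging Face model card metadata schema, so once merged the leaderboard scores become machine-readable. A minimal sketch of reading them back, assuming a recent `huggingface_hub` release that parses `model-index` into `eval_results`:

```python
from huggingface_hub import ModelCard

# Fetch the model card for the repo this PR targets (network access assumed).
card = ModelCard.load("postbot/gpt-neo-1.3B-emailgen")

# ModelCardData parses the `model-index` block into EvalResult objects,
# one per task/metric pair added in this commit.
for result in card.data.eval_results:
    print(f"{result.dataset_name}: {result.metric_type} = {result.metric_value}")
```

Each iteration should print one leaderboard entry, e.g. `AI2 Reasoning Challenge (25-Shot): acc_norm = 29.95`.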
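The surrounding front matter also pins generation settings for this model (`repetition_penalty: 3.5`, `length_penalty: 0.9`, likely the inference-widget `parameters:` block). A hedged sketch of reusing them with the `transformers` pipeline; the prompt, `max_new_tokens`, and `num_beams` are illustrative assumptions not taken from this diff, and `length_penalty` only takes effect under beam search:

```python
from transformers import pipeline

# Load the fine-tuned model named throughout this diff.
generator = pipeline("text-generation", model="postbot/gpt-neo-1.3B-emailgen")

# Hypothetical email prompt for the emailgen model.
prompt = "Good morning Jane,\n\nFollowing up on our call yesterday,"

outputs = generator(
    prompt,
    max_new_tokens=64,       # assumption: token budget is not specified in this diff
    num_beams=4,             # assumption: beam search, so length_penalty applies
    repetition_penalty=3.5,  # from the README front matter
    length_penalty=0.9,      # from the README front matter
)
print(outputs[0]["generated_text"])
```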