Adding the Open Portuguese LLM Leaderboard Evaluation Results

#9
Files changed (1)
  1. README.md +161 -0
@@ -106,6 +106,150 @@ model-index:
     source:
       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=22h/open-cabrita3b
       name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 17.98
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 21.14
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 22.69
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.01
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 8.92
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 43.97
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 50.46
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 41.19
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia-temp/tweetsentbr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 47.96
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=22h/open-cabrita3b
+      name: Open Portuguese LLM Leaderboard
 ---
 The Cabrita model is a collection of continued pre-trained and tokenizer-adapted models for the Portuguese language.
 This artifact is the 3 billion size variant.
@@ -136,3 +280,20 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 |Winogrande (5-shot) |59.43|
 |GSM8k (5-shot) | 0.99|
 
+
+# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/22h/open-cabrita3b)
+
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**33.04**|
+|ENEM Challenge (No Images)|    17.98|
+|BLUEX (No Images)         |    21.14|
+|OAB Exams                 |    22.69|
+|Assin2 RTE                |    43.01|
+|Assin2 STS                |     8.92|
+|FaQuAD NLI                |    43.97|
+|HateBR Binary             |    50.46|
+|PT Hate Speech Binary     |    41.19|
+|tweetSentBR               |    47.96|
+
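
As a quick sanity check on the table added by this diff, the reported Average can be recomputed from the nine per-task scores. This is only a sketch: it assumes the leaderboard's Average is an unweighted arithmetic mean of the task scores, which the PR does not state explicitly but which agrees with the numbers shown. The names `scores` and `mean_score` are illustrative and not part of the PR.

```python
# Per-task scores copied from the Open Portuguese LLM Leaderboard table in this PR.
scores = {
    "ENEM Challenge (No Images)": 17.98,
    "BLUEX (No Images)": 21.14,
    "OAB Exams": 22.69,
    "Assin2 RTE": 43.01,
    "Assin2 STS": 8.92,
    "FaQuAD NLI": 43.97,
    "HateBR Binary": 50.46,
    "PT Hate Speech Binary": 41.19,
    "tweetSentBR": 47.96,
}


def mean_score(values):
    """Unweighted arithmetic mean rounded to two decimals.

    Assumption: the leaderboard averages all tasks with equal weight.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Average: {average:.2f}")  # prints "Average: 33.04"
```

The recomputed value matches the **33.04** in the table, which is consistent with the equal-weight assumption across all nine tasks.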