ybelkada (HF staff) and Muennighoff committed
Commit bb3556d
Parent: 168ece4

Add evaluation (#27)


- Add evaluation (f4d13b5ba0ecc4e0ef2ac892aadcc7daa33d3529)
- eng -> fra for crows_pairs_french (b07b3ae34315243657bf764d1ef5cf87d627dd50)
- Move to subsection (f1a18285e419e05ef30872b0f1aa6c0a62ed5782)
- Fix typos (d5d729d96775c846213efcc7c802613a4dd2c728)
- Update training statistics (5ebccd1f300f458cfed43489512b120a51a2ef2a)
- Update README.md (094e07d1004d2663cefd5f19fd8fe6282e0872c7)


Co-authored-by: Niklas Muennighoff <Muennighoff@users.noreply.huggingface.co>
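The evaluation results added by this commit are stored in the card's `model-index` front matter as a list of task, dataset and metrics entries. Below is a minimal sketch of how those entries can be flattened into a lookup table. It assumes the YAML has already been parsed into plain Python dicts (parsing is not shown), and the helper names `flatten` and `bits_per_byte` are hypothetical. The three values are copied from the `model-index` in this diff. The bits-per-byte conversion assumes byte perplexity is the exponential of the mean per-byte negative log-likelihood, as in EleutherAI's lm-evaluation-harness, so that byte perplexity equals 2 raised to the bits per byte.

```python
import math
from typing import Iterable

# Hand-copied subset of the `model-index` results from this commit,
# assumed to be already parsed from YAML into plain dicts.
RESULTS = [
    {"dataset": {"name": "arc_challenge"},
     "metrics": [{"type": "acc", "value": 0.4112627986348123}]},
    {"dataset": {"name": "gsarti/flores_101_eng"},
     "metrics": [{"type": "byte_perplexity", "value": 1.8819637649637901}]},
    {"dataset": {"name": "gsarti/flores_101_luo"},
     "metrics": [{"type": "byte_perplexity", "value": 12.031029857399394}]},
]


def flatten(results: Iterable[dict]) -> dict[tuple[str, str], float]:
    """Hypothetical helper: map (dataset name, metric type) -> reported value."""
    table = {}
    for entry in results:
        name = entry["dataset"]["name"]
        for metric in entry["metrics"]:
            table[(name, metric["type"])] = metric["value"]
    return table


def bits_per_byte(byte_perplexity: float) -> float:
    """Assumes byte_perplexity == 2 ** bits_per_byte, so take log base 2."""
    return math.log2(byte_perplexity)


if __name__ == "__main__":
    table = flatten(RESULTS)
    for (name, metric), value in sorted(table.items()):
        line = f"{name:28s} {metric:16s} {value:.4f}"
        if metric == "byte_perplexity":
            line += f"  ({bits_per_byte(value):.3f} bits/byte)"
        print(line)
```

Converting to bits per byte puts the FLORES-101 byte perplexities on a log scale, which makes differences between languages easier to compare than the raw perplexities.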

Files changed (1)
  1. README.md +1757 -8
README.md CHANGED
@@ -155,6 +155,1601 @@ widget:
155
  A: Let's think step by step.
156
  example_title: Mathematical reasoning
157
  group: English
158
  ---
159
 
160
  <img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
@@ -559,28 +2154,182 @@ Includes:
559
  And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of evaluation protocol.)_
560
 
561
  ## Factors
562
- *This section lists some different aspects of what BLOOM models. Its focus is on those aspects that are likely to give rise to high variance in model behavior.*
563
 
564
  - Language, such as English or Yoruba
565
 
566
  - Domain, such as newswire or stories
567
-
568
  - Demographic characteristics, such as gender or nationality
569
 
570
  ## Results
571
  *Results are based on the [Factors](#factors) and [Metrics](#metrics).*
572
 
573
  **Train-time Evaluation:**
574
 
575
- As of 25.May.2022, 15:00 PST:
576
 
577
- - Training Loss: 2.0
578
 
579
- - Validation Loss: 2.2
580
 
581
- - Perplexity: 8.9
582
 
583
- (More evaluation scores forthcoming.)
584
 
585
  </details>
586
 
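The train-time figures removed in this hunk (training loss 2.0, validation loss 2.2, perplexity 8.9) are mutually consistent if, as assumed here, perplexity is the exponential of the natural-log cross-entropy validation loss:

```latex
\mathrm{PPL} = \exp\left(\mathcal{L}_{\text{val}}\right), \qquad
\ln(8.9) \approx 2.19 \approx 2.2 = \mathcal{L}_{\text{val}}, \qquad
\exp(\mathcal{L}_{\text{train}}) = \exp(2.0) \approx 7.39 .
```

Under that assumption, the training loss alone would imply a perplexity of about 7.39, so the reported 8.9 most plausibly corresponds to the validation loss.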
@@ -675,4 +2424,4 @@ Initial prompting experiments using interim checkpoints: https://huggingface.co/
675
  # Model Card Authors
676
  *Ordered roughly chronologically and by amount of time spent.*
677
 
678
- Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay
155
  A: Let's think step by step.
156
  example_title: Mathematical reasoning
157
  group: English
158
+ model-index:
159
+ - name: bloom
160
+ results:
161
+ - task:
162
+ type: text-generation
163
+ name: text generation
164
+ dataset:
165
+ name: arc_challenge
166
+ type: arc_challenge
167
+ metrics:
168
+ - name: acc
169
+ type: acc
170
+ value: 0.4112627986348123
171
+ verified: false
172
+ - task:
173
+ type: text-generation
174
+ name: text generation
175
+ dataset:
176
+ name: arc_easy
177
+ type: arc_easy
178
+ metrics:
179
+ - name: acc
180
+ type: acc
181
+ value: 0.726010101010101
182
+ verified: false
183
+ - task:
184
+ type: text-generation
185
+ name: text generation
186
+ dataset:
187
+ name: axb
188
+ type: axb
189
+ metrics:
190
+ - name: acc
191
+ type: acc
192
+ value: 0.5751811594202898
193
+ verified: false
194
+ - task:
195
+ type: text-generation
196
+ name: text generation
197
+ dataset:
198
+ name: axg
199
+ type: axg
200
+ metrics:
201
+ - name: acc
202
+ type: acc
203
+ value: 0.5252808988764045
204
+ verified: false
205
+ - task:
206
+ type: text-generation
207
+ name: text generation
208
+ dataset:
209
+ name: boolq
210
+ type: boolq
211
+ metrics:
212
+ - name: acc
213
+ type: acc
214
+ value: 0.6345565749235474
215
+ verified: false
216
+ - task:
217
+ type: text-generation
218
+ name: text generation
219
+ dataset:
220
+ name: cb
221
+ type: cb
222
+ metrics:
223
+ - name: acc
224
+ type: acc
225
+ value: 0.3392857142857143
226
+ verified: false
227
+ - task:
228
+ type: text-generation
229
+ name: text generation
230
+ dataset:
231
+ name: cola
232
+ type: cola
233
+ metrics:
234
+ - name: acc
235
+ type: acc
236
+ value: 0.39022051773729627
237
+ verified: false
238
+ - task:
239
+ type: text-generation
240
+ name: text generation
241
+ dataset:
242
+ name: copa
243
+ type: copa
244
+ metrics:
245
+ - name: acc
246
+ type: acc
247
+ value: 0.56
248
+ verified: false
249
+ - task:
250
+ type: text-generation
251
+ name: text generation
252
+ dataset:
253
+ name: crows_pairs_english
254
+ type: crows_pairs_english
255
+ metrics:
256
+ - name: acc
257
+ type: acc
258
+ value: 0.5
259
+ verified: false
260
+ - task:
261
+ type: text-generation
262
+ name: text generation
263
+ dataset:
264
+ name: crows_pairs_french
265
+ type: crows_pairs_french
266
+ metrics:
267
+ - name: acc
268
+ type: acc
269
+ value: 0.505664877757901
270
+ verified: false
271
+ - task:
272
+ type: text-generation
273
+ name: text generation
274
+ dataset:
275
+ name: diabla
276
+ type: diabla
277
+ metrics:
278
+ - name: acc
279
+ type: acc
280
+ value: 0.2947981906750174
281
+ verified: false
282
+ - task:
283
+ type: text-generation
284
+ name: text generation
285
+ dataset:
286
+ name: gsarti/flores_101_afr
287
+ type: gsarti/flores_101_afr
288
+ metrics:
289
+ - name: byte_perplexity
290
+ type: byte_perplexity
291
+ value: 4.25431550058444
292
+ verified: false
293
+ - task:
294
+ type: text-generation
295
+ name: text generation
296
+ dataset:
297
+ name: gsarti/flores_101_amh
298
+ type: gsarti/flores_101_amh
299
+ metrics:
300
+ - name: byte_perplexity
301
+ type: byte_perplexity
302
+ value: 3.716877477347089
303
+ verified: false
304
+ - task:
305
+ type: text-generation
306
+ name: text generation
307
+ dataset:
308
+ name: gsarti/flores_101_ara
309
+ type: gsarti/flores_101_ara
310
+ metrics:
311
+ - name: byte_perplexity
312
+ type: byte_perplexity
313
+ value: 1.7049030137120964
314
+ verified: false
315
+ - task:
316
+ type: text-generation
317
+ name: text generation
318
+ dataset:
319
+ name: gsarti/flores_101_asm
320
+ type: gsarti/flores_101_asm
321
+ metrics:
322
+ - name: byte_perplexity
323
+ type: byte_perplexity
324
+ value: 6.576581380404954
325
+ verified: false
326
+ - task:
327
+ type: text-generation
328
+ name: text generation
329
+ dataset:
330
+ name: gsarti/flores_101_ast
331
+ type: gsarti/flores_101_ast
332
+ metrics:
333
+ - name: byte_perplexity
334
+ type: byte_perplexity
335
+ value: 2.8562364775797944
336
+ verified: false
337
+ - task:
338
+ type: text-generation
339
+ name: text generation
340
+ dataset:
341
+ name: gsarti/flores_101_azj
342
+ type: gsarti/flores_101_azj
343
+ metrics:
344
+ - name: byte_perplexity
345
+ type: byte_perplexity
346
+ value: 4.80721528624391
347
+ verified: false
348
+ - task:
349
+ type: text-generation
350
+ name: text generation
351
+ dataset:
352
+ name: gsarti/flores_101_bel
353
+ type: gsarti/flores_101_bel
354
+ metrics:
355
+ - name: byte_perplexity
356
+ type: byte_perplexity
357
+ value: 2.7312177406635065
358
+ verified: false
359
+ - task:
360
+ type: text-generation
361
+ name: text generation
362
+ dataset:
363
+ name: gsarti/flores_101_ben
364
+ type: gsarti/flores_101_ben
365
+ metrics:
366
+ - name: byte_perplexity
367
+ type: byte_perplexity
368
+ value: 5.993409478990023
369
+ verified: false
370
+ - task:
371
+ type: text-generation
372
+ name: text generation
373
+ dataset:
374
+ name: gsarti/flores_101_bos
375
+ type: gsarti/flores_101_bos
376
+ metrics:
377
+ - name: byte_perplexity
378
+ type: byte_perplexity
379
+ value: 3.5936169095529493
380
+ verified: false
381
+ - task:
382
+ type: text-generation
383
+ name: text generation
384
+ dataset:
385
+ name: gsarti/flores_101_bul
386
+ type: gsarti/flores_101_bul
387
+ metrics:
388
+ - name: byte_perplexity
389
+ type: byte_perplexity
390
+ value: 2.159035321398085
391
+ verified: false
392
+ - task:
393
+ type: text-generation
394
+ name: text generation
395
+ dataset:
396
+ name: gsarti/flores_101_cat
397
+ type: gsarti/flores_101_cat
398
+ metrics:
399
+ - name: byte_perplexity
400
+ type: byte_perplexity
401
+ value: 2.167873680006659
402
+ verified: false
403
+ - task:
404
+ type: text-generation
405
+ name: text generation
406
+ dataset:
407
+ name: gsarti/flores_101_ceb
408
+ type: gsarti/flores_101_ceb
409
+ metrics:
410
+ - name: byte_perplexity
411
+ type: byte_perplexity
412
+ value: 5.286975089885673
413
+ verified: false
414
+ - task:
415
+ type: text-generation
416
+ name: text generation
417
+ dataset:
418
+ name: gsarti/flores_101_ces
419
+ type: gsarti/flores_101_ces
420
+ metrics:
421
+ - name: byte_perplexity
422
+ type: byte_perplexity
423
+ value: 3.4516208322236017
424
+ verified: false
425
+ - task:
426
+ type: text-generation
427
+ name: text generation
428
+ dataset:
429
+ name: gsarti/flores_101_ckb
430
+ type: gsarti/flores_101_ckb
431
+ metrics:
432
+ - name: byte_perplexity
433
+ type: byte_perplexity
434
+ value: 3.7051034724765612
435
+ verified: false
436
+ - task:
437
+ type: text-generation
438
+ name: text generation
439
+ dataset:
440
+ name: gsarti/flores_101_cym
441
+ type: gsarti/flores_101_cym
442
+ metrics:
443
+ - name: byte_perplexity
444
+ type: byte_perplexity
445
+ value: 7.0889312398688125
446
+ verified: false
447
+ - task:
448
+ type: text-generation
449
+ name: text generation
450
+ dataset:
451
+ name: gsarti/flores_101_dan
452
+ type: gsarti/flores_101_dan
453
+ metrics:
454
+ - name: byte_perplexity
455
+ type: byte_perplexity
456
+ value: 3.4300748208111838
457
+ verified: false
458
+ - task:
459
+ type: text-generation
460
+ name: text generation
461
+ dataset:
462
+ name: gsarti/flores_101_deu
463
+ type: gsarti/flores_101_deu
464
+ metrics:
465
+ - name: byte_perplexity
466
+ type: byte_perplexity
467
+ value: 2.3380585896268107
468
+ verified: false
469
+ - task:
470
+ type: text-generation
471
+ name: text generation
472
+ dataset:
473
+ name: gsarti/flores_101_ell
474
+ type: gsarti/flores_101_ell
475
+ metrics:
476
+ - name: byte_perplexity
477
+ type: byte_perplexity
478
+ value: 1.9595604725375586
479
+ verified: false
480
+ - task:
481
+ type: text-generation
482
+ name: text generation
483
+ dataset:
484
+ name: gsarti/flores_101_eng
485
+ type: gsarti/flores_101_eng
486
+ metrics:
487
+ - name: byte_perplexity
488
+ type: byte_perplexity
489
+ value: 1.8819637649637901
490
+ verified: false
491
+ - task:
492
+ type: text-generation
493
+ name: text generation
494
+ dataset:
495
+ name: gsarti/flores_101_est
496
+ type: gsarti/flores_101_est
497
+ metrics:
498
+ - name: byte_perplexity
499
+ type: byte_perplexity
500
+ value: 5.773850600380297
501
+ verified: false
502
+ - task:
503
+ type: text-generation
504
+ name: text generation
505
+ dataset:
506
+ name: gsarti/flores_101_fas
507
+ type: gsarti/flores_101_fas
508
+ metrics:
509
+ - name: byte_perplexity
510
+ type: byte_perplexity
511
+ value: 2.4306140728294086
512
+ verified: false
513
+ - task:
514
+ type: text-generation
515
+ name: text generation
516
+ dataset:
517
+ name: gsarti/flores_101_fin
518
+ type: gsarti/flores_101_fin
519
+ metrics:
520
+ - name: byte_perplexity
521
+ type: byte_perplexity
522
+ value: 4.304305536244342
523
+ verified: false
524
+ - task:
525
+ type: text-generation
526
+ name: text generation
527
+ dataset:
528
+ name: gsarti/flores_101_fra
529
+ type: gsarti/flores_101_fra
530
+ metrics:
531
+ - name: byte_perplexity
532
+ type: byte_perplexity
533
+ value: 1.9374688438541796
534
+ verified: false
535
+ - task:
536
+ type: text-generation
537
+ name: text generation
538
+ dataset:
539
+ name: gsarti/flores_101_ful
540
+ type: gsarti/flores_101_ful
541
+ metrics:
542
+ - name: byte_perplexity
543
+ type: byte_perplexity
544
+ value: 9.740353097219378
545
+ verified: false
546
+ - task:
547
+ type: text-generation
548
+ name: text generation
549
+ dataset:
550
+ name: gsarti/flores_101_gle
551
+ type: gsarti/flores_101_gle
552
+ metrics:
553
+ - name: byte_perplexity
554
+ type: byte_perplexity
555
+ value: 6.035269765075012
556
+ verified: false
557
+ - task:
558
+ type: text-generation
559
+ name: text generation
560
+ dataset:
561
+ name: gsarti/flores_101_glg
562
+ type: gsarti/flores_101_glg
563
+ metrics:
564
+ - name: byte_perplexity
565
+ type: byte_perplexity
566
+ value: 2.365451129546636
567
+ verified: false
568
+ - task:
569
+ type: text-generation
570
+ name: text generation
571
+ dataset:
572
+ name: gsarti/flores_101_guj
573
+ type: gsarti/flores_101_guj
574
+ metrics:
575
+ - name: byte_perplexity
576
+ type: byte_perplexity
577
+ value: 5.70676742569154
578
+ verified: false
579
+ - task:
580
+ type: text-generation
581
+ name: text generation
582
+ dataset:
583
+ name: gsarti/flores_101_hau
584
+ type: gsarti/flores_101_hau
585
+ metrics:
586
+ - name: byte_perplexity
587
+ type: byte_perplexity
588
+ value: 8.855204288260023
589
+ verified: false
590
+ - task:
591
+ type: text-generation
592
+ name: text generation
593
+ dataset:
594
+ name: gsarti/flores_101_heb
595
+ type: gsarti/flores_101_heb
596
+ metrics:
597
+ - name: byte_perplexity
598
+ type: byte_perplexity
599
+ value: 2.920943798471208
600
+ verified: false
601
+ - task:
602
+ type: text-generation
603
+ name: text generation
604
+ dataset:
605
+ name: gsarti/flores_101_hin
606
+ type: gsarti/flores_101_hin
607
+ metrics:
608
+ - name: byte_perplexity
609
+ type: byte_perplexity
610
+ value: 5.452028001573195
611
+ verified: false
612
+ - task:
613
+ type: text-generation
614
+ name: text generation
615
+ dataset:
616
+ name: gsarti/flores_101_hrv
617
+ type: gsarti/flores_101_hrv
618
+ metrics:
619
+ - name: byte_perplexity
620
+ type: byte_perplexity
621
+ value: 3.7056829077179225
622
+ verified: false
623
+ - task:
624
+ type: text-generation
625
+ name: text generation
626
+ dataset:
627
+ name: gsarti/flores_101_hun
628
+ type: gsarti/flores_101_hun
629
+ metrics:
630
+ - name: byte_perplexity
631
+ type: byte_perplexity
632
+ value: 4.058579478967854
633
+ verified: false
634
+ - task:
635
+ type: text-generation
636
+ name: text generation
637
+ dataset:
638
+ name: gsarti/flores_101_hye
639
+ type: gsarti/flores_101_hye
640
+ metrics:
641
+ - name: byte_perplexity
642
+ type: byte_perplexity
643
+ value: 3.127237816041562
644
+ verified: false
645
+ - task:
646
+ type: text-generation
647
+ name: text generation
648
+ dataset:
649
+ name: gsarti/flores_101_ibo
650
+ type: gsarti/flores_101_ibo
651
+ metrics:
652
+ - name: byte_perplexity
653
+ type: byte_perplexity
654
+ value: 3.9500357969906683
655
+ verified: false
656
+ - task:
657
+ type: text-generation
658
+ name: text generation
659
+ dataset:
660
+ name: gsarti/flores_101_ind
661
+ type: gsarti/flores_101_ind
662
+ metrics:
663
+ - name: byte_perplexity
664
+ type: byte_perplexity
665
+ value: 1.976163584180101
666
+ verified: false
667
+ - task:
668
+ type: text-generation
669
+ name: text generation
670
+ dataset:
671
+ name: gsarti/flores_101_isl
672
+ type: gsarti/flores_101_isl
673
+ metrics:
674
+ - name: byte_perplexity
675
+ type: byte_perplexity
676
+ value: 5.500542085165231
677
+ verified: false
678
+ - task:
679
+ type: text-generation
680
+ name: text generation
681
+ dataset:
682
+ name: gsarti/flores_101_ita
683
+ type: gsarti/flores_101_ita
684
+ metrics:
685
+ - name: byte_perplexity
686
+ type: byte_perplexity
687
+ value: 2.314465100752677
688
+ verified: false
689
+ - task:
690
+ type: text-generation
691
+ name: text generation
692
+ dataset:
693
+ name: gsarti/flores_101_jav
694
+ type: gsarti/flores_101_jav
695
+ metrics:
696
+ - name: byte_perplexity
697
+ type: byte_perplexity
698
+ value: 4.942322446550142
699
+ verified: false
700
+ - task:
701
+ type: text-generation
702
+ name: text generation
703
+ dataset:
704
+ name: gsarti/flores_101_jpn
705
+ type: gsarti/flores_101_jpn
706
+ metrics:
707
+ - name: byte_perplexity
708
+ type: byte_perplexity
709
+ value: 2.259421750521777
710
+ verified: false
711
+ - task:
712
+ type: text-generation
713
+ name: text generation
714
+ dataset:
715
+ name: gsarti/flores_101_kam
716
+ type: gsarti/flores_101_kam
717
+ metrics:
718
+ - name: byte_perplexity
719
+ type: byte_perplexity
720
+ value: 9.743025325635475
721
+ verified: false
722
+ - task:
723
+ type: text-generation
724
+ name: text generation
725
+ dataset:
726
+ name: gsarti/flores_101_kan
727
+ type: gsarti/flores_101_kan
728
+ metrics:
729
+ - name: byte_perplexity
730
+ type: byte_perplexity
731
+ value: 6.233724699944989
732
+ verified: false
733
+ - task:
734
+ type: text-generation
735
+ name: text generation
736
+ dataset:
737
+ name: gsarti/flores_101_kat
738
+ type: gsarti/flores_101_kat
739
+ metrics:
740
+ - name: byte_perplexity
741
+ type: byte_perplexity
742
+ value: 2.0508893415872107
743
+ verified: false
744
+ - task:
745
+ type: text-generation
746
+ name: text generation
747
+ dataset:
748
+ name: gsarti/flores_101_kaz
749
+ type: gsarti/flores_101_kaz
750
+ metrics:
751
+ - name: byte_perplexity
752
+ type: byte_perplexity
753
+ value: 3.0390148516287927
754
+ verified: false
755
+ - task:
756
+ type: text-generation
757
+ name: text generation
758
+ dataset:
759
+ name: gsarti/flores_101_kea
760
+ type: gsarti/flores_101_kea
761
+ metrics:
762
+ - name: byte_perplexity
763
+ type: byte_perplexity
764
+ value: 7.147132270533836
765
+ verified: false
766
+ - task:
767
+ type: text-generation
768
+ name: text generation
769
+ dataset:
770
+ name: gsarti/flores_101_khm
771
+ type: gsarti/flores_101_khm
772
+ metrics:
773
+ - name: byte_perplexity
774
+ type: byte_perplexity
775
+ value: 3.366514710252477
776
+ verified: false
777
+ - task:
778
+ type: text-generation
779
+ name: text generation
780
+ dataset:
781
+ name: gsarti/flores_101_kir
782
+ type: gsarti/flores_101_kir
783
+ metrics:
784
+ - name: byte_perplexity
785
+ type: byte_perplexity
786
+ value: 3.2413845359487885
787
+ verified: false
788
+ - task:
789
+ type: text-generation
790
+ name: text generation
791
+ dataset:
792
+ name: gsarti/flores_101_kor
793
+ type: gsarti/flores_101_kor
794
+ metrics:
795
+ - name: byte_perplexity
796
+ type: byte_perplexity
797
+ value: 2.9023196482741027
798
+ verified: false
799
+ - task:
800
+ type: text-generation
801
+ name: text generation
802
+ dataset:
803
+ name: gsarti/flores_101_lao
804
+ type: gsarti/flores_101_lao
805
+ metrics:
806
+ - name: byte_perplexity
807
+ type: byte_perplexity
808
+ value: 2.331446855837494
809
+ verified: false
810
+ - task:
811
+ type: text-generation
812
+ name: text generation
813
+ dataset:
814
+ name: gsarti/flores_101_lav
815
+ type: gsarti/flores_101_lav
816
+ metrics:
817
+ - name: byte_perplexity
818
+ type: byte_perplexity
819
+ value: 5.223609016485348
820
+ verified: false
821
+ - task:
822
+ type: text-generation
823
+ name: text generation
824
+ dataset:
825
+ name: gsarti/flores_101_lin
826
+ type: gsarti/flores_101_lin
827
+ metrics:
828
+ - name: byte_perplexity
829
+ type: byte_perplexity
830
+ value: 4.847471204107301
831
+ verified: false
832
+ - task:
833
+ type: text-generation
834
+ name: text generation
835
+ dataset:
836
+ name: gsarti/flores_101_lit
837
+ type: gsarti/flores_101_lit
838
+ metrics:
839
+ - name: byte_perplexity
840
+ type: byte_perplexity
841
+ value: 4.5432035498036765
842
+ verified: false
843
+ - task:
844
+ type: text-generation
845
+ name: text generation
846
+ dataset:
847
+ name: gsarti/flores_101_ltz
848
+ type: gsarti/flores_101_ltz
849
+ metrics:
850
+ - name: byte_perplexity
851
+ type: byte_perplexity
852
+ value: 5.5910516978201015
853
+ verified: false
854
+ - task:
855
+ type: text-generation
856
+ name: text generation
857
+ dataset:
858
+ name: gsarti/flores_101_lug
859
+ type: gsarti/flores_101_lug
860
+ metrics:
861
+ - name: byte_perplexity
862
+ type: byte_perplexity
863
+ value: 5.4301049946044175
864
+ verified: false
865
+ - task:
866
+ type: text-generation
867
+ name: text generation
868
+ dataset:
869
+ name: gsarti/flores_101_luo
870
+ type: gsarti/flores_101_luo
871
+ metrics:
872
+ - name: byte_perplexity
873
+ type: byte_perplexity
874
+ value: 12.031029857399394
875
+ verified: false
876
+ - task:
877
+ type: text-generation
878
+ name: text generation
879
+ dataset:
880
+ name: gsarti/flores_101_mal
881
+ type: gsarti/flores_101_mal
882
+ metrics:
883
+ - name: byte_perplexity
884
+ type: byte_perplexity
885
+ value: 4.794302548141229
886
+ verified: false
887
+ - task:
888
+ type: text-generation
889
+ name: text generation
890
+ dataset:
891
+ name: gsarti/flores_101_mar
892
+ type: gsarti/flores_101_mar
893
+ metrics:
894
+ - name: byte_perplexity
895
+ type: byte_perplexity
896
+ value: 6.856682255407709
897
+ verified: false
898
+ - task:
899
+ type: text-generation
900
+ name: text generation
901
+ dataset:
902
+ name: gsarti/flores_101_mkd
903
+ type: gsarti/flores_101_mkd
904
+ metrics:
905
+ - name: byte_perplexity
906
+ type: byte_perplexity
907
+ value: 2.3354144607382983
908
+ verified: false
909
+ - task:
910
+ type: text-generation
911
+ name: text generation
912
+ dataset:
913
+ name: gsarti/flores_101_mlt
914
+ type: gsarti/flores_101_mlt
915
+ metrics:
916
+ - name: byte_perplexity
917
+ type: byte_perplexity
918
+ value: 9.04135227904975
919
+ verified: false
920
+ - task:
921
+ type: text-generation
922
+ name: text generation
923
+ dataset:
924
+ name: gsarti/flores_101_mon
925
+ type: gsarti/flores_101_mon
926
+ metrics:
927
+ - name: byte_perplexity
928
+ type: byte_perplexity
929
+ value: 3.094907723618666
930
+ verified: false
931
+ - task:
932
+ type: text-generation
933
+ name: text generation
934
+ dataset:
935
+ name: gsarti/flores_101_mri
936
+ type: gsarti/flores_101_mri
937
+ metrics:
938
+ - name: byte_perplexity
939
+ type: byte_perplexity
940
+ value: 5.2659698341456505
941
+ verified: false
942
+ - task:
943
+ type: text-generation
944
+ name: text generation
945
+ dataset:
946
+ name: gsarti/flores_101_msa
947
+ type: gsarti/flores_101_msa
948
+ metrics:
949
+ - name: byte_perplexity
950
+ type: byte_perplexity
951
+ value: 2.2220779892820985
952
+ verified: false
953
+ - task:
954
+ type: text-generation
955
+ name: text generation
956
+ dataset:
957
+ name: gsarti/flores_101_mya
958
+ type: gsarti/flores_101_mya
959
+ metrics:
960
+ - name: byte_perplexity
961
+ type: byte_perplexity
962
+ value: 2.5229159853414433
963
+ verified: false
964
+ - task:
965
+ type: text-generation
966
+ name: text generation
967
+ dataset:
968
+ name: gsarti/flores_101_nld
969
+ type: gsarti/flores_101_nld
970
+ metrics:
971
+ - name: byte_perplexity
972
+ type: byte_perplexity
973
+ value: 2.799153089002766
974
+ verified: false
975
+ - task:
976
+ type: text-generation
977
+ name: text generation
978
+ dataset:
979
+ name: gsarti/flores_101_nob
980
+ type: gsarti/flores_101_nob
981
+ metrics:
982
+ - name: byte_perplexity
983
+ type: byte_perplexity
984
+ value: 3.628942049758715
985
+ verified: false
986
+ - task:
987
+ type: text-generation
988
+ name: text generation
989
+ dataset:
990
+ name: gsarti/flores_101_npi
991
+ type: gsarti/flores_101_npi
992
+ metrics:
993
+ - name: byte_perplexity
994
+ type: byte_perplexity
995
+ value: 6.666236527803879
996
+ verified: false
997
+ - task:
998
+ type: text-generation
999
+ name: text generation
1000
+ dataset:
1001
+ name: gsarti/flores_101_nso
1002
+ type: gsarti/flores_101_nso
1003
+ metrics:
1004
+ - name: byte_perplexity
1005
+ type: byte_perplexity
1006
+ value: 5.015319074943932
1007
+ verified: false
1008
+ - task:
1009
+ type: text-generation
1010
+ name: text generation
1011
+ dataset:
1012
+ name: gsarti/flores_101_nya
1013
+ type: gsarti/flores_101_nya
1014
+ metrics:
1015
+ - name: byte_perplexity
1016
+ type: byte_perplexity
1017
+ value: 4.938044040751036
1018
+ verified: false
1019
+ - task:
1020
+ type: text-generation
1021
+ name: text generation
1022
+ dataset:
1023
+ name: gsarti/flores_101_oci
1024
+ type: gsarti/flores_101_oci
1025
+ metrics:
1026
+ - name: byte_perplexity
1027
+ type: byte_perplexity
1028
+ value: 3.607440766288032
1029
+ verified: false
1030
+ - task:
1031
+ type: text-generation
1032
+ name: text generation
1033
+ dataset:
1034
+ name: gsarti/flores_101_orm
1035
+ type: gsarti/flores_101_orm
1036
+ metrics:
1037
+ - name: byte_perplexity
1038
+ type: byte_perplexity
1039
+ value: 11.31585044916705
1040
+ verified: false
1041
+ - task:
1042
+ type: text-generation
1043
+ name: text generation
1044
+ dataset:
1045
+ name: gsarti/flores_101_ory
1046
+ type: gsarti/flores_101_ory
1047
+ metrics:
1048
+ - name: byte_perplexity
1049
+ type: byte_perplexity
1050
+ value: 5.981891184515959
1051
+ verified: false
1052
+ - task:
1053
+ type: text-generation
1054
+ name: text generation
1055
+ dataset:
1056
+ name: gsarti/flores_101_pan
1057
+ type: gsarti/flores_101_pan
1058
+ metrics:
1059
+ - name: byte_perplexity
1060
+ type: byte_perplexity
1061
+ value: 4.7716086841502685
1062
+ verified: false
1063
+ - task:
1064
+ type: text-generation
1065
+ name: text generation
1066
+ dataset:
1067
+ name: gsarti/flores_101_pol
1068
+ type: gsarti/flores_101_pol
1069
+ metrics:
1070
+ - name: byte_perplexity
1071
+ type: byte_perplexity
1072
+ value: 3.01200174157614
1073
+ verified: false
1074
+ - task:
1075
+ type: text-generation
1076
+ name: text generation
1077
+ dataset:
1078
+ name: gsarti/flores_101_por
1079
+ type: gsarti/flores_101_por
1080
+ metrics:
1081
+ - name: byte_perplexity
1082
+ type: byte_perplexity
1083
+ value: 1.8411472115156693
1084
+ verified: false
1085
+ - task:
1086
+ type: text-generation
1087
+ name: text generation
1088
+ dataset:
1089
+ name: gsarti/flores_101_pus
1090
+ type: gsarti/flores_101_pus
1091
+ metrics:
1092
+ - name: byte_perplexity
1093
+ type: byte_perplexity
1094
+ value: 4.623872921169341
1095
+ verified: false
1096
+ - task:
1097
+ type: text-generation
1098
+ name: text generation
1099
+ dataset:
1100
+ name: gsarti/flores_101_ron
1101
+ type: gsarti/flores_101_ron
1102
+ metrics:
1103
+ - name: byte_perplexity
1104
+ type: byte_perplexity
1105
+ value: 3.049829411973529
1106
+ verified: false
1107
+ - task:
1108
+ type: text-generation
1109
+ name: text generation
1110
+ dataset:
1111
+ name: gsarti/flores_101_rus
1112
+ type: gsarti/flores_101_rus
1113
+ metrics:
1114
+ - name: byte_perplexity
1115
+ type: byte_perplexity
1116
+ value: 1.7083443875791493
1117
+ verified: false
1118
+ - task:
1119
+ type: text-generation
1120
+ name: text generation
1121
+ dataset:
1122
+ name: gsarti/flores_101_slk
1123
+ type: gsarti/flores_101_slk
1124
+ metrics:
1125
+ - name: byte_perplexity
1126
+ type: byte_perplexity
1127
+ value: 4.037719650548048
1128
+ verified: false
1129
+ - task:
1130
+ type: text-generation
1131
+ name: text generation
1132
+ dataset:
1133
+ name: gsarti/flores_101_slv
1134
+ type: gsarti/flores_101_slv
1135
+ metrics:
1136
+ - name: byte_perplexity
1137
+ type: byte_perplexity
1138
+ value: 4.141036287764831
1139
+ verified: false
1140
+ - task:
1141
+ type: text-generation
1142
+ name: text generation
1143
+ dataset:
1144
+ name: gsarti/flores_101_sna
1145
+ type: gsarti/flores_101_sna
1146
+ metrics:
1147
+ - name: byte_perplexity
1148
+ type: byte_perplexity
1149
+ value: 4.7109183690601295
1150
+ verified: false
1151
+ - task:
1152
+ type: text-generation
1153
+ name: text generation
1154
+ dataset:
1155
+ name: gsarti/flores_101_snd
1156
+ type: gsarti/flores_101_snd
1157
+ metrics:
1158
+ - name: byte_perplexity
1159
+ type: byte_perplexity
1160
+ value: 4.206170931541356
1161
+ verified: false
1162
+ - task:
1163
+ type: text-generation
1164
+ name: text generation
1165
+ dataset:
1166
+ name: gsarti/flores_101_som
1167
+ type: gsarti/flores_101_som
1168
+ metrics:
1169
+ - name: byte_perplexity
1170
+ type: byte_perplexity
1171
+ value: 9.154342083821405
1172
+ verified: false
1173
+ - task:
1174
+ type: text-generation
1175
+ name: text generation
1176
+ dataset:
1177
+ name: gsarti/flores_101_spa
1178
+ type: gsarti/flores_101_spa
1179
+ metrics:
1180
+ - name: byte_perplexity
1181
+ type: byte_perplexity
1182
+ value: 1.7955816311143258
1183
+ verified: false
1184
+ - task:
1185
+ type: text-generation
1186
+ name: text generation
1187
+ dataset:
1188
+ name: gsarti/flores_101_srp
1189
+ type: gsarti/flores_101_srp
1190
+ metrics:
1191
+ - name: byte_perplexity
1192
+ type: byte_perplexity
1193
+ value: 2.241096141430147
1194
+ verified: false
1195
+ - task:
1196
+ type: text-generation
1197
+ name: text generation
1198
+ dataset:
1199
+ name: gsarti/flores_101_swe
1200
+ type: gsarti/flores_101_swe
1201
+ metrics:
1202
+ - name: byte_perplexity
1203
+ type: byte_perplexity
1204
+ value: 3.344977179674293
1205
+ verified: false
1206
+ - task:
1207
+ type: text-generation
1208
+ name: text generation
1209
+ dataset:
1210
+ name: gsarti/flores_101_swh
1211
+ type: gsarti/flores_101_swh
1212
+ metrics:
1213
+ - name: byte_perplexity
1214
+ type: byte_perplexity
1215
+ value: 2.6844272218041634
1216
+ verified: false
1217
+ - task:
1218
+ type: text-generation
1219
+ name: text generation
1220
+ dataset:
1221
+ name: gsarti/flores_101_tam
1222
+ type: gsarti/flores_101_tam
1223
+ metrics:
1224
+ - name: byte_perplexity
1225
+ type: byte_perplexity
1226
+ value: 5.1645951632801745
1227
+ verified: false
1228
+ - task:
1229
+ type: text-generation
1230
+ name: text generation
1231
+ dataset:
1232
+ name: gsarti/flores_101_tel
1233
+ type: gsarti/flores_101_tel
1234
+ metrics:
1235
+ - name: byte_perplexity
1236
+ type: byte_perplexity
1237
+ value: 6.8098996634099445
1238
+ verified: false
1239
+ - task:
1240
+ type: text-generation
1241
+ name: text generation
1242
+ dataset:
1243
+ name: gsarti/flores_101_tgk
1244
+ type: gsarti/flores_101_tgk
1245
+ metrics:
1246
+ - name: byte_perplexity
1247
+ type: byte_perplexity
1248
+ value: 3.785457016715163
1249
+ verified: false
1250
+ - task:
1251
+ type: text-generation
1252
+ name: text generation
1253
+ dataset:
1254
+ name: gsarti/flores_101_tgl
1255
+ type: gsarti/flores_101_tgl
1256
+ metrics:
1257
+ - name: byte_perplexity
1258
+ type: byte_perplexity
1259
+ value: 3.7498953645610875
1260
+ verified: false
1261
+ - task:
1262
+ type: text-generation
1263
+ name: text generation
1264
+ dataset:
1265
+ name: gsarti/flores_101_tha
1266
+ type: gsarti/flores_101_tha
1267
+ metrics:
1268
+ - name: byte_perplexity
1269
+ type: byte_perplexity
1270
+ value: 2.104151663233468
1271
+ verified: false
1272
+ - task:
1273
+ type: text-generation
1274
+ name: text generation
1275
+ dataset:
1276
+ name: gsarti/flores_101_tur
1277
+ type: gsarti/flores_101_tur
1278
+ metrics:
1279
+ - name: byte_perplexity
1280
+ type: byte_perplexity
1281
+ value: 3.3178240103796037
1282
+ verified: false
1283
+ - task:
1284
+ type: text-generation
1285
+ name: text generation
1286
+ dataset:
1287
+ name: gsarti/flores_101_ukr
1288
+ type: gsarti/flores_101_ukr
1289
+ metrics:
1290
+ - name: byte_perplexity
1291
+ type: byte_perplexity
1292
+ value: 2.088543437159643
1293
+ verified: false
1294
+ - task:
1295
+ type: text-generation
1296
+ name: text generation
1297
+ dataset:
1298
+ name: gsarti/flores_101_umb
1299
+ type: gsarti/flores_101_umb
1300
+ metrics:
1301
+ - name: byte_perplexity
1302
+ type: byte_perplexity
1303
+ value: 11.766013385445124
1304
+ verified: false
1305
+ - task:
1306
+ type: text-generation
1307
+ name: text generation
1308
+ dataset:
1309
+ name: gsarti/flores_101_urd
1310
+ type: gsarti/flores_101_urd
1311
+ metrics:
1312
+ - name: byte_perplexity
1313
+ type: byte_perplexity
1314
+ value: 1.7788699847612357
1315
+ verified: false
1316
+ - task:
1317
+ type: text-generation
1318
+ name: text generation
1319
+ dataset:
1320
+ name: gsarti/flores_101_uzb
1321
+ type: gsarti/flores_101_uzb
1322
+ metrics:
1323
+ - name: byte_perplexity
1324
+ type: byte_perplexity
1325
+ value: 8.499879863290486
1326
+ verified: false
1327
+ - task:
1328
+ type: text-generation
1329
+ name: text generation
1330
+ dataset:
1331
+ name: gsarti/flores_101_vie
1332
+ type: gsarti/flores_101_vie
1333
+ metrics:
1334
+ - name: byte_perplexity
1335
+ type: byte_perplexity
1336
+ value: 1.65901207387262
1337
+ verified: false
1338
+ - task:
1339
+ type: text-generation
1340
+ name: text generation
1341
+ dataset:
1342
+ name: gsarti/flores_101_wol
1343
+ type: gsarti/flores_101_wol
1344
+ metrics:
1345
+ - name: byte_perplexity
1346
+ type: byte_perplexity
1347
+ value: 6.141703791276928
1348
+ verified: false
1349
+ - task:
1350
+ type: text-generation
1351
+ name: text generation
1352
+ dataset:
1353
+ name: gsarti/flores_101_xho
1354
+ type: gsarti/flores_101_xho
1355
+ metrics:
1356
+ - name: byte_perplexity
1357
+ type: byte_perplexity
1358
+ value: 4.690199677955254
1359
+ verified: false
1360
+ - task:
1361
+ type: text-generation
1362
+ name: text generation
1363
+ dataset:
1364
+ name: gsarti/flores_101_yor
1365
+ type: gsarti/flores_101_yor
1366
+ metrics:
1367
+ - name: byte_perplexity
1368
+ type: byte_perplexity
1369
+ value: 4.360585696242932
1370
+ verified: false
1371
+ - task:
1372
+ type: text-generation
1373
+ name: text generation
1374
+ dataset:
1375
+ name: gsarti/flores_101_zho_simpl
1376
+ type: gsarti/flores_101_zho_simpl
1377
+ metrics:
1378
+ - name: byte_perplexity
1379
+ type: byte_perplexity
1380
+ value: 2.1183545781883515
1381
+ verified: false
1382
+ - task:
1383
+ type: text-generation
1384
+ name: text generation
1385
+ dataset:
1386
+ name: gsarti/flores_101_zho_trad
1387
+ type: gsarti/flores_101_zho_trad
1388
+ metrics:
1389
+ - name: byte_perplexity
1390
+ type: byte_perplexity
1391
+ value: 2.273787884962656
1392
+ verified: false
1393
+ - task:
1394
+ type: text-generation
1395
+ name: text generation
1396
+ dataset:
1397
+ name: gsarti/flores_101_zul
1398
+ type: gsarti/flores_101_zul
1399
+ metrics:
1400
+ - name: byte_perplexity
1401
+ type: byte_perplexity
1402
+ value: 6.016954767729589
1403
+ verified: false
1404
+ - task:
1405
+ type: text-generation
1406
+ name: text generation
1407
+ dataset:
1408
+ name: headqa
1409
+ type: headqa
1410
+ metrics:
1411
+ - name: acc
1412
+ type: acc
1413
+ value: 0.3464624361779723
1414
+ verified: false
1415
+ - task:
1416
+ type: text-generation
1417
+ name: text generation
1418
+ dataset:
1419
+ name: hellaswag
1420
+ type: hellaswag
1421
+ metrics:
1422
+ - name: acc
1423
+ type: acc
1424
+ value: 0.5353515236008763
1425
+ verified: false
1426
+ - task:
1427
+ type: text-generation
1428
+ name: text generation
1429
+ dataset:
1430
+ name: lambada_mt_de
1431
+ type: lambada_mt_de
1432
+ metrics:
1433
+ - name: acc
1434
+ type: acc
1435
+ value: 0.3291286629148069
1436
+ verified: false
1437
+ - task:
1438
+ type: text-generation
1439
+ name: text generation
1440
+ dataset:
1441
+ name: lambada_mt_en
1442
+ type: lambada_mt_en
1443
+ metrics:
1444
+ - name: acc
1445
+ type: acc
1446
+ value: 0.6720357073549389
1447
+ verified: false
1448
+ - task:
1449
+ type: text-generation
1450
+ name: text generation
1451
+ dataset:
1452
+ name: lambada_mt_es
1453
+ type: lambada_mt_es
1454
+ metrics:
1455
+ - name: acc
1456
+ type: acc
1457
+ value: 0.476421502037648
1458
+ verified: false
1459
+ - task:
1460
+ type: text-generation
1461
+ name: text generation
1462
+ dataset:
1463
+ name: lambada_mt_it
1464
+ type: lambada_mt_it
1465
+ metrics:
1466
+ - name: acc
1467
+ type: acc
1468
+ value: 0.4061711624296526
1469
+ verified: false
1470
+ - task:
1471
+ type: text-generation
1472
+ name: text generation
1473
+ dataset:
1474
+ name: logiqa
1475
+ type: logiqa
1476
+ metrics:
1477
+ - name: acc
1478
+ type: acc
1479
+ value: 0.2350230414746544
1480
+ verified: false
1481
+ - task:
1482
+ type: text-generation
1483
+ name: text generation
1484
+ dataset:
1485
+ name: mathqa
1486
+ type: mathqa
1487
+ metrics:
1488
+ - name: acc
1489
+ type: acc
1490
+ value: 0.27671691792294806
1491
+ verified: false
1492
+ - task:
1493
+ type: text-generation
1494
+ name: text generation
1495
+ dataset:
1496
+ name: mc_taco
1497
+ type: mc_taco
1498
+ metrics:
1499
+ - name: em
1500
+ type: em
1501
+ value: 0.13063063063063063
1502
+ verified: false
1503
+ - task:
1504
+ type: text-generation
1505
+ name: text generation
1506
+ dataset:
1507
+ name: mnli
1508
+ type: mnli
1509
+ metrics:
1510
+ - name: acc
1511
+ type: acc
1512
+ value: 0.3545565500406835
1513
+ verified: false
1514
+ - task:
1515
+ type: text-generation
1516
+ name: text generation
1517
+ dataset:
1518
+ name: mnli_mismatched
1519
+ type: mnli_mismatched
1520
+ metrics:
1521
+ - name: acc
1522
+ type: acc
1523
+ value: 0.3545565500406835
1524
+ verified: false
1525
+ - task:
1526
+ type: text-generation
1527
+ name: text generation
1528
+ dataset:
1529
+ name: mrpc
1530
+ type: mrpc
1531
+ metrics:
1532
+ - name: acc
1533
+ type: acc
1534
+ value: 0.3872549019607843
1535
+ verified: false
1536
+ - task:
1537
+ type: text-generation
1538
+ name: text generation
1539
+ dataset:
1540
+ name: multirc
1541
+ type: multirc
1542
+ metrics:
1543
+ - name: acc
1544
+ type: acc
1545
+ value: 0.570957095709571
1546
+ verified: false
1547
+ - task:
1548
+ type: text-generation
1549
+ name: text generation
1550
+ dataset:
1551
+ name: openbookqa
1552
+ type: openbookqa
1553
+ metrics:
1554
+ - name: acc
1555
+ type: acc
1556
+ value: 0.312
1557
+ verified: false
1558
+ - task:
1559
+ type: text-generation
1560
+ name: text generation
1561
+ dataset:
1562
+ name: piqa
1563
+ type: piqa
1564
+ metrics:
1565
+ - name: acc
1566
+ type: acc
1567
+ value: 0.7812840043525572
1568
+ verified: false
1569
+ - task:
1570
+ type: text-generation
1571
+ name: text generation
1572
+ dataset:
1573
+ name: prost
1574
+ type: prost
1575
+ metrics:
1576
+ - name: acc
1577
+ type: acc
1578
+ value: 0.2977156276686593
1579
+ verified: false
1580
+ - task:
1581
+ type: text-generation
1582
+ name: text generation
1583
+ dataset:
1584
+ name: pubmedqa
1585
+ type: pubmedqa
1586
+ metrics:
1587
+ - name: acc
1588
+ type: acc
1589
+ value: 0.741
1590
+ verified: false
1591
+ - task:
1592
+ type: text-generation
1593
+ name: text generation
1594
+ dataset:
1595
+ name: qnli
1596
+ type: qnli
1597
+ metrics:
1598
+ - name: acc
1599
+ type: acc
1600
+ value: 0.5172981878088962
1601
+ verified: false
1602
+ - task:
1603
+ type: text-generation
1604
+ name: text generation
1605
+ dataset:
1606
+ name: qqp
1607
+ type: qqp
1608
+ metrics:
1609
+ - name: acc
1610
+ type: acc
1611
+ value: 0.5883007667573584
1612
+ verified: false
1613
+ - task:
1614
+ type: text-generation
1615
+ name: text generation
1616
+ dataset:
1617
+ name: race
1618
+ type: race
1619
+ metrics:
1620
+ - name: acc
1621
+ type: acc
1622
+ value: 0.39043062200956935
1623
+ verified: false
1624
+ - task:
1625
+ type: text-generation
1626
+ name: text generation
1627
+ dataset:
1628
+ name: rte
1629
+ type: rte
1630
+ metrics:
1631
+ - name: acc
1632
+ type: acc
1633
+ value: 0.5198555956678701
1634
+ verified: false
1635
+ - task:
1636
+ type: text-generation
1637
+ name: text generation
1638
+ dataset:
1639
+ name: sciq
1640
+ type: sciq
1641
+ metrics:
1642
+ - name: acc
1643
+ type: acc
1644
+ value: 0.936
1645
+ verified: false
1646
+ - task:
1647
+ type: text-generation
1648
+ name: text generation
1649
+ dataset:
1650
+ name: sst
1651
+ type: sst
1652
+ metrics:
1653
+ - name: acc
1654
+ type: acc
1655
+ value: 0.6043577981651376
1656
+ verified: false
1657
+ - task:
1658
+ type: text-generation
1659
+ name: text generation
1660
+ dataset:
1661
+ name: triviaqa
1662
+ type: triviaqa
1663
+ metrics:
1664
+ - name: acc
1665
+ type: acc
1666
+ value: 0.18332891363917617
1667
+ verified: false
1668
+ - task:
1669
+ type: text-generation
1670
+ name: text generation
1671
+ dataset:
1672
+ name: tydiqa_primary
1673
+ type: tydiqa_primary
1674
+ metrics:
1675
+ - name: acc
1676
+ type: acc
1677
+ value: 0.2809817301342725
1678
+ verified: false
1679
+ - task:
1680
+ type: text-generation
1681
+ name: text generation
1682
+ dataset:
1683
+ name: webqs
1684
+ type: webqs
1685
+ metrics:
1686
+ - name: acc
1687
+ type: acc
1688
+ value: 0.061515748031496065
1689
+ verified: false
1690
+ - task:
1691
+ type: text-generation
1692
+ name: text generation
1693
+ dataset:
1694
+ name: wic
1695
+ type: wic
1696
+ metrics:
1697
+ - name: acc
1698
+ type: acc
1699
+ value: 0.5062695924764891
1700
+ verified: false
1701
+ - task:
1702
+ type: text-generation
1703
+ name: text generation
1704
+ dataset:
1705
+ name: winogrande
1706
+ type: winogrande
1707
+ metrics:
1708
+ - name: acc
1709
+ type: acc
1710
+ value: 0.7095501183898973
1711
+ verified: false
1712
+ - task:
1713
+ type: text-generation
1714
+ name: text generation
1715
+ dataset:
1716
+ name: wnli
1717
+ type: wnli
1718
+ metrics:
1719
+ - name: acc
1720
+ type: acc
1721
+ value: 0.5704225352112676
1722
+ verified: false
1723
+ - task:
1724
+ type: text-generation
1725
+ name: text generation
1726
+ dataset:
1727
+ name: wsc
1728
+ type: wsc
1729
+ metrics:
1730
+ - name: acc
1731
+ type: acc
1732
+ value: 0.5192307692307693
1733
+ verified: false
1734
+ - task:
1735
+ type: text-generation
1736
+ name: text generation
1737
+ dataset:
1738
+ name: humaneval
1739
+ type: humaneval
1740
+ metrics:
1741
+ - name: pass@1
1742
+ type: pass@1
1743
+ value: 0.15524390243902436
1744
+ verified: false
1745
+ - name: pass@10
1746
+ type: pass@10
1747
+ value: 0.3220367632383857
1748
+ verified: false
1749
+ - name: pass@100
1750
+ type: pass@100
1751
+ value: 0.5545431515723145
1752
+ verified: false
1753
  ---
1754
 
1755
  <img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left: auto; margin-right: auto; display: block"/>
2154
  And several task-specific metrics. _(More evaluation metrics forthcoming upon completion of the evaluation protocol.)_
2155
 
2156
  ## Factors
2157
+ *This section lists aspects of BLOOM models that are likely to give rise to high variance in model behavior.*
2158
 
2159
  - Language, such as English or Yoruba
2160
 
2161
  - Domain, such as newswire or stories
2162
+
2163
  - Demographic characteristics, such as gender or nationality
2164
 
2165
  ## Results
2166
  *Results are based on the [Factors](#factors) and [Metrics](#metrics).*
2167
 
2168
+ **Zero-shot evaluations:**
2169
+
2170
+ Per-task JSON result files are available in this repository: https://github.com/bigscience-workshop/evaluation-results
2171
+
2172
+ | Task | Language | Metric | BLOOM-176B | OPT-175B* |
2173
+ |:--------|:-----------------|:------------------------|-------------:|------------:|
2174
+ | arc_challenge | eng | acc ↑ | 0.411 | 0.412 |
2175
+ | arc_easy | eng | acc ↑ | 0.726 | 0.751 |
2176
+ | axb (Median of 10 prompts) | eng | acc ↑ | 0.575 | 0.532 |
2177
+ | axg (Median of 10 prompts) | eng | acc ↑ | 0.525 | 0.548 |
2178
+ | boolq (Median of 11 prompts) | eng | acc ↑ | 0.635 | 0.622 |
2179
+ | cb (Median of 15 prompts) | eng | acc ↑ | 0.339 | 0.411 |
2180
+ | cola (Median of 5 prompts) | eng | acc ↑ | 0.39 | 0.444 |
2181
+ | copa (Median of 9 prompts) | eng | acc ↑ | 0.56 | 0.55 |
2182
+ | crows_pairs_english (Median of 6 prompts) | eng | acc ↑ | 0.5 | 0.502 |
2183
+ | crows_pairs_french (Median of 7 prompts) | fra | acc ↑ | 0.506 | 0.499 |
2184
+ | diabla (Median of 2 prompts) | eng | acc ↑ | 0.295 | 0.289 |
2185
+ | gsarti/flores_101_afr | afr | byte_perplexity ↓ | 4.254 | 3.381 |
2186
+ | gsarti/flores_101_amh | amh | byte_perplexity ↓ | 3.717 | 3.87 |
2187
+ | gsarti/flores_101_ara | ara | byte_perplexity ↓ | 1.705 | 2.42 |
2188
+ | gsarti/flores_101_asm | asm | byte_perplexity ↓ | 6.577 | 3.028 |
2189
+ | gsarti/flores_101_ast | ast | byte_perplexity ↓ | 2.856 | 4.737 |
2190
+ | gsarti/flores_101_azj | azj | byte_perplexity ↓ | 4.807 | 4.767 |
2191
+ | gsarti/flores_101_bel | bel | byte_perplexity ↓ | 2.731 | 2.557 |
2192
+ | gsarti/flores_101_ben | ben | byte_perplexity ↓ | 5.993 | 2.243 |
2193
+ | gsarti/flores_101_bos | bos | byte_perplexity ↓ | 3.594 | 2.668 |
2194
+ | gsarti/flores_101_bul | bul | byte_perplexity ↓ | 2.159 | 2.099 |
2195
+ | gsarti/flores_101_cat | cat | byte_perplexity ↓ | 2.168 | 2.837 |
2196
+ | gsarti/flores_101_ceb | ceb | byte_perplexity ↓ | 5.287 | 3.636 |
2197
+ | gsarti/flores_101_ces | ces | byte_perplexity ↓ | 3.452 | 2.749 |
2198
+ | gsarti/flores_101_ckb | ckb | byte_perplexity ↓ | 3.705 | 4.688 |
2199
+ | gsarti/flores_101_cym | cym | byte_perplexity ↓ | 7.089 | 5.075 |
2200
+ | gsarti/flores_101_dan | dan | byte_perplexity ↓ | 3.43 | 2.492 |
2201
+ | gsarti/flores_101_deu | deu | byte_perplexity ↓ | 2.338 | 2.099 |
2202
+ | gsarti/flores_101_ell | ell | byte_perplexity ↓ | 1.96 | 1.811 |
2203
+ | gsarti/flores_101_eng | eng | byte_perplexity ↓ | 1.882 | 1.9 |
2204
+ | gsarti/flores_101_est | est | byte_perplexity ↓ | 5.774 | 3.533 |
2205
+ | gsarti/flores_101_fas | fas | byte_perplexity ↓ | 2.431 | 2.444 |
2206
+ | gsarti/flores_101_fin | fin | byte_perplexity ↓ | 4.304 | 2.601 |
2207
+ | gsarti/flores_101_fra | fra | byte_perplexity ↓ | 1.937 | 1.984 |
2208
+ | gsarti/flores_101_ful | ful | byte_perplexity ↓ | 9.74 | 11.84 |
2209
+ | gsarti/flores_101_gle | gle | byte_perplexity ↓ | 6.035 | 3.914 |
2210
+ | gsarti/flores_101_glg | glg | byte_perplexity ↓ | 2.365 | 3.015 |
2211
+ | gsarti/flores_101_guj | guj | byte_perplexity ↓ | 5.707 | 2.438 |
2212
+ | gsarti/flores_101_hau | hau | byte_perplexity ↓ | 8.855 | 5.283 |
2213
+ | gsarti/flores_101_heb | heb | byte_perplexity ↓ | 2.921 | 2.903 |
2214
+ | gsarti/flores_101_hin | hin | byte_perplexity ↓ | 5.452 | 1.86 |
2215
+ | gsarti/flores_101_hrv | hrv | byte_perplexity ↓ | 3.706 | 2.715 |
2216
+ | gsarti/flores_101_hun | hun | byte_perplexity ↓ | 4.059 | 2.865 |
2217
+ | gsarti/flores_101_hye | hye | byte_perplexity ↓ | 3.127 | 3.411 |
2218
+ | gsarti/flores_101_ibo | ibo | byte_perplexity ↓ | 3.95 | 8.008 |
2219
+ | gsarti/flores_101_ind | ind | byte_perplexity ↓ | 1.976 | 2.632 |
2220
+ | gsarti/flores_101_isl | isl | byte_perplexity ↓ | 5.501 | 4.701 |
2221
+ | gsarti/flores_101_ita | ita | byte_perplexity ↓ | 2.314 | 2.104 |
2222
+ | gsarti/flores_101_jav | jav | byte_perplexity ↓ | 4.942 | 8.16 |
2223
+ | gsarti/flores_101_jpn | jpn | byte_perplexity ↓ | 2.259 | 2.198 |
2224
+ | gsarti/flores_101_kam | kam | byte_perplexity ↓ | 9.743 | 10.981 |
2225
+ | gsarti/flores_101_kan | kan | byte_perplexity ↓ | 6.234 | 2.373 |
2226
+ | gsarti/flores_101_kat | kat | byte_perplexity ↓ | 2.051 | 2.466 |
2227
+ | gsarti/flores_101_kaz | kaz | byte_perplexity ↓ | 3.039 | 4.376 |
2228
+ | gsarti/flores_101_kea | kea | byte_perplexity ↓ | 7.147 | 9.632 |
2229
+ | gsarti/flores_101_khm | khm | byte_perplexity ↓ | 3.367 | 2.646 |
2230
+ | gsarti/flores_101_kir | kir | byte_perplexity ↓ | 3.241 | 4.522 |
2231
+ | gsarti/flores_101_kor | kor | byte_perplexity ↓ | 2.902 | 3.376 |
2232
+ | gsarti/flores_101_lao | lao | byte_perplexity ↓ | 2.331 | 3.106 |
2233
+ | gsarti/flores_101_lav | lav | byte_perplexity ↓ | 5.224 | 4.811 |
2234
+ | gsarti/flores_101_lin | lin | byte_perplexity ↓ | 4.847 | 8.871 |
2235
+ | gsarti/flores_101_lit | lit | byte_perplexity ↓ | 4.543 | 5.183 |
2236
+ | gsarti/flores_101_ltz | ltz | byte_perplexity ↓ | 5.591 | 7.158 |
2237
+ | gsarti/flores_101_lug | lug | byte_perplexity ↓ | 5.43 | 7.399 |
2238
+ | gsarti/flores_101_luo | luo | byte_perplexity ↓ | 12.031 | 11.951 |
2239
+ | gsarti/flores_101_mal | mal | byte_perplexity ↓ | 4.794 | 2.054 |
2240
+ | gsarti/flores_101_mar | mar | byte_perplexity ↓ | 6.857 | 2.274 |
2241
+ | gsarti/flores_101_mkd | mkd | byte_perplexity ↓ | 2.335 | 2.538 |
2242
+ | gsarti/flores_101_mlt | mlt | byte_perplexity ↓ | 9.041 | 5.996 |
2243
+ | gsarti/flores_101_mon | mon | byte_perplexity ↓ | 3.095 | 4.519 |
2244
+ | gsarti/flores_101_mri | mri | byte_perplexity ↓ | 5.266 | 4.438 |
2245
+ | gsarti/flores_101_msa | msa | byte_perplexity ↓ | 2.222 | 2.935 |
2246
+ | gsarti/flores_101_mya | mya | byte_perplexity ↓ | 2.523 | 2.413 |
2247
+ | gsarti/flores_101_nld | nld | byte_perplexity ↓ | 2.799 | 2.293 |
2248
+ | gsarti/flores_101_nob | nob | byte_perplexity ↓ | 3.629 | 2.593 |
2249
+ | gsarti/flores_101_npi | npi | byte_perplexity ↓ | 6.666 | 2.499 |
2250
+ | gsarti/flores_101_nso | nso | byte_perplexity ↓ | 5.015 | 8.485 |
2251
+ | gsarti/flores_101_nya | nya | byte_perplexity ↓ | 4.938 | 7.548 |
2252
+ | gsarti/flores_101_oci | oci | byte_perplexity ↓ | 3.607 | 4.936 |
2253
+ | gsarti/flores_101_orm | orm | byte_perplexity ↓ | 11.316 | 7.145 |
2254
+ | gsarti/flores_101_ory | ory | byte_perplexity ↓ | 5.982 | 2.668 |
2255
+ | gsarti/flores_101_pan | pan | byte_perplexity ↓ | 4.772 | 2.782 |
2256
+ | gsarti/flores_101_pol | pol | byte_perplexity ↓ | 3.012 | 2.432 |
2257
+ | gsarti/flores_101_por | por | byte_perplexity ↓ | 1.841 | 2.178 |
2258
+ | gsarti/flores_101_pus | pus | byte_perplexity ↓ | 4.624 | 4.785 |
2259
+ | gsarti/flores_101_ron | ron | byte_perplexity ↓ | 3.05 | 2.197 |
2260
+ | gsarti/flores_101_rus | rus | byte_perplexity ↓ | 1.708 | 1.689 |
2261
+ | gsarti/flores_101_slk | slk | byte_perplexity ↓ | 4.038 | 3.419 |
2262
+ | gsarti/flores_101_slv | slv | byte_perplexity ↓ | 4.141 | 3.582 |
2263
+ | gsarti/flores_101_sna | sna | byte_perplexity ↓ | 4.711 | 5.588 |
2264
+ | gsarti/flores_101_snd | snd | byte_perplexity ↓ | 4.206 | 5.667 |
2265
+ | gsarti/flores_101_som | som | byte_perplexity ↓ | 9.154 | 4.788 |
2266
+ | gsarti/flores_101_spa | spa | byte_perplexity ↓ | 1.796 | 2.098 |
2267
+ | gsarti/flores_101_srp | srp | byte_perplexity ↓ | 2.241 | 2.688 |
2268
+ | gsarti/flores_101_swe | swe | byte_perplexity ↓ | 3.345 | 2.468 |
2269
+ | gsarti/flores_101_swh | swh | byte_perplexity ↓ | 2.684 | 4.473 |
2270
+ | gsarti/flores_101_tam | tam | byte_perplexity ↓ | 5.165 | 2.024 |
2271
+ | gsarti/flores_101_tel | tel | byte_perplexity ↓ | 6.81 | 2.407 |
2272
+ | gsarti/flores_101_tgk | tgk | byte_perplexity ↓ | 3.785 | 4.899 |
2273
+ | gsarti/flores_101_tgl | tgl | byte_perplexity ↓ | 3.75 | 2.738 |
2274
+ | gsarti/flores_101_tha | tha | byte_perplexity ↓ | 2.104 | 2.035 |
2275
+ | gsarti/flores_101_tur | tur | byte_perplexity ↓ | 3.318 | 2.622 |
2276
+ | gsarti/flores_101_ukr | ukr | byte_perplexity ↓ | 2.089 | 1.93 |
2277
+ | gsarti/flores_101_umb | umb | byte_perplexity ↓ | 11.766 | 11.64 |
2278
+ | gsarti/flores_101_urd | urd | byte_perplexity ↓ | 1.779 | 2.982 |
2279
+ | gsarti/flores_101_uzb | uzb | byte_perplexity ↓ | 8.5 | 13.209 |
2280
+ | gsarti/flores_101_vie | vie | byte_perplexity ↓ | 1.659 | 2.229 |
2281
+ | gsarti/flores_101_wol | wol | byte_perplexity ↓ | 6.142 | 13.945 |
2282
+ | gsarti/flores_101_xho | xho | byte_perplexity ↓ | 4.69 | 8.42 |
2283
+ | gsarti/flores_101_yor | yor | byte_perplexity ↓ | 4.361 | 7.636 |
2284
+ | gsarti/flores_101_zho_simpl | zho_simpl | byte_perplexity ↓ | 2.118 | 5.113 |
2285
+ | gsarti/flores_101_zho_trad | zho_trad | byte_perplexity ↓ | 2.274 | 5.67 |
2286
+ | gsarti/flores_101_zul | zul | byte_perplexity ↓ | 6.017 | 7.341 |
2287
+ | headqa | spa | acc ↑ | 0.346 | 0.244 |
2288
+ | hellaswag | eng | acc ↑ | 0.535 | 0.592 |
2289
+ | lambada_mt_de | deu | acc ↑ | 0.329 | 0.358 |
2290
+ | lambada_mt_en | eng | acc ↑ | 0.672 | 0.747 |
2291
+ | lambada_mt_es | spa | acc ↑ | 0.476 | 0.397 |
2292
+ | lambada_mt_it | ita | acc ↑ | 0.406 | 0.409 |
2293
+ | logiqa | eng | acc ↑ | 0.235 | 0.244 |
2294
+ | mathqa | eng | acc ↑ | 0.277 | 0.268 |
2295
+ | mc_taco | eng | em ↑ | 0.131 | 0.124 |
2296
+ | mnli (Median of 15 prompts) | eng | acc ↑ | 0.355 | 0.36 |
2297
+ | mnli_mismatched (Median of 15 prompts) | eng | acc ↑ | 0.355 | 0.36 |
2298
+ | mrpc | eng | acc ↑ | 0.387 | 0.446 |
2299
+ | multirc (Median of 11 prompts) | eng | acc ↑ | 0.571 | 0.599 |
2300
+ | openbookqa | eng | acc ↑ | 0.312 | 0.322 |
2301
+ | piqa | eng | acc ↑ | 0.781 | 0.791 |
2302
+ | prost | eng | acc ↑ | 0.298 | 0.299 |
2303
+ | pubmedqa | eng | acc ↑ | 0.741 | 0.709 |
2304
+ | qnli | eng | acc ↑ | 0.517 | 0.554 |
2305
+ | qqp (Median of 7 prompts) | eng | acc ↑ | 0.588 | 0.395 |
2306
+ | race | eng | acc ↑ | 0.39 | 0.402 |
2307
+ | rte (Median of 6 prompts) | eng | acc ↑ | 0.52 | 0.495 |
2308
+ | sciq | eng | acc ↑ | 0.936 | 0.948 |
2309
+ | sst (Median of 6 prompts) | eng | acc ↑ | 0.604 | 0.647 |
2310
+ | triviaqa | eng | acc ↑ | 0.183 | 0.342 |
2311
+ | tydiqa_primary (Median of 16 prompts) | eng | acc ↑ | 0.281 | 0.148 |
2312
+ | webqs | eng | acc ↑ | 0.062 | 0.159 |
2313
+ | wic (Median of 11 prompts) | eng | acc ↑ | 0.506 | 0.498 |
2314
+ | winogrande | eng | acc ↑ | 0.71 | 0.736 |
2315
+ | wnli (Median of 6 prompts) | eng | acc ↑ | 0.57 | 0.563 |
2316
+ | wsc (Median of 11 prompts) | eng | acc ↑ | 0.519 | 0.413 |
2317
+ | humaneval | python | pass@1 ↑ | 0.155 | 0.0 |
2318
+ | humaneval | python | pass@10 ↑ | 0.322 | 0.0 |
2319
+ | humaneval | python | pass@100 ↑ | 0.555 | 0.003 |
2320
+
2321
+
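The zero-shot table uses three kinds of scores. The Python below is a rough sketch of how each is typically computed, and it rests on three assumptions:

- `byte_perplexity` follows the byte-weighted definition used by EleutherAI's lm-evaluation-harness.
- The HumanEval rows use the unbiased pass@k estimator from the Codex/HumanEval paper (Chen et al., 2021).
- "Median of N prompts" means a plain median over per-prompt-template scores.

The function names are hypothetical. This is not the BigScience evaluation code, and the toy inputs are not BLOOM results.

```python
import math
from statistics import median


def byte_perplexity(loglikelihoods, num_bytes):
    """Byte-level perplexity (lower is better).

    Assumption: byte-weighted definition as in lm-evaluation-harness,
    i.e. exp of the negative summed natural-log likelihood divided by
    the summed UTF-8 byte counts of the evaluated documents.
    """
    total_ll = sum(loglikelihoods)
    total_bytes = sum(num_bytes)
    return math.exp(-total_ll / total_bytes)


def bits_per_byte(loglikelihoods, num_bytes):
    """The same quantity on a log2 scale: log2(byte_perplexity)."""
    return -sum(loglikelihoods) / sum(num_bytes) / math.log(2)


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem (Chen et al., 2021).

    n: samples generated, c: samples passing the unit tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k) with exact integer binomials.
    """
    if n - c < k:
        # Every size-k subset of the samples contains a passing one.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def median_over_prompts(per_prompt_scores):
    """Collapse one task's per-prompt-template scores into one number."""
    return median(per_prompt_scores)


if __name__ == "__main__":
    # Toy inputs only (hypothetical, not taken from the table).
    docs_ll = [-120.5, -98.2]   # natural-log likelihood per document
    docs_bytes = [80, 65]       # UTF-8 bytes per document
    print(byte_perplexity(docs_ll, docs_bytes))

    # A benchmark-level pass@k is the mean of the per-problem estimates.
    per_problem = [(20, 3), (20, 0), (20, 12)]  # (n, c) per problem
    print(sum(pass_at_k(n, c, 1) for n, c in per_problem) / len(per_problem))

    print(median_over_prompts([0.52, 0.57, 0.55, 0.61, 0.49]))
```

Using exact integer binomials via `math.comb` (Python 3.8+) avoids the overflow that naive factorials would cause. Chen et al. also describe an equivalent, numerically stable product form.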
2322
  **Train-time Evaluation:**
2323
 
2324
+ Final checkpoint after 95K steps:
2325
 
2326
+ - Training Loss: 1.939
2327
 
2328
+ - Validation Loss: 2.061
2329
 
2330
+ - Perplexity: 7.045
2331
 
2332
+ For more details, see: https://huggingface.co/bigscience/tr11-176B-ml-logs
2333
 
2334
  </details>
2335
 
2424
  # Model Card Authors
2425
  *Ordered roughly chronologically and by amount of time spent.*
2426
 
2427
+ Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff