Muennighoff committed on
Commit
05d6b62
1 Parent(s): ad0c46c

Update README.md

Files changed (1)
  1. README.md +567 -0
README.md CHANGED
@@ -81,6 +81,573 @@ widget:
    example_title: "es-en fable"
  - text: "Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \"Violence is the last refuge of the incompetent\". Fable (in Hindi):"
    example_title: "hi-en fable"
+ model-index:
+ - name: bloomz
+   results:
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: winogrande
+       name: Winogrande XL
+       config: xl
+       split: validation
+       revision: a80f460359d1e9a67c006011c94de42a8759430c
+     metrics:
+     - type: Accuracy
+       value: 59.27
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: Muennighoff/xwinograd
+       name: XWinograd
+       config: en
+       split: test
+       revision: 9dd5ea5505fad86b7bedad667955577815300cee
+     metrics:
+     - type: Accuracy
+       value: 69.08
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: Muennighoff/xwinograd
+       name: XWinograd
+       config: fr
+       split: test
+       revision: 9dd5ea5505fad86b7bedad667955577815300cee
+     metrics:
+     - type: Accuracy
+       value: 68.67
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: Muennighoff/xwinograd
+       name: XWinograd
+       config: jp
+       split: test
+       revision: 9dd5ea5505fad86b7bedad667955577815300cee
+     metrics:
+     - type: Accuracy
+       value: 59.65
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: Muennighoff/xwinograd
+       name: XWinograd
+       config: pt
+       split: test
+       revision: 9dd5ea5505fad86b7bedad667955577815300cee
+     metrics:
+     - type: Accuracy
+       value: 64.26
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: Muennighoff/xwinograd
+       name: XWinograd
+       config: ru
+       split: test
+       revision: 9dd5ea5505fad86b7bedad667955577815300cee
+     metrics:
+     - type: Accuracy
+       value: 60.95
+   - task:
+       type: Coreference resolution
+     dataset:
+       type: Muennighoff/xwinograd
+       name: XWinograd
+       config: zh
+       split: test
+       revision: 9dd5ea5505fad86b7bedad667955577815300cee
+     metrics:
+     - type: Accuracy
+       value: 70.24
+   - task:
+       type: Natural language inference
+     dataset:
+       type: anli
+       name: ANLI
+       config: r1
+       split: validation
+       revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094
+     metrics:
+     - type: Accuracy
+       value: 48.6
+   - task:
+       type: Natural language inference
+     dataset:
+       type: anli
+       name: ANLI
+       config: r2
+       split: validation
+       revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094
+     metrics:
+     - type: Accuracy
+       value: 44.1
+   - task:
+       type: Natural language inference
+     dataset:
+       type: anli
+       name: ANLI
+       config: r3
+       split: validation
+       revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094
+     metrics:
+     - type: Accuracy
+       value: 45.5
+   - task:
+       type: Natural language inference
+     dataset:
+       type: super_glue
+       name: SuperGLUE
+       config: cb
+       split: validation
+       revision: 9e12063561e7e6c79099feb6d5a493142584e9e2
+     metrics:
+     - type: Accuracy
+       value: 82.14
+   - task:
+       type: Natural language inference
+     dataset:
+       type: super_glue
+       name: SuperGLUE
+       config: rte
+       split: validation
+       revision: 9e12063561e7e6c79099feb6d5a493142584e9e2
+     metrics:
+     - type: Accuracy
+       value: 85.56
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: ar
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 60.68
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: bg
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 48.43
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: de
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 54.38
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: el
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 47.43
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: en
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 67.47
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: es
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 61.24
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: fr
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 61.37
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: hi
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 60.2
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: ru
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 54.02
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: sw
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 52.09
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: th
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 43.78
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: tr
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 45.7
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: ur
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 50.8
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: vi
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 61.0
+   - task:
+       type: Natural language inference
+     dataset:
+       type: xnli
+       name: XNLI
+       config: zh
+       split: validation
+       revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
+     metrics:
+     - type: Accuracy
+       value: 56.91
+   - task:
+       type: Program synthesis
+     dataset:
+       type: openai_humaneval
+       name: HumanEval
+       split: test
+       revision: e8dc562f5de170c54b5481011dd9f4fa04845771
+     metrics:
+     - type: Pass@1
+       value: 12.06
+     - type: Pass@10
+       value: 26.53
+     - type: Pass@100
+       value: 48.44
+   - task:
+       type: Sentence completion
+     dataset:
+       type: story_cloze
+       name: StoryCloze
+       config: "2016"
+       split: validation
+       revision: e724c6f8cdf7c7a2fb229d862226e15b023ee4db
+     metrics:
+     - type: Accuracy
+       value: 96.26
+   - task:
+       type: Sentence completion
+     dataset:
+       type: super_glue
+       name: SuperGLUE
+       config: copa
+       split: validation
+       revision: 9e12063561e7e6c79099feb6d5a493142584e9e2
+     metrics:
+     - type: Accuracy
+       value: 91.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: et
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 51.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: ht
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 58.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: id
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 86.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: it
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 74.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: qu
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 56.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: sw
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 64.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: ta
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 69.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: th
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 58.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: tr
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 57.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: vi
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 87.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: xcopa
+       name: XCOPA
+       config: zh
+       split: validation
+       revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
+     metrics:
+     - type: Accuracy
+       value: 90.0
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: ar
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 92.79
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: es
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 94.37
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: eu
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 86.9
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: hi
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 88.42
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: id
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 92.12
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: my
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 52.35
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: ru
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 81.73
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: sw
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 79.81
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: te
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 81.2
+   - task:
+       type: Sentence completion
+     dataset:
+       type: Muennighoff/xstory_cloze
+       name: XStoryCloze
+       config: zh
+       split: validation
+       revision: 8bb76e594b68147f1a430e86829d07189622b90d
+     metrics:
+     - type: Accuracy
+       value: 93.12
  ---

  ![xmtf](https://github.com/bigscience-workshop/xmtf/blob/master/xmtf_banner.png?raw=true)
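
Note on reading the added metadata: each `results` entry in the `model-index` block pairs one dataset config with one or more metrics, so per-dataset summaries have to be computed by the reader. The following is a minimal sketch of one way to do that, not part of the commit. It assumes the block has already been parsed into Python lists and dicts (for example by a YAML parser); the helper name `mean_metric` is hypothetical, and only the XWinograd and ANLI accuracies are transcribed from the diff.

```python
from collections import defaultdict
from statistics import mean

# (dataset name, config, accuracy) triples transcribed from the diff above.
SCORES = [
    ("XWinograd", "en", 69.08), ("XWinograd", "fr", 68.67),
    ("XWinograd", "jp", 59.65), ("XWinograd", "pt", 64.26),
    ("XWinograd", "ru", 60.95), ("XWinograd", "zh", 70.24),
    ("ANLI", "r1", 48.6), ("ANLI", "r2", 44.1), ("ANLI", "r3", 45.5),
]

# Rebuild the nested shape a YAML parser would return for `results`
# (only the fields used here; task, split and revision are omitted).
results = [
    {"dataset": {"name": name, "config": config},
     "metrics": [{"type": "Accuracy", "value": value}]}
    for name, config, value in SCORES
]


def mean_metric(results, metric_type="Accuracy"):
    """Hypothetical helper: average one metric type per dataset name."""
    values = defaultdict(list)
    for entry in results:
        for metric in entry["metrics"]:
            if metric["type"] == metric_type:
                values[entry["dataset"]["name"]].append(metric["value"])
    return {name: mean(vals) for name, vals in values.items()}


averages = mean_metric(results)
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

Grouping by `dataset.name` rather than `dataset.type` keeps SuperGLUE subsets together; group by `(name, config)` instead if per-config numbers are needed.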