Muennighoff committed on
Commit
c900c5a
·
1 Parent(s): 68fd724

Create README.md

  1. README.md +880 -0
README.md ADDED
@@ -0,0 +1,880 @@
1
+ ---
2
+ datasets:
3
+ - bigscience/xP3mt
4
+ license: bigscience-bloom-rail-1.0
5
+ language:
6
+ - ak
7
+ - ar
8
+ - as
9
+ - bm
10
+ - bn
11
+ - ca
12
+ - code
13
+ - en
14
+ - es
15
+ - eu
16
+ - fon
17
+ - fr
18
+ - gu
19
+ - hi
20
+ - id
21
+ - ig
22
+ - ki
23
+ - kn
24
+ - lg
25
+ - ln
26
+ - ml
27
+ - mr
28
+ - ne
29
+ - nso
30
+ - ny
31
+ - or
32
+ - pa
33
+ - pt
34
+ - rn
35
+ - rw
36
+ - sn
37
+ - st
38
+ - sw
39
+ - ta
40
+ - te
41
+ - tn
42
+ - ts
43
+ - tum
44
+ - tw
45
+ - ur
46
+ - vi
47
+ - wo
48
+ - xh
49
+ - yo
50
+ - zh
51
+ - zu
52
+ programming_language:
53
+ - C
54
+ - C++
55
+ - C#
56
+ - Go
57
+ - Java
58
+ - JavaScript
59
+ - Lua
60
+ - PHP
61
+ - Python
62
+ - Ruby
63
+ - Rust
64
+ - Scala
65
+ - TypeScript
66
+ pipeline_tag: text-generation
67
+ widget:
68
+ - text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。Would you rate the previous review as positive, neutral or negative?"
69
+ example_title: "zh-en sentiment"
70
+ - text: "一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?"
71
+ example_title: "zh-zh sentiment"
72
+ - text: "Suggest at least five related search terms to \"Mạng neural nhân tạo\"."
73
+ example_title: "vi-en query"
74
+ - text: "Proposez au moins cinq mots clés concernant «Réseau de neurones artificiels»."
75
+ example_title: "fr-fr query"
76
+ - text: "Explain in a sentence in Telugu what is backpropagation in neural networks."
77
+ example_title: "te-en qa"
78
+ - text: "Why is the sky blue?"
79
+ example_title: "en-en qa"
80
+ - text: "Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is \"Heroes Come in All Shapes and Sizes\". Story (in Spanish):"
81
+ example_title: "es-en fable"
82
+ - text: "Write a fable about wood elves living in a forest that is suddenly invaded by ogres. The fable is a masterpiece that has achieved praise worldwide and its moral is \"Violence is the last refuge of the incompetent\". Fable (in Hindi):"
83
+ example_title: "hi-en fable"
84
+ model-index:
85
+ - name: bloomz-7b1-mt
86
+ results:
87
+ - task:
88
+ type: Coreference resolution
89
+ dataset:
90
+ type: winogrande
91
+ name: Winogrande XL (xl)
92
+ config: xl
93
+ split: validation
94
+ revision: a80f460359d1e9a67c006011c94de42a8759430c
95
+ metrics:
96
+ - type: Accuracy
97
+ value: 56.51
98
+ - task:
99
+ type: Coreference resolution
100
+ dataset:
101
+ type: Muennighoff/xwinograd
102
+ name: XWinograd (en)
103
+ config: en
104
+ split: test
105
+ revision: 9dd5ea5505fad86b7bedad667955577815300cee
106
+ metrics:
107
+ - type: Accuracy
108
+ value: 65.76
109
+ - task:
110
+ type: Coreference resolution
111
+ dataset:
112
+ type: Muennighoff/xwinograd
113
+ name: XWinograd (fr)
114
+ config: fr
115
+ split: test
116
+ revision: 9dd5ea5505fad86b7bedad667955577815300cee
117
+ metrics:
118
+ - type: Accuracy
119
+ value: 57.83
120
+ - task:
121
+ type: Coreference resolution
122
+ dataset:
123
+ type: Muennighoff/xwinograd
124
+ name: XWinograd (jp)
125
+ config: jp
126
+ split: test
127
+ revision: 9dd5ea5505fad86b7bedad667955577815300cee
128
+ metrics:
129
+ - type: Accuracy
130
+ value: 51.82
131
+ - task:
132
+ type: Coreference resolution
133
+ dataset:
134
+ type: Muennighoff/xwinograd
135
+ name: XWinograd (pt)
136
+ config: pt
137
+ split: test
138
+ revision: 9dd5ea5505fad86b7bedad667955577815300cee
139
+ metrics:
140
+ - type: Accuracy
141
+ value: 57.41
142
+ - task:
143
+ type: Coreference resolution
144
+ dataset:
145
+ type: Muennighoff/xwinograd
146
+ name: XWinograd (ru)
147
+ config: ru
148
+ split: test
149
+ revision: 9dd5ea5505fad86b7bedad667955577815300cee
150
+ metrics:
151
+ - type: Accuracy
152
+ value: 55.87
153
+ - task:
154
+ type: Coreference resolution
155
+ dataset:
156
+ type: Muennighoff/xwinograd
157
+ name: XWinograd (zh)
158
+ config: zh
159
+ split: test
160
+ revision: 9dd5ea5505fad86b7bedad667955577815300cee
161
+ metrics:
162
+ - type: Accuracy
163
+ value: 62.7
164
+ - task:
165
+ type: Natural language inference
166
+ dataset:
167
+ type: anli
168
+ name: ANLI (r1)
169
+ config: r1
170
+ split: validation
171
+ revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094
172
+ metrics:
173
+ - type: Accuracy
174
+ value: 42.6
175
+ - task:
176
+ type: Natural language inference
177
+ dataset:
178
+ type: anli
179
+ name: ANLI (r2)
180
+ config: r2
181
+ split: validation
182
+ revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094
183
+ metrics:
184
+ - type: Accuracy
185
+ value: 39.4
186
+ - task:
187
+ type: Natural language inference
188
+ dataset:
189
+ type: anli
190
+ name: ANLI (r3)
191
+ config: r3
192
+ split: validation
193
+ revision: 9dbd830a06fea8b1c49d6e5ef2004a08d9f45094
194
+ metrics:
195
+ - type: Accuracy
196
+ value: 42.0
197
+ - task:
198
+ type: Natural language inference
199
+ dataset:
200
+ type: super_glue
201
+ name: SuperGLUE (cb)
202
+ config: cb
203
+ split: validation
204
+ revision: 9e12063561e7e6c79099feb6d5a493142584e9e2
205
+ metrics:
206
+ - type: Accuracy
207
+ value: 83.93
208
+ - task:
209
+ type: Natural language inference
210
+ dataset:
211
+ type: super_glue
212
+ name: SuperGLUE (rte)
213
+ config: rte
214
+ split: validation
215
+ revision: 9e12063561e7e6c79099feb6d5a493142584e9e2
216
+ metrics:
217
+ - type: Accuracy
218
+ value: 82.67
219
+ - task:
220
+ type: Natural language inference
221
+ dataset:
222
+ type: xnli
223
+ name: XNLI (ar)
224
+ config: ar
225
+ split: validation
226
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
227
+ metrics:
228
+ - type: Accuracy
229
+ value: 55.58
230
+ - task:
231
+ type: Natural language inference
232
+ dataset:
233
+ type: xnli
234
+ name: XNLI (bg)
235
+ config: bg
236
+ split: validation
237
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
238
+ metrics:
239
+ - type: Accuracy
240
+ value: 44.9
241
+ - task:
242
+ type: Natural language inference
243
+ dataset:
244
+ type: xnli
245
+ name: XNLI (de)
246
+ config: de
247
+ split: validation
248
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
249
+ metrics:
250
+ - type: Accuracy
251
+ value: 48.92
252
+ - task:
253
+ type: Natural language inference
254
+ dataset:
255
+ type: xnli
256
+ name: XNLI (el)
257
+ config: el
258
+ split: validation
259
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
260
+ metrics:
261
+ - type: Accuracy
262
+ value: 42.89
263
+ - task:
264
+ type: Natural language inference
265
+ dataset:
266
+ type: xnli
267
+ name: XNLI (en)
268
+ config: en
269
+ split: validation
270
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
271
+ metrics:
272
+ - type: Accuracy
273
+ value: 58.92
274
+ - task:
275
+ type: Natural language inference
276
+ dataset:
277
+ type: xnli
278
+ name: XNLI (es)
279
+ config: es
280
+ split: validation
281
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
282
+ metrics:
283
+ - type: Accuracy
284
+ value: 57.35
285
+ - task:
286
+ type: Natural language inference
287
+ dataset:
288
+ type: xnli
289
+ name: XNLI (fr)
290
+ config: fr
291
+ split: validation
292
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
293
+ metrics:
294
+ - type: Accuracy
295
+ value: 56.67
296
+ - task:
297
+ type: Natural language inference
298
+ dataset:
299
+ type: xnli
300
+ name: XNLI (hi)
301
+ config: hi
302
+ split: validation
303
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
304
+ metrics:
305
+ - type: Accuracy
306
+ value: 53.45
307
+ - task:
308
+ type: Natural language inference
309
+ dataset:
310
+ type: xnli
311
+ name: XNLI (ru)
312
+ config: ru
313
+ split: validation
314
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
315
+ metrics:
316
+ - type: Accuracy
317
+ value: 50.24
318
+ - task:
319
+ type: Natural language inference
320
+ dataset:
321
+ type: xnli
322
+ name: XNLI (sw)
323
+ config: sw
324
+ split: validation
325
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
326
+ metrics:
327
+ - type: Accuracy
328
+ value: 48.27
329
+ - task:
330
+ type: Natural language inference
331
+ dataset:
332
+ type: xnli
333
+ name: XNLI (th)
334
+ config: th
335
+ split: validation
336
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
337
+ metrics:
338
+ - type: Accuracy
339
+ value: 41.08
340
+ - task:
341
+ type: Natural language inference
342
+ dataset:
343
+ type: xnli
344
+ name: XNLI (tr)
345
+ config: tr
346
+ split: validation
347
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
348
+ metrics:
349
+ - type: Accuracy
350
+ value: 38.71
351
+ - task:
352
+ type: Natural language inference
353
+ dataset:
354
+ type: xnli
355
+ name: XNLI (ur)
356
+ config: ur
357
+ split: validation
358
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
359
+ metrics:
360
+ - type: Accuracy
361
+ value: 49.48
362
+ - task:
363
+ type: Natural language inference
364
+ dataset:
365
+ type: xnli
366
+ name: XNLI (vi)
367
+ config: vi
368
+ split: validation
369
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
370
+ metrics:
371
+ - type: Accuracy
372
+ value: 54.5
373
+ - task:
374
+ type: Natural language inference
375
+ dataset:
376
+ type: xnli
377
+ name: XNLI (zh)
378
+ config: zh
379
+ split: validation
380
+ revision: a5a45e4ff92d5d3f34de70aaf4b72c3bdf9f7f16
381
+ metrics:
382
+ - type: Accuracy
383
+ value: 54.3
384
+ - task:
385
+ type: Program synthesis
386
+ dataset:
387
+ type: openai_humaneval
388
+ name: HumanEval
389
+ config: None
390
+ split: test
391
+ revision: e8dc562f5de170c54b5481011dd9f4fa04845771
392
+ metrics:
393
+ - type: Pass@1
394
+ value: 7.23
395
+ - type: Pass@10
396
+ value: 14.46
397
+ - type: Pass@100
398
+ value: 25.86
399
+ - task:
400
+ type: Sentence completion
401
+ dataset:
402
+ type: story_cloze
403
+ name: StoryCloze (2016)
404
+ config: "2016"
405
+ split: validation
406
+ revision: e724c6f8cdf7c7a2fb229d862226e15b023ee4db
407
+ metrics:
408
+ - type: Accuracy
409
+ value: 89.58
410
+ - task:
411
+ type: Sentence completion
412
+ dataset:
413
+ type: super_glue
414
+ name: SuperGLUE (copa)
415
+ config: copa
416
+ split: validation
417
+ revision: 9e12063561e7e6c79099feb6d5a493142584e9e2
418
+ metrics:
419
+ - type: Accuracy
420
+ value: 84.0
421
+ - task:
422
+ type: Sentence completion
423
+ dataset:
424
+ type: xcopa
425
+ name: XCOPA (et)
426
+ config: et
427
+ split: validation
428
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
429
+ metrics:
430
+ - type: Accuracy
431
+ value: 52.0
432
+ - task:
433
+ type: Sentence completion
434
+ dataset:
435
+ type: xcopa
436
+ name: XCOPA (ht)
437
+ config: ht
438
+ split: validation
439
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
440
+ metrics:
441
+ - type: Accuracy
442
+ value: 54.0
443
+ - task:
444
+ type: Sentence completion
445
+ dataset:
446
+ type: xcopa
447
+ name: XCOPA (id)
448
+ config: id
449
+ split: validation
450
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
451
+ metrics:
452
+ - type: Accuracy
453
+ value: 73.0
454
+ - task:
455
+ type: Sentence completion
456
+ dataset:
457
+ type: xcopa
458
+ name: XCOPA (it)
459
+ config: it
460
+ split: validation
461
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
462
+ metrics:
463
+ - type: Accuracy
464
+ value: 62.0
465
+ - task:
466
+ type: Sentence completion
467
+ dataset:
468
+ type: xcopa
469
+ name: XCOPA (qu)
470
+ config: qu
471
+ split: validation
472
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
473
+ metrics:
474
+ - type: Accuracy
475
+ value: 61.0
476
+ - task:
477
+ type: Sentence completion
478
+ dataset:
479
+ type: xcopa
480
+ name: XCOPA (sw)
481
+ config: sw
482
+ split: validation
483
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
484
+ metrics:
485
+ - type: Accuracy
486
+ value: 61.0
487
+ - task:
488
+ type: Sentence completion
489
+ dataset:
490
+ type: xcopa
491
+ name: XCOPA (ta)
492
+ config: ta
493
+ split: validation
494
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
495
+ metrics:
496
+ - type: Accuracy
497
+ value: 62.0
498
+ - task:
499
+ type: Sentence completion
500
+ dataset:
501
+ type: xcopa
502
+ name: XCOPA (th)
503
+ config: th
504
+ split: validation
505
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
506
+ metrics:
507
+ - type: Accuracy
508
+ value: 61.0
509
+ - task:
510
+ type: Sentence completion
511
+ dataset:
512
+ type: xcopa
513
+ name: XCOPA (tr)
514
+ config: tr
515
+ split: validation
516
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
517
+ metrics:
518
+ - type: Accuracy
519
+ value: 56.0
520
+ - task:
521
+ type: Sentence completion
522
+ dataset:
523
+ type: xcopa
524
+ name: XCOPA (vi)
525
+ config: vi
526
+ split: validation
527
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
528
+ metrics:
529
+ - type: Accuracy
530
+ value: 77.0
531
+ - task:
532
+ type: Sentence completion
533
+ dataset:
534
+ type: xcopa
535
+ name: XCOPA (zh)
536
+ config: zh
537
+ split: validation
538
+ revision: 37f73c60fb123111fa5af5f9b705d0b3747fd187
539
+ metrics:
540
+ - type: Accuracy
541
+ value: 80.0
542
+ - task:
543
+ type: Sentence completion
544
+ dataset:
545
+ type: Muennighoff/xstory_cloze
546
+ name: XStoryCloze (ar)
547
+ config: ar
548
+ split: validation
549
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
550
+ metrics:
551
+ - type: Accuracy
552
+ value: 83.85
553
+ - task:
554
+ type: Sentence completion
555
+ dataset:
556
+ type: Muennighoff/xstory_cloze
557
+ name: XStoryCloze (es)
558
+ config: es
559
+ split: validation
560
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
561
+ metrics:
562
+ - type: Accuracy
563
+ value: 88.82
564
+ - task:
565
+ type: Sentence completion
566
+ dataset:
567
+ type: Muennighoff/xstory_cloze
568
+ name: XStoryCloze (eu)
569
+ config: eu
570
+ split: validation
571
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
572
+ metrics:
573
+ - type: Accuracy
574
+ value: 73.26
575
+ - task:
576
+ type: Sentence completion
577
+ dataset:
578
+ type: Muennighoff/xstory_cloze
579
+ name: XStoryCloze (hi)
580
+ config: hi
581
+ split: validation
582
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
583
+ metrics:
584
+ - type: Accuracy
585
+ value: 80.41
586
+ - task:
587
+ type: Sentence completion
588
+ dataset:
589
+ type: Muennighoff/xstory_cloze
590
+ name: XStoryCloze (id)
591
+ config: id
592
+ split: validation
593
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
594
+ metrics:
595
+ - type: Accuracy
596
+ value: 84.58
597
+ - task:
598
+ type: Sentence completion
599
+ dataset:
600
+ type: Muennighoff/xstory_cloze
601
+ name: XStoryCloze (my)
602
+ config: my
603
+ split: validation
604
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
605
+ metrics:
606
+ - type: Accuracy
607
+ value: 51.56
608
+ - task:
609
+ type: Sentence completion
610
+ dataset:
611
+ type: Muennighoff/xstory_cloze
612
+ name: XStoryCloze (ru)
613
+ config: ru
614
+ split: validation
615
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
616
+ metrics:
617
+ - type: Accuracy
618
+ value: 64.26
619
+ - task:
620
+ type: Sentence completion
621
+ dataset:
622
+ type: Muennighoff/xstory_cloze
623
+ name: XStoryCloze (sw)
624
+ config: sw
625
+ split: validation
626
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
627
+ metrics:
628
+ - type: Accuracy
629
+ value: 71.01
630
+ - task:
631
+ type: Sentence completion
632
+ dataset:
633
+ type: Muennighoff/xstory_cloze
634
+ name: XStoryCloze (te)
635
+ config: te
636
+ split: validation
637
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
638
+ metrics:
639
+ - type: Accuracy
640
+ value: 73.06
641
+ - task:
642
+ type: Sentence completion
643
+ dataset:
644
+ type: Muennighoff/xstory_cloze
645
+ name: XStoryCloze (zh)
646
+ config: zh
647
+ split: validation
648
+ revision: 8bb76e594b68147f1a430e86829d07189622b90d
649
+ metrics:
650
+ - type: Accuracy
651
+ value: 85.9
652
+ ---
653
+
654
+ ![xmtf](https://github.com/bigscience-workshop/xmtf/blob/master/xmtf_banner.png?raw=true)
655
+
656
+ # Table of Contents
657
+
658
+ 1. [Model Summary](#model-summary)
659
+ 2. [Use](#use)
660
+ 3. [Limitations](#limitations)
661
+ 4. [Training](#training)
662
+ 5. [Evaluation](#evaluation)
663
+ 6. [Citation](#citation)
664
+
665
+ # Model Summary
666
+
667
+ > We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find the resulting models capable of crosslingual generalization to unseen tasks & languages.
668
+
669
+ - **Repository:** [bigscience-workshop/xmtf](https://github.com/bigscience-workshop/xmtf)
670
+ - **Paper:** [Crosslingual Generalization through Multitask Finetuning](https://arxiv.org/abs/2211.01786)
671
+ - **Point of Contact:** [Niklas Muennighoff](mailto:niklas@hf.co)
672
+ - **Languages:** Refer to [bloom](https://huggingface.co/bigscience/bloom) for pretraining & [xP3](https://huggingface.co/datasets/bigscience/xP3) for finetuning language proportions. The model understands both pretraining & finetuning languages.
673
+ - **BLOOMZ & mT0 Model Family:**
674
+
675
+ <table>
676
+ <tr>
677
+ <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3>xP3</a>. Recommended for prompting in English.</th>
678
+ </tr>
679
+ <tr>
680
+ <td>Parameters</td>
681
+ <td>300M</td>
682
+ <td>580M</td>
683
+ <td>1.2B</td>
684
+ <td>3.7B</td>
685
+ <td>13B</td>
686
+ <td>560M</td>
687
+ <td>1.1B</td>
688
+ <td>1.7B</td>
689
+ <td>3B</td>
690
+ <td>7.1B</td>
691
+ <td>176B</td>
692
+ </tr>
693
+ <tr>
694
+ <td>Finetuned Model</td>
695
+ <td><a href=https://huggingface.co/bigscience/mt0-small>mt0-small</a></td>
696
+ <td><a href=https://huggingface.co/bigscience/mt0-base>mt0-base</a></td>
697
+ <td><a href=https://huggingface.co/bigscience/mt0-large>mt0-large</a></td>
698
+ <td><a href=https://huggingface.co/bigscience/mt0-xl>mt0-xl</a></td>
699
+ <td><a href=https://huggingface.co/bigscience/mt0-xxl>mt0-xxl</a></td>
700
+ <td><a href=https://huggingface.co/bigscience/bloomz-560m>bloomz-560m</a></td>
701
+ <td><a href=https://huggingface.co/bigscience/bloomz-1b1>bloomz-1b1</a></td>
702
+ <td><a href=https://huggingface.co/bigscience/bloomz-1b7>bloomz-1b7</a></td>
703
+ <td><a href=https://huggingface.co/bigscience/bloomz-3b>bloomz-3b</a></td>
704
+ <td><a href=https://huggingface.co/bigscience/bloomz-7b1>bloomz-7b1</a></td>
705
+ <td><a href=https://huggingface.co/bigscience/bloomz>bloomz</a></td>
706
+ </tr>
708
+ <tr>
709
+ <th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/bigscience/xP3mt>xP3mt</a>. Recommended for prompting in non-English.</th>
710
+ </tr>
711
+ <tr>
712
+ <td>Finetuned Model</td>
713
+ <td></td>
714
+ <td></td>
715
+ <td></td>
716
+ <td></td>
717
+ <td><a href=https://huggingface.co/bigscience/mt0-xxl-mt>mt0-xxl-mt</a></td>
718
+ <td></td>
719
+ <td></td>
720
+ <td></td>
721
+ <td></td>
722
+ <td><a href=https://huggingface.co/bigscience/bloomz-7b1-mt>bloomz-7b1-mt</a></td>
723
+ <td><a href=https://huggingface.co/bigscience/bloomz-mt>bloomz-mt</a></td>
724
+ </tr>
725
+ <tr><th colspan="12">Multitask finetuned on <a style="font-weight:bold" href=https://huggingface.co/datasets/Muennighoff/P3>P3</a>. Released for research purposes only. Strictly inferior to above models!</th>
726
+ </tr>
727
+ <tr>
728
+ <td>Finetuned Model</td>
729
+ <td></td>
730
+ <td></td>
731
+ <td></td>
732
+ <td></td>
733
+ <td><a href=https://huggingface.co/bigscience/mt0-xxl-p3>mt0-xxl-p3</a></td>
734
+ <td></td>
735
+ <td></td>
736
+ <td></td>
737
+ <td></td>
738
+ <td><a href=https://huggingface.co/bigscience/bloomz-7b1-p3>bloomz-7b1-p3</a></td>
739
+ <td><a href=https://huggingface.co/bigscience/bloomz-p3>bloomz-p3</a></td>
740
+ </tr>
741
+ <tr><th colspan="12">Original pretrained checkpoints. Not recommended.</th></tr>
742
+ <tr>
743
+ <td>Pretrained Model</td>
744
+ <td><a href=https://huggingface.co/google/mt5-small>mt5-small</a></td>
745
+ <td><a href=https://huggingface.co/google/mt5-base>mt5-base</a></td>
747
+ <td><a href=https://huggingface.co/google/mt5-xl>mt5-xl</a></td>
748
+ <td><a href=https://huggingface.co/google/mt5-xxl>mt5-xxl</a></td>
749
+ <td><a href=https://huggingface.co/bigscience/bloom-560m>bloom-560m</a></td>
750
+ <td><a href=https://huggingface.co/bigscience/bloom-1b1>bloom-1b1</a></td>
751
+ <td><a href=https://huggingface.co/bigscience/bloom-1b7>bloom-1b7</a></td>
752
+ <td><a href=https://huggingface.co/bigscience/bloom-3b>bloom-3b</a></td>
753
+ <td><a href=https://huggingface.co/bigscience/bloom-7b1>bloom-7b1</a></td>
754
+ <td><a href=https://huggingface.co/bigscience/bloom>bloom</a></td>
755
+ </tr>
756
+ </table>
757
+
758
+
759
+ # Use
760
+
761
+ ## Intended use
762
+
763
+ We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "*Translate to English: Je t’aime.*", the model will most likely answer "*I love you.*". Some prompt ideas from our paper:
764
+ - 一个传奇的开端,一个不灭的神话,这不仅仅是一部电影,而是作为一个走进新时代的标签,永远彪炳史册。你认为这句话的立场是赞扬、中立还是批评?
765
+ - Suggest at least five related search terms to "Mạng neural nhân tạo".
766
+ - Write a fairy tale about a troll saving a princess from a dangerous dragon. The fairy tale is a masterpiece that has achieved praise worldwide and its moral is "Heroes Come in All Shapes and Sizes". Story (in Spanish):
767
+ - Explain in a sentence in Telugu what is backpropagation in neural networks.
768
+
769
+ **Feel free to share your generations in the Community tab!**
770
+
771
+ ## How to use
772
+
773
+ ### CPU
774
+
775
+ <details>
776
+ <summary> Click to expand </summary>
777
+
778
+ ```python
779
+ # pip install -q transformers
780
+ from transformers import AutoModelForCausalLM, AutoTokenizer
781
+
782
+ checkpoint = "bigscience/bloomz-7b1-mt"
783
+
784
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
785
+ model = AutoModelForCausalLM.from_pretrained(checkpoint)
786
+
787
+ inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt")
788
+ outputs = model.generate(inputs)
789
+ print(tokenizer.decode(outputs[0]))
790
+ ```
791
+
792
+ </details>
793
+
794
+ ### GPU
795
+
796
+ <details>
797
+ <summary> Click to expand </summary>
798
+
799
+ ```python
800
+ # pip install -q transformers accelerate
801
+ from transformers import AutoModelForCausalLM, AutoTokenizer
802
+
803
+ checkpoint = "bigscience/bloomz-7b1-mt"
804
+
805
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
806
+ model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype="auto", device_map="auto")
807
+
808
+ inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
809
+ outputs = model.generate(inputs)
810
+ print(tokenizer.decode(outputs[0]))
811
+ ```
812
+
813
+ </details>
814
+
815
+ ### GPU in 8bit
816
+
817
+ <details>
818
+ <summary> Click to expand </summary>
819
+
820
+ ```python
821
+ # pip install -q transformers accelerate bitsandbytes
822
+ from transformers import AutoModelForCausalLM, AutoTokenizer
823
+
824
+ checkpoint = "bigscience/bloomz-7b1-mt"
825
+
826
+ tokenizer = AutoTokenizer.from_pretrained(checkpoint)
827
+ model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", load_in_8bit=True)
828
+
829
+ inputs = tokenizer.encode("Translate to English: Je t’aime.", return_tensors="pt").to("cuda")
830
+ outputs = model.generate(inputs)
831
+ print(tokenizer.decode(outputs[0]))
832
+ ```
833
+
834
+ </details>
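
The three loading modes above differ mainly in how many bytes each weight occupies. As a rough sketch, the weight memory can be estimated from the 7.1B parameter count in the model family table. The bytes-per-weight values (4 for float32, which is typically the default load, 2 for float16, 1 for 8-bit) are assumptions, as is ignoring activations and framework overhead. The function name and constants are illustrative and not part of `transformers`.

```python
# Rough weight-memory estimate for a 7.1B-parameter checkpoint.
# Assumptions (not from the model card): bytes per weight for each dtype,
# and that only the weights count (no activations or framework overhead).
PARAMS = 7_100_000_000  # "7.1B" from the model family table

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PARAMS, dtype):.1f} GB")
```

Actual memory use will be higher than these figures, so treat them as a lower bound when choosing between the CPU, GPU, and 8-bit options.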
835
+
836
+ <!-- Necessary for whitespace -->
837
+ ###
838
+
839
+ # Limitations
840
+
841
+ **Prompt Engineering:** Performance may vary depending on the prompt. For BLOOMZ models, we recommend making it very clear where the input stops, so that the model does not try to continue it. For example, the prompt "*Translate to English: Je t'aime*", without the full stop (.) at the end, may result in the model trying to continue the French sentence. Better prompts are e.g. "*Translate to English: Je t'aime.*", "*Translate to English: Je t'aime. Translation:*", or "*What is "Je t'aime." in English?*", where it is clear to the model when it should answer. Further, we recommend providing the model with as much context as possible. For example, if you want it to answer in Telugu, then tell the model, e.g. "*Explain in a sentence in Telugu what is backpropagation in neural networks.*".
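
The advice above can be applied mechanically before a prompt is sent to the model. The sketch below uses a hypothetical helper, `close_prompt`, which is not part of any library. It appends a full stop when the prompt lacks terminal punctuation and can add an explicit answer cue such as "Translation:". The set of characters treated as terminal punctuation is an assumption.

```python
def close_prompt(prompt: str, answer_cue: str = "") -> str:
    """Hypothetical helper: end the input clearly so the model answers
    instead of continuing the text."""
    text = prompt.strip()
    # Add a full stop when the prompt has no terminal punctuation.
    if text and text[-1] not in ".?!:。？！":
        text += "."
    # Optionally append an explicit cue such as "Translation:".
    if answer_cue:
        text = f"{text} {answer_cue}"
    return text


print(close_prompt("Translate to English: Je t'aime"))
print(close_prompt("Translate to English: Je t'aime", answer_cue="Translation:"))
```

A prompt that already ends in a question mark or full stop is returned unchanged, so the helper can be applied to every prompt.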
842
+
843
+ # Training
844
+
845
+ ## Model
846
+
847
+ - **Architecture:** Same as [bloom-7b1](https://huggingface.co/bigscience/bloom-7b1), also refer to the `config.json` file
848
+ - **Finetuning steps:** 1000
849
+ - **Finetuning tokens:** 4.19 billion
850
+ - **Finetuning layout:** 1x pipeline parallel, 1x tensor parallel, 64x data parallel
851
+ - **Precision:** float16
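
The figures above imply a per-step token budget, which the following sketch works out. The sequence length of 2048 tokens is an assumption that is not stated in this card, and the token count is the rounded value from the list, so the results are approximate.

```python
# Back-of-the-envelope batch arithmetic from the figures listed above.
FINETUNING_TOKENS = 4_190_000_000  # "4.19 billion" (rounded in the card)
FINETUNING_STEPS = 1000
DATA_PARALLEL = 64

# Assumption (not stated in this card): a sequence length of 2048 tokens.
ASSUMED_SEQ_LEN = 2048

tokens_per_step = FINETUNING_TOKENS // FINETUNING_STEPS
tokens_per_replica = tokens_per_step / DATA_PARALLEL
sequences_per_step = tokens_per_step / ASSUMED_SEQ_LEN

print(f"tokens per step:        {tokens_per_step:,}")
print(f"tokens per replica:     {tokens_per_replica:,.0f}")
print(f"sequences per step (~): {sequences_per_step:,.0f}")
```

Under the assumed sequence length, the rounded token count is consistent with roughly 2,048 sequences per step, though the card itself does not confirm the batch size.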
852
+
853
+ ## Hardware
854
+
855
+ - **CPUs:** AMD CPUs with 512GB memory per node
856
+ - **GPUs:** 64 A100 80GB GPUs with 8 GPUs per node (8 nodes) using NVLink 4 inter-gpu connects, 4 OmniPath links
857
+ - **Communication:** NCCL-communications network with a fully dedicated subnet
858
+
859
+ ## Software
860
+
861
+ - **Orchestration:** [Megatron-DeepSpeed](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
862
+ - **Optimizer & parallelism:** [DeepSpeed](https://github.com/microsoft/DeepSpeed)
863
+ - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch) (pytorch-1.11 w/ CUDA-11.5)
864
+ - **FP16 if applicable:** [apex](https://github.com/NVIDIA/apex)
865
+
866
+ # Evaluation
867
+
868
+ We refer to Table 7 from our [paper](https://arxiv.org/abs/2211.01786) & [bigscience/evaluation-results](https://huggingface.co/datasets/bigscience/evaluation-results) for zero-shot results on unseen tasks. The sidebar reports zero-shot performance of the best prompt per dataset config.
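
Because the sidebar lists one score per dataset config, a benchmark-level view requires aggregating them. As a minimal sketch, the snippet below takes the XCOPA accuracies from this card's metadata and computes an unweighted mean. That mean is an illustrative summary and is not a figure reported in the card or the paper.

```python
from statistics import mean

# XCOPA validation accuracies copied from this card's metadata.
xcopa_accuracy = {
    "et": 52.0, "ht": 54.0, "id": 73.0, "it": 62.0, "qu": 61.0, "sw": 61.0,
    "ta": 62.0, "th": 61.0, "tr": 56.0, "vi": 77.0, "zh": 80.0,
}

average = mean(xcopa_accuracy.values())
best = max(xcopa_accuracy, key=xcopa_accuracy.get)
worst = min(xcopa_accuracy, key=xcopa_accuracy.get)

print(f"XCOPA mean accuracy: {average:.2f}")
print(f"best: {best} ({xcopa_accuracy[best]}), worst: {worst} ({xcopa_accuracy[worst]})")
```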
869
+
870
+ # Citation
871
+ ```bibtex
872
+ @misc{muennighoff2022crosslingual,
873
+ title={Crosslingual Generalization through Multitask Finetuning},
874
+ author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
875
+ year={2022},
876
+ eprint={2211.01786},
877
+ archivePrefix={arXiv},
878
+ primaryClass={cs.CL}
879
+ }
880
+ ```