RichardErkhov committed 55fa171 (1 parent: 69779c4): uploaded readme

README.md ADDED (+2261 lines)
Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


bloom-3b - bnb 4bits
- Model creator: https://huggingface.co/bigscience/
- Original model: https://huggingface.co/bigscience/bloom-3b/
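As a usage sketch (not part of the original card): a bitsandbytes 4-bit checkpoint like the one above can typically be loaded directly with `transformers`. The repo id below is a placeholder assumption, not taken from this README; substitute the actual quantized repo. Requires `transformers`, `accelerate`, `bitsandbytes`, and a CUDA GPU.

```python
# Hedged sketch of loading a bitsandbytes 4-bit quantized causal LM.
# REPO_ID is a hypothetical placeholder -- use the real quantized repo.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

REPO_ID = "RichardErkhov/bigscience_-_bloom-3b-4bits"  # hypothetical id


def load_4bit(repo_id: str = REPO_ID):
    """Download the tokenizer and model; weights are kept 4-bit on the GPU."""
    quant_cfg = BitsAndBytesConfig(load_in_4bit=True)
    tok = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        quantization_config=quant_cfg,  # redundant if baked into the config
        device_map="auto",
    )
    return tok, model


if __name__ == "__main__":
    tok, model = load_4bit()
    inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=10)
    print(tok.decode(out[0], skip_special_tokens=True))
```

4-bit loading roughly quarters the memory footprint of the fp16 weights, which is the point of this quantized upload.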
Original model description:
---
license: bigscience-bloom-rail-1.0
language:
- ak
- ar
- as
- bm
- bn
- ca
- code
- en
- es
- eu
- fon
- fr
- gu
- hi
- id
- ig
- ki
- kn
- lg
- ln
- ml
- mr
- ne
- nso
- ny
- or
- pa
- pt
- rn
- rw
- sn
- st
- sw
- ta
- te
- tn
- ts
- tum
- tw
- ur
- vi
- wo
- xh
- yo
- zh
- zhs
- zht
- zu
pipeline_tag: text-generation
model-index:
- name: bloom
  results:
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: arc_challenge
      type: arc_challenge
    metrics:
    - name: acc
      type: acc
      value: 0.27986348122866894
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: arc_easy
      type: arc_easy
    metrics:
    - name: acc
      type: acc
      value: 0.5946969696969697
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: axb
      type: axb
    metrics:
    - name: acc
      type: acc
      value: 0.4433876811594203
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: axg
      type: axg
    metrics:
    - name: acc
      type: acc
      value: 0.5
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: boolq
      type: boolq
    metrics:
    - name: acc
      type: acc
      value: 0.6165137614678899
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: cb
      type: cb
    metrics:
    - name: acc
      type: acc
      value: 0.30357142857142855
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: cola
      type: cola
    metrics:
    - name: acc
      type: acc
      value: 0.610738255033557
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: copa
      type: copa
    metrics:
    - name: acc
      type: acc
      value: 0.63
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: crows_pairs_english
      type: crows_pairs_english
    metrics:
    - name: acc
      type: acc
      value: 0.4973166368515206
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: crows_pairs_french
      type: crows_pairs_french
    metrics:
    - name: acc
      type: acc
      value: 0.5032796660703638
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: diabla
      type: diabla
    metrics:
    - name: acc
      type: acc
      value: 0.28888308977035493
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_afr
      type: gsarti/flores_101_afr
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.500798737976343
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_amh
      type: gsarti/flores_101_amh
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.9726863338897145
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ara
      type: gsarti/flores_101_ara
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.8083841089875814
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_asm
      type: gsarti/flores_101_asm
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.699102962086425
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ast
      type: gsarti/flores_101_ast
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.9252047073429384
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_azj
      type: gsarti/flores_101_azj
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.942805054270002
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_bel
      type: gsarti/flores_101_bel
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.614136245847082
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ben
      type: gsarti/flores_101_ben
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.121491534300969
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_bos
      type: gsarti/flores_101_bos
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.653353469118798
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_bul
      type: gsarti/flores_101_bul
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.7014693938055068
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_cat
      type: gsarti/flores_101_cat
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.305190041967345
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ceb
      type: gsarti/flores_101_ceb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.291000321323428
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ces
      type: gsarti/flores_101_ces
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.447322753586386
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ckb
      type: gsarti/flores_101_ckb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.7255124939234765
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_cym
      type: gsarti/flores_101_cym
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 12.539424151448149
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_dan
      type: gsarti/flores_101_dan
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.183309001005672
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_deu
      type: gsarti/flores_101_deu
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.1180422286591347
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ell
      type: gsarti/flores_101_ell
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.467943456164706
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_eng
      type: gsarti/flores_101_eng
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.018740628193298
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_est
      type: gsarti/flores_101_est
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 9.11654425176368
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_fas
      type: gsarti/flores_101_fas
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.058009097116482
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_fin
      type: gsarti/flores_101_fin
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.847047959628553
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_fra
      type: gsarti/flores_101_fra
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.9975177011840075
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ful
      type: gsarti/flores_101_ful
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.465912731488828
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_gle
      type: gsarti/flores_101_gle
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.681491663539422
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_glg
      type: gsarti/flores_101_glg
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.029991089015508
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_guj
      type: gsarti/flores_101_guj
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.955224230286231
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hau
      type: gsarti/flores_101_hau
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 10.758347356372159
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_heb
      type: gsarti/flores_101_heb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.6004478129801667
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hin
      type: gsarti/flores_101_hin
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.712530650588064
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hrv
      type: gsarti/flores_101_hrv
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.822418943372185
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hun
      type: gsarti/flores_101_hun
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.440482646965992
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hye
      type: gsarti/flores_101_hye
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.657718918347166
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ibo
      type: gsarti/flores_101_ibo
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.564814003872672
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ind
      type: gsarti/flores_101_ind
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.1597101468869373
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_isl
      type: gsarti/flores_101_isl
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.082349269518136
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ita
      type: gsarti/flores_101_ita
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.9687591414176207
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_jav
      type: gsarti/flores_101_jav
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.0573805415708994
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_jpn
      type: gsarti/flores_101_jpn
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.7758864197116933
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kam
      type: gsarti/flores_101_kam
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.072949642861332
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kan
      type: gsarti/flores_101_kan
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.551730651007082
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kat
      type: gsarti/flores_101_kat
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.522630524283745
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kaz
      type: gsarti/flores_101_kaz
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.3901748516975574
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kea
      type: gsarti/flores_101_kea
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.918534182590863
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kir
      type: gsarti/flores_101_kir
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.729278369847201
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kor
      type: gsarti/flores_101_kor
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.932884847226212
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lao
      type: gsarti/flores_101_lao
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.9077314760849924
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lav
      type: gsarti/flores_101_lav
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.777221919194806
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lin
      type: gsarti/flores_101_lin
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.524842908050988
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lit
      type: gsarti/flores_101_lit
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.369179434621725
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ltz
      type: gsarti/flores_101_ltz
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.801059747949214
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lug
      type: gsarti/flores_101_lug
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.483203026364786
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_luo
      type: gsarti/flores_101_luo
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.975963093623681
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mal
      type: gsarti/flores_101_mal
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.615948455160037
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mar
      type: gsarti/flores_101_mar
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.483253482821379
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mkd
      type: gsarti/flores_101_mkd
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.9656732291754087
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mlt
      type: gsarti/flores_101_mlt
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 15.004773437665275
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mon
      type: gsarti/flores_101_mon
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.410598542315402
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mri
      type: gsarti/flores_101_mri
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.474035895661322
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_msa
      type: gsarti/flores_101_msa
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.5710001772665634
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mya
      type: gsarti/flores_101_mya
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.413577969878331
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nld
      type: gsarti/flores_101_nld
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.127831721885065
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nob
      type: gsarti/flores_101_nob
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.402763169129877
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_npi
      type: gsarti/flores_101_npi
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.199342701937889
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nso
      type: gsarti/flores_101_nso
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.154626800955667
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nya
      type: gsarti/flores_101_nya
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.179860208369393
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_oci
      type: gsarti/flores_101_oci
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.8617357393685845
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_orm
      type: gsarti/flores_101_orm
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 12.911595421079408
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ory
      type: gsarti/flores_101_ory
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.189421861225964
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_pan
      type: gsarti/flores_101_pan
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.698477289331806
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_pol
      type: gsarti/flores_101_pol
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.625550458479643
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_por
      type: gsarti/flores_101_por
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.9754515986213523
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_pus
      type: gsarti/flores_101_pus
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.4963371422771585
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ron
      type: gsarti/flores_101_ron
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.965456830031304
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_rus
      type: gsarti/flores_101_rus
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.0498020542445303
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_slk
      type: gsarti/flores_101_slk
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.450822127057479
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_slv
      type: gsarti/flores_101_slv
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.620252120186232
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_sna
      type: gsarti/flores_101_sna
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.462166771382726
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_snd
      type: gsarti/flores_101_snd
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.466066951221973
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_som
      type: gsarti/flores_101_som
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.95918054093392
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_spa
      type: gsarti/flores_101_spa
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.8965140104323535
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_srp
      type: gsarti/flores_101_srp
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.871214785885079
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_swe
      type: gsarti/flores_101_swe
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.054972008155866
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_swh
      type: gsarti/flores_101_swh
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.6973091886730676
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tam
      type: gsarti/flores_101_tam
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.539493400469833
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tel
      type: gsarti/flores_101_tel
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.807499987508966
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tgk
      type: gsarti/flores_101_tgk
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.5994818827380426
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tgl
      type: gsarti/flores_101_tgl
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.667053833119858
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tha
      type: gsarti/flores_101_tha
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.365940201944242
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tur
      type: gsarti/flores_101_tur
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.885014749844601
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ukr
      type: gsarti/flores_101_ukr
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.7240934990288483
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_umb
      type: gsarti/flores_101_umb
1201
+ metrics:
1202
+ - name: byte_perplexity
1203
+ type: byte_perplexity
1204
+ value: 12.766915508610673
1205
+ verified: false
1206
+ - task:
1207
+ type: text-generation
1208
+ name: text generation
1209
+ dataset:
1210
+ name: gsarti/flores_101_urd
1211
+ type: gsarti/flores_101_urd
1212
+ metrics:
1213
+ - name: byte_perplexity
1214
+ type: byte_perplexity
1215
+ value: 1.9797467071381232
1216
+ verified: false
1217
+ - task:
1218
+ type: text-generation
1219
+ name: text generation
1220
+ dataset:
1221
+ name: gsarti/flores_101_uzb
1222
+ type: gsarti/flores_101_uzb
1223
+ metrics:
1224
+ - name: byte_perplexity
1225
+ type: byte_perplexity
1226
+ value: 12.002337637722146
1227
+ verified: false
1228
+ - task:
1229
+ type: text-generation
1230
+ name: text generation
1231
+ dataset:
1232
+ name: gsarti/flores_101_vie
1233
+ type: gsarti/flores_101_vie
1234
+ metrics:
1235
+ - name: byte_perplexity
1236
+ type: byte_perplexity
1237
+ value: 1.76578415476397
1238
+ verified: false
1239
+ - task:
1240
+ type: text-generation
1241
+ name: text generation
1242
+ dataset:
1243
+ name: gsarti/flores_101_wol
1244
+ type: gsarti/flores_101_wol
1245
+ metrics:
1246
+ - name: byte_perplexity
1247
+ type: byte_perplexity
1248
+ value: 9.144285650306488
1249
+ verified: false
1250
+ - task:
1251
+ type: text-generation
1252
+ name: text generation
1253
+ dataset:
1254
+ name: gsarti/flores_101_xho
1255
+ type: gsarti/flores_101_xho
1256
+ metrics:
1257
+ - name: byte_perplexity
1258
+ type: byte_perplexity
1259
+ value: 7.403240538286952
1260
+ verified: false
1261
+ - task:
1262
+ type: text-generation
1263
+ name: text generation
1264
+ dataset:
1265
+ name: gsarti/flores_101_yor
1266
+ type: gsarti/flores_101_yor
1267
+ metrics:
1268
+ - name: byte_perplexity
1269
+ type: byte_perplexity
1270
+ value: 5.91272037551173
1271
+ verified: false
1272
+ - task:
1273
+ type: text-generation
1274
+ name: text generation
1275
+ dataset:
1276
+ name: gsarti/flores_101_zho_simpl
1277
+ type: gsarti/flores_101_zho_simpl
1278
+ metrics:
1279
+ - name: byte_perplexity
1280
+ type: byte_perplexity
1281
+ value: 2.2769070822768533
1282
+ verified: false
1283
+ - task:
1284
+ type: text-generation
1285
+ name: text generation
1286
+ dataset:
1287
+ name: gsarti/flores_101_zho_trad
1288
+ type: gsarti/flores_101_zho_trad
1289
+ metrics:
1290
+ - name: byte_perplexity
1291
+ type: byte_perplexity
1292
+ value: 2.5180582198242383
1293
+ verified: false
1294
+ - task:
1295
+ type: text-generation
1296
+ name: text generation
1297
+ dataset:
1298
+ name: gsarti/flores_101_zul
1299
+ type: gsarti/flores_101_zul
1300
+ metrics:
1301
+ - name: byte_perplexity
1302
+ type: byte_perplexity
1303
+ value: 8.53353320693145
1304
+ verified: false
1305
+ - task:
1306
+ type: text-generation
1307
+ name: text generation
1308
+ dataset:
1309
+ name: headqa
1310
+ type: headqa
1311
+ metrics:
1312
+ - name: acc
1313
+ type: acc
1314
+ value: 0.26440554339897887
1315
+ verified: false
1316
+ - task:
1317
+ type: text-generation
1318
+ name: text generation
1319
+ dataset:
1320
+ name: hellaswag
1321
+ type: hellaswag
1322
+ metrics:
1323
+ - name: acc
1324
+ type: acc
1325
+ value: 0.41236805417247563
1326
+ verified: false
1327
+ - task:
1328
+ type: text-generation
1329
+ name: text generation
1330
+ dataset:
1331
+ name: logiqa
1332
+ type: logiqa
1333
+ metrics:
1334
+ - name: acc
1335
+ type: acc
1336
+ value: 0.2073732718894009
1337
+ verified: false
1338
+ - task:
1339
+ type: text-generation
1340
+ name: text generation
1341
+ dataset:
1342
+ name: mathqa
1343
+ type: mathqa
1344
+ metrics:
1345
+ - name: acc
1346
+ type: acc
1347
+ value: 0.24958123953098826
1348
+ verified: false
1349
+ - task:
1350
+ type: text-generation
1351
+ name: text generation
1352
+ dataset:
1353
+ name: mc_taco
1354
+ type: mc_taco
1355
+ metrics:
1356
+ - name: em
1357
+ type: em
1358
+ value: 0.11936936936936937
1359
+ verified: false
1360
+ - task:
1361
+ type: text-generation
1362
+ name: text generation
1363
+ dataset:
1364
+ name: mnli
1365
+ type: mnli
1366
+ metrics:
1367
+ - name: acc
1368
+ type: acc
1369
+ value: 0.35496688741721855
1370
+ verified: false
1371
+ - task:
1372
+ type: text-generation
1373
+ name: text generation
1374
+ dataset:
1375
+ name: mnli_mismatched
1376
+ type: mnli_mismatched
1377
+ metrics:
1378
+ - name: acc
1379
+ type: acc
1380
+ value: 0.35211554109031734
1381
+ verified: false
1382
+ - task:
1383
+ type: text-generation
1384
+ name: text generation
1385
+ dataset:
1386
+ name: mrpc
1387
+ type: mrpc
1388
+ metrics:
1389
+ - name: acc
1390
+ type: acc
1391
+ value: 0.5857843137254902
1392
+ verified: false
1393
+ - task:
1394
+ type: text-generation
1395
+ name: text generation
1396
+ dataset:
1397
+ name: multirc
1398
+ type: multirc
1399
+ metrics:
1400
+ - name: acc
1401
+ type: acc
1402
+ value: 0.5375412541254125
1403
+ verified: false
1404
+ - task:
1405
+ type: text-generation
1406
+ name: text generation
1407
+ dataset:
1408
+ name: openbookqa
1409
+ type: openbookqa
1410
+ metrics:
1411
+ - name: acc
1412
+ type: acc
1413
+ value: 0.216
1414
+ verified: false
1415
+ - task:
1416
+ type: text-generation
1417
+ name: text generation
1418
+ dataset:
1419
+ name: piqa
1420
+ type: piqa
1421
+ metrics:
1422
+ - name: acc
1423
+ type: acc
1424
+ value: 0.7078346028291621
1425
+ verified: false
1426
+ - task:
1427
+ type: text-generation
1428
+ name: text generation
1429
+ dataset:
1430
+ name: prost
1431
+ type: prost
1432
+ metrics:
1433
+ - name: acc
1434
+ type: acc
1435
+ value: 0.22683603757472245
1436
+ verified: false
1437
+ - task:
1438
+ type: text-generation
1439
+ name: text generation
1440
+ dataset:
1441
+ name: pubmedqa
1442
+ type: pubmedqa
1443
+ metrics:
1444
+ - name: acc
1445
+ type: acc
1446
+ value: 0.616
1447
+ verified: false
1448
+ - task:
1449
+ type: text-generation
1450
+ name: text generation
1451
+ dataset:
1452
+ name: qnli
1453
+ type: qnli
1454
+ metrics:
1455
+ - name: acc
1456
+ type: acc
1457
+ value: 0.5072304594545122
1458
+ verified: false
1459
+ - task:
1460
+ type: text-generation
1461
+ name: text generation
1462
+ dataset:
1463
+ name: qqp
1464
+ type: qqp
1465
+ metrics:
1466
+ - name: acc
1467
+ type: acc
1468
+ value: 0.3842443729903537
1469
+ verified: false
1470
+ - task:
1471
+ type: text-generation
1472
+ name: text generation
1473
+ dataset:
1474
+ name: race
1475
+ type: race
1476
+ metrics:
1477
+ - name: acc
1478
+ type: acc
1479
+ value: 0.3521531100478469
1480
+ verified: false
1481
+ - task:
1482
+ type: text-generation
1483
+ name: text generation
1484
+ dataset:
1485
+ name: rte
1486
+ type: rte
1487
+ metrics:
1488
+ - name: acc
1489
+ type: acc
1490
+ value: 0.47653429602888087
1491
+ verified: false
1492
+ - task:
1493
+ type: text-generation
1494
+ name: text generation
1495
+ dataset:
1496
+ name: sciq
1497
+ type: sciq
1498
+ metrics:
1499
+ - name: acc
1500
+ type: acc
1501
+ value: 0.892
1502
+ verified: false
1503
+ - task:
1504
+ type: text-generation
1505
+ name: text generation
1506
+ dataset:
1507
+ name: sst
1508
+ type: sst
1509
+ metrics:
1510
+ - name: acc
1511
+ type: acc
1512
+ value: 0.5177752293577982
1513
+ verified: false
1514
+ - task:
1515
+ type: text-generation
1516
+ name: text generation
1517
+ dataset:
1518
+ name: triviaqa
1519
+ type: triviaqa
1520
+ metrics:
1521
+ - name: acc
1522
+ type: acc
1523
+ value: 0.041633518960487934
1524
+ verified: false
1525
+ - task:
1526
+ type: text-generation
1527
+ name: text generation
1528
+ dataset:
1529
+ name: tydiqa_primary
1530
+ type: tydiqa_primary
1531
+ metrics:
1532
+ - name: acc
1533
+ type: acc
1534
+ value: 0.3011337608795236
1535
+ verified: false
1536
+ - task:
1537
+ type: text-generation
1538
+ name: text generation
1539
+ dataset:
1540
+ name: webqs
1541
+ type: webqs
1542
+ metrics:
1543
+ - name: acc
1544
+ type: acc
1545
+ value: 0.01673228346456693
1546
+ verified: false
1547
+ - task:
1548
+ type: text-generation
1549
+ name: text generation
1550
+ dataset:
1551
+ name: wic
1552
+ type: wic
1553
+ metrics:
1554
+ - name: acc
1555
+ type: acc
1556
+ value: 0.5015673981191222
1557
+ verified: false
1558
+ - task:
1559
+ type: text-generation
1560
+ name: text generation
1561
+ dataset:
1562
+ name: winogrande
1563
+ type: winogrande
1564
+ metrics:
1565
+ - name: acc
1566
+ type: acc
1567
+ value: 0.5864246250986582
1568
+ verified: false
1569
+ - task:
1570
+ type: text-generation
1571
+ name: text generation
1572
+ dataset:
1573
+ name: wnli
1574
+ type: wnli
1575
+ metrics:
1576
+ - name: acc
1577
+ type: acc
1578
+ value: 0.471830985915493
1579
+ verified: false
1580
+ - task:
1581
+ type: text-generation
1582
+ name: text generation
1583
+ dataset:
1584
+ name: wsc
1585
+ type: wsc
1586
+ metrics:
1587
+ - name: acc
1588
+ type: acc
1589
+ value: 0.4423076923076923
1590
+ verified: false
1591
+ - task:
1592
+ type: text-generation
1593
+ name: text generation
1594
+ dataset:
1595
+ name: humaneval
1596
+ type: humaneval
1597
+ metrics:
1598
+ - name: pass@1
1599
+ type: pass@1
1600
+ value: 0.15524390243902436
1601
+ verified: false
1602
+ - name: pass@10
1603
+ type: pass@10
1604
+ value: 0.3220367632383857
1605
+ verified: false
1606
+ - name: pass@100
1607
+ type: pass@100
1608
+ value: 0.5545431515723145
1609
+ verified: false
1610
+ ---

<h1 style='text-align: center '>BLOOM LM</h1>
<h2 style='text-align: center '><em>BigScience Large Open-science Open-access Multilingual Language Model</em></h2>
<h3 style='text-align: center '>Model Card</h3>
<img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

Version 1.0 / 26.May.2022

## Table of Contents
1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Training Data](#training-data)
4. [Risks and Limitations](#risks-and-limitations)
5. [Evaluation](#evaluation)
6. [Recommendations](#recommendations)
7. [Glossary and Calculations](#glossary-and-calculations)
8. [More Information](#more-information)
9. [Model Card Authors](#model-card-authors)

## Model Details

### Basics
*This section provides information for anyone who wants to know about the model.*

<details>
<summary>Click to expand</summary> <br/>

**Developed by:** BigScience ([website](https://bigscience.huggingface.co))

* All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*

**Model Type:** Transformer-based Language Model

**Version:** 1.0.0

**Languages:** Multiple; see [training data](#training-data)

**License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))

**Release Date Estimate:** Monday, 11.July.2022

**Send Questions to:** bigscience-contact@googlegroups.com

**Cite as:** BigScience, _BigScience Large Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022

**Funded by:**

* The French government.

* Hugging Face ([website](https://huggingface.co)).

* Organizations of contributors. *(Further breakdown of organizations forthcoming.)*

</details>

### Technical Specifications
*This section provides information for people who work on model development.*

<details>
<summary>Click to expand</summary><br/>

Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.

**Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):

* Decoder-only architecture

* Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))

* ALiBi positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions

* 3,002,557,440 parameters:

  * 642,252,800 embedding parameters

  * 30 layers, 32 attention heads

  * Hidden layers are 2560-dimensional

* Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))

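As a quick consistency check (an editor's sketch, not part of the official card), the embedding parameter count above factors cleanly into the 2560-dimensional hidden size times a padded vocabulary of 250,880 rows, slightly larger than the 250,680-token tokenizer vocabulary, with the extra rows presumably added for hardware efficiency:

```python
hidden_size = 2560
stated_embedding_params = 642_252_800

# Rows of the embedding matrix = padded vocabulary size.
padded_vocab = stated_embedding_params // hidden_size
assert padded_vocab * hidden_size == stated_embedding_params
print(padded_vocab)  # 250880
```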
**Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).

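For readers unfamiliar with the objective, this stdlib-only sketch (a toy re-implementation, not the actual training code) shows what cross entropy with mean reduction computes over a batch of next-token predictions:

```python
import math

def cross_entropy_mean(logits_batch, targets):
    """Cross entropy with mean reduction over a batch, in the spirit of
    torch.nn.CrossEntropyLoss(reduction="mean")."""
    total = 0.0
    for logits, target in zip(logits_batch, targets):
        # negative log-softmax of the target class
        log_z = math.log(sum(math.exp(x) for x in logits))
        total += -(logits[target] - log_z)
    return total / len(targets)

# Two toy "next-token" predictions over a 3-token vocabulary:
loss = cross_entropy_mean([[2.0, 0.5, 0.1], [0.1, 3.0, 0.2]], [0, 1])
print(round(loss, 4))  # a small positive number; lower is better
```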
**Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).

* Hardware: 384 A100 80GB GPUs (48 nodes):

  * Additional 32 A100 80GB GPUs (4 nodes) in reserve

  * 8 GPUs per node, using NVLink 4 inter-GPU connects and 4 OmniPath links

  * CPU: AMD

  * CPU memory: 512GB per node

  * GPU memory: 640GB per node

  * Inter-node connect: Omni-Path Architecture (OPA)

  * NCCL-communications network: a fully dedicated subnet

  * Disc IO network: shared network with other types of nodes

* Software:

  * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))

  * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))

  * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))

  * apex ([Github link](https://github.com/NVIDIA/apex))


#### **Training**

Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11c-2B5-logs)

- Number of epochs: 1 (*current target*)

- Dates:

  - Started 11th March, 2022 11:42am PST

  - Ended 5th July, 2022

- Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)

- Server training location: Île-de-France, France

#### **Tokenization**

The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:

- A byte-level Byte Pair Encoding (BPE) algorithm

- A simple pre-tokenization rule, no normalization

- A vocabulary size of 250,680

It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.

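To illustrate the byte-level BPE idea, here is a toy sketch (illustrative only; the actual tokenizer is trained with the Hugging Face `tokenizers` library on a far larger corpus). Starting from raw bytes means no normalization step is needed, and merges are learned greedily from adjacent-pair frequencies:

```python
from collections import Counter

def train_bpe(corpus: bytes, num_merges: int):
    """Toy byte-level BPE: repeatedly merge the most frequent adjacent pair."""
    seq = list(corpus)   # raw byte values 0-255; no normalization applied
    merges = []
    next_id = 256        # new merged symbols live above the 256 byte values
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(((a, b), next_id))
        out, i = [], 0
        while i < len(seq):          # rewrite the sequence with the new symbol
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
        next_id += 1
    return merges, seq

merges, encoded = train_bpe(b"low lower lowest", num_merges=3)
print(len(encoded) < len(b"low lower lowest"))  # True: merges shorten the sequence
```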
</details>


### Environmental Impact

<details>
<summary>Click to expand</summary><br/>

The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat it generates is reused for heating campus housing.

**Estimated carbon emissions:** *(Forthcoming upon completion of training.)*

**Estimated electricity usage:** *(Forthcoming upon completion of training.)*


</details>
<p>&nbsp;</p>

## Uses

*This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
It provides information for anyone considering using the model or who is affected by the model.*


<details>
<summary>Click to expand</summary><br/>

### Intended Use

This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.

#### **Direct Use**

- Text generation

- Exploring characteristics of language generated by a language model

  - Examples: Cloze tests, counterfactuals, generations with reframings

#### **Downstream Use**

- Tasks that leverage language models include: Information Extraction, Question Answering, Summarization

### Misuse and Out-of-scope Use
*This section addresses what users ought not do with the model.*

See the [BLOOM License](https://huggingface.co/spaces/bigscience/license), Attachment A, for detailed usage restrictions. The list below is non-exhaustive, but covers some easily foreseeable problematic use cases.

#### **Out-of-scope Uses**

Using the model in [high-stakes](#high-stakes) settings is out of scope for this model. The model is not designed for [critical decisions](#critical-decisions) nor uses with any material consequences on an individual's livelihood or wellbeing. The model can output content that appears factual but is not correct.

##### Out-of-scope Uses Include:

- Usage in biomedical domains, political and legal domains, or finance domains

- Usage for evaluating or scoring individuals, such as for employment, education, or credit

- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct

#### **Misuse**

Intentionally using the model for harm, violating [human rights](#human-rights), or engaging in other kinds of malicious activities is a misuse of this model. This includes:

- Spam generation

- Disinformation and influence operations

- Disparagement and defamation

- Harassment and abuse

- [Deception](#deception)

- Unconsented impersonation and imitation

- Unconsented surveillance

- Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)

### Intended Users

#### **Direct Users**

- General Public

- Researchers

- Students

- Educators

- Engineers/developers

- Non-commercial entities

- Community advocates, including human and civil rights groups

#### Indirect Users

- Users of derivatives created by Direct Users, such as those using software with an [intended use](#intended-use)

- Users of [Derivatives of the Model, as described in the License](https://huggingface.co/spaces/bigscience/license)

#### Others Affected (Parties Prenantes)

- People and groups referred to by the LLM

- People and groups exposed to outputs of, or decisions based on, the LLM

- People and groups whose original work is included in the LLM

</details>
<p>&nbsp;</p>

## Training Data
*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*


<details>
<summary>Click to expand</summary><br/>

Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).

Training data includes:

- 45 natural languages

- 12 programming languages

- 1.5TB of pre-processed text, converted into 350B unique tokens (see [the tokenizer section](#tokenization) for more.)


#### **Languages**

The pie chart shows the distribution of languages in the training data.

![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)


The following table shows the further distribution of Niger-Congo and Indic languages in the training data.
<details>
<summary>Click to expand</summary><br/>

| Niger Congo    | Percentage | | Indic     | Percentage |
|----------------|------------|-|-----------|------------|
| Chi Tumbuka    | 0.00002    | | Assamese  | 0.01       |
| Kikuyu         | 0.00004    | | Odia      | 0.04       |
| Bambara        | 0.00004    | | Gujarati  | 0.04       |
| Akan           | 0.00007    | | Marathi   | 0.05       |
| Xitsonga       | 0.00007    | | Punjabi   | 0.05       |
| Sesotho        | 0.00007    | | Kannada   | 0.06       |
| Chi Chewa      | 0.0001     | | Nepali    | 0.07       |
| Setswana       | 0.0002     | | Telugu    | 0.09       |
| Northern Sotho | 0.0002     | | Malayalam | 0.10       |
| Fon            | 0.0002     | | Urdu      | 0.10       |
| Kirundi        | 0.0003     | | Tamil     | 0.20       |
| Wolof          | 0.0004     | | Bengali   | 0.50       |
| Kuganda        | 0.0004     | | Hindi     | 0.70       |
| Chi Shona      | 0.001      |
| Isi Zulu       | 0.001      |
| Igbo           | 0.001      |
| Xhosa          | 0.001      |
| Kinyarwanda    | 0.003      |
| Yoruba         | 0.006      |
| Swahili        | 0.02       |
</details>

The following table shows the distribution of programming languages.
<details>
<summary>Click to expand</summary><br/>

| Extension | Language   | Number of files |
|-----------|------------|-----------------|
| java      | Java       | 5,407,724       |
| php       | PHP        | 4,942,186       |
| cpp       | C++        | 2,503,930       |
| py        | Python     | 2,435,072       |
| js        | JavaScript | 1,905,518       |
| cs        | C#         | 1,577,347       |
| rb        | Ruby       | 678,413         |
| cc        | C++        | 443,054         |
| hpp       | C++        | 391,048         |
| lua       | Lua        | 352,317         |
| go        | Go         | 227,763         |
| ts        | TypeScript | 195,254         |
| C         | C          | 134,537         |
| scala     | Scala      | 92,052          |
| hh        | C++        | 67,161          |
| H         | C++        | 55,899          |
| tsx       | TypeScript | 33,107          |
| rs        | Rust       | 29,693          |
| phpt      | PHP        | 9,702           |
| c++       | C++        | 1,342           |
| h++       | C++        | 791             |
| php3      | PHP        | 540             |
| phps      | PHP        | 270             |
| php5      | PHP        | 166             |
| php4      | PHP        | 29              |

</details>
</details>
<p>&nbsp;</p>

## Risks and Limitations
*This section identifies foreseeable harms and misunderstandings.*

<details>
<summary>Click to expand</summary><br/>

The model may:

- Overrepresent some viewpoints and underrepresent others

- Contain stereotypes

- Contain [personal information](#personal-data-and-information)

- Generate:

  - Hateful, abusive, or violent language

  - Discriminatory or prejudicial language

  - Content that may not be appropriate for all settings, including sexual content

- Make errors, including producing incorrect information as if it were factual

- Generate irrelevant or repetitive outputs
</details>
<p>&nbsp;</p>

## Evaluation
*This section describes the evaluation protocols and provides the results.*

<details>
<summary>Click to expand</summary><br/>

### Metrics
*This section describes the different ways performance is calculated and why.*

Includes:

| Metric | Why chosen |
|--------------------|--------------------------------------------------------------------|
| [Perplexity](#perplexity) | Standard metric for quantifying model improvements during training |
| Cross Entropy [Loss](#loss) | Standard objective for language models |

And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of evaluation protocol.)_
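For reference, perplexity is the exponential of the average negative log-likelihood; the byte-level variant reported in the results below normalizes by bytes rather than tokens, which makes scores comparable across tokenizers and languages. A minimal sketch, assuming the natural-exponential convention:

```python
import math

def perplexity(total_nll: float, n_units: float) -> float:
    """exp(average negative log-likelihood per unit); the unit may be a
    token (standard perplexity) or a byte (byte_perplexity)."""
    return math.exp(total_nll / n_units)

# Toy, hypothetical numbers: 100 nats of total loss over 80 tokens / 210 bytes.
print(round(perplexity(100.0, 80), 3))   # token-level perplexity
print(round(perplexity(100.0, 210), 3))  # byte-level perplexity (smaller)
```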
2002
+
2003
+ ### Factors
2004
+ *This section lists some different aspects of BLOOM models. Its focus is on aspects that are likely to give rise to high variance in model behavior.*
2005
+
2006
+ - Language, such as English or Yoruba
2007
+
2008
+ - Domain, such as newswire or stories
2009
+
2010
+ - Demographic characteristics, such as gender or nationality
2011
+
2012
+ ### Results
2013
+ *Results are based on the [Factors](#factors) and [Metrics](#metrics).*
2014
+
2015
+ **Zero-shot evaluations:**
2016
+
2017
+ See this repository for JSON files: https://github.com/bigscience-workshop/evaluation-results
2018
+
2019
+ | Task | Language | Metric | BLOOM-2B5 |
2020
+ |:----|:----|:----|:----:|
2021
+ | arc_challenge | eng | acc ↑ | 0.28 |
2022
+ | arc_easy | eng | acc ↑ | 0.595 |
2023
+ | axb (Median of 10 prompts) | eng | acc ↑ | 0.443 |
2024
+ | axg (Median of 10 prompts) | eng | acc ↑ | 0.5 |
2025
+ | boolq (Median of 11 prompts) | eng | acc ↑ | 0.617 |
2026
+ | cb (Median of 15 prompts) | eng | acc ↑ | 0.304 |
2027
+ | cola (Median of 5 prompts) | eng | acc ↑ | 0.611 |
2028
+ | copa (Median of 9 prompts) | eng | acc ↑ | 0.63 |
2029
+ | crows_pairs_english (Median of 6 prompts) | eng | acc ↑ | 0.497 |
2030
+ | crows_pairs_french (Median of 7 prompts) | fra | acc ↑ | 0.503 |
2031
+ | diabla (Median of 2 prompts) | eng | acc ↑ | 0.289 |
2032
+ | gsarti/flores_101_afr | afr | byte_perplexity ↓ | 6.501 |
2033
+ | gsarti/flores_101_amh | amh | byte_perplexity ↓ | 3.973 |
2034
+ | gsarti/flores_101_ara | ara | byte_perplexity ↓ | 1.808 |
2035
+ | gsarti/flores_101_asm | asm | byte_perplexity ↓ | 5.699 |
2036
+ | gsarti/flores_101_ast | ast | byte_perplexity ↓ | 3.925 |
2037
+ | gsarti/flores_101_azj | azj | byte_perplexity ↓ | 6.943 |
2038
+ | gsarti/flores_101_bel | bel | byte_perplexity ↓ | 3.614 |
2039
+ | gsarti/flores_101_ben | ben | byte_perplexity ↓ | 5.121 |
2040
+ | gsarti/flores_101_bos | bos | byte_perplexity ↓ | 5.653 |
2041
+ | gsarti/flores_101_bul | bul | byte_perplexity ↓ | 2.701 |
2042
+ | gsarti/flores_101_cat | cat | byte_perplexity ↓ | 2.305 |
2043
+ | gsarti/flores_101_ceb | ceb | byte_perplexity ↓ | 6.291 |
2044
+ | gsarti/flores_101_ces | ces | byte_perplexity ↓ | 5.447 |
2045
+ | gsarti/flores_101_ckb | ckb | byte_perplexity ↓ | 3.726 |
2046
+ | gsarti/flores_101_cym | cym | byte_perplexity ↓ | 12.539 |
2047
+ | gsarti/flores_101_dan | dan | byte_perplexity ↓ | 5.183 |
2048
+ | gsarti/flores_101_deu | deu | byte_perplexity ↓ | 3.118 |
2049
+ | gsarti/flores_101_ell | ell | byte_perplexity ↓ | 2.468 |
2050
+ | gsarti/flores_101_eng | eng | byte_perplexity ↓ | 2.019 |
2051
+ | gsarti/flores_101_est | est | byte_perplexity ↓ | 9.117 |
2052
+ | gsarti/flores_101_fas | fas | byte_perplexity ↓ | 3.058 |
2053
+ | gsarti/flores_101_fin | fin | byte_perplexity ↓ | 6.847 |
2054
+ | gsarti/flores_101_fra | fra | byte_perplexity ↓ | 1.998 |
2055
+ | gsarti/flores_101_ful | ful | byte_perplexity ↓ | 11.466 |
2056
+ | gsarti/flores_101_gle | gle | byte_perplexity ↓ | 8.681 |
2057
+ | gsarti/flores_101_glg | glg | byte_perplexity ↓ | 3.03 |
2058
+ | gsarti/flores_101_guj | guj | byte_perplexity ↓ | 4.955 |
2059
+ | gsarti/flores_101_hau | hau | byte_perplexity ↓ | 10.758 |
2060
+ | gsarti/flores_101_heb | heb | byte_perplexity ↓ | 3.6 |
2061
+ | gsarti/flores_101_hin | hin | byte_perplexity ↓ | 4.713 |
2062
+ | gsarti/flores_101_hrv | hrv | byte_perplexity ↓ | 5.822 |
2063
+ | gsarti/flores_101_hun | hun | byte_perplexity ↓ | 6.44 |
2064
+ | gsarti/flores_101_hye | hye | byte_perplexity ↓ | 3.658 |
2065
+ | gsarti/flores_101_ibo | ibo | byte_perplexity ↓ | 5.565 |
2066
+ | gsarti/flores_101_ind | ind | byte_perplexity ↓ | 2.16 |
2067
+ | gsarti/flores_101_isl | isl | byte_perplexity ↓ | 8.082 |
2068
+ | gsarti/flores_101_ita | ita | byte_perplexity ↓ | 2.969 |
2069
+ | gsarti/flores_101_jav | jav | byte_perplexity ↓ | 7.057 |
2070
+ | gsarti/flores_101_jpn | jpn | byte_perplexity ↓ | 2.776 |
2071
+ | gsarti/flores_101_kam | kam | byte_perplexity ↓ | 11.073 |
2072
+ | gsarti/flores_101_kan | kan | byte_perplexity ↓ | 5.552 |
2073
+ | gsarti/flores_101_kat | kat | byte_perplexity ↓ | 2.523 |
2074
+ | gsarti/flores_101_kaz | kaz | byte_perplexity ↓ | 3.39 |
2075
+ | gsarti/flores_101_kea | kea | byte_perplexity ↓ | 8.919 |
2076
+ | gsarti/flores_101_kir | kir | byte_perplexity ↓ | 3.729 |
2077
+ | gsarti/flores_101_kor | kor | byte_perplexity ↓ | 3.933 |
2078
+ | gsarti/flores_101_lao | lao | byte_perplexity ↓ | 2.908 |
2079
+ | gsarti/flores_101_lav | lav | byte_perplexity ↓ | 7.777 |
2080
+ | gsarti/flores_101_lin | lin | byte_perplexity ↓ | 7.525 |
2081
+ | gsarti/flores_101_lit | lit | byte_perplexity ↓ | 7.369 |
2082
+ | gsarti/flores_101_ltz | ltz | byte_perplexity ↓ | 8.801 |
2083
+ | gsarti/flores_101_lug | lug | byte_perplexity ↓ | 8.483 |
2084
+ | gsarti/flores_101_luo | luo | byte_perplexity ↓ | 11.976 |
2085
+ | gsarti/flores_101_mal | mal | byte_perplexity ↓ | 4.616 |
2086
+ | gsarti/flores_101_mar | mar | byte_perplexity ↓ | 5.483 |
2087
+ | gsarti/flores_101_mkd | mkd | byte_perplexity ↓ | 2.966 |
2088
+ | gsarti/flores_101_mlt | mlt | byte_perplexity ↓ | 15.005 |
2089
+ | gsarti/flores_101_mon | mon | byte_perplexity ↓ | 3.411 |
2090
+ | gsarti/flores_101_mri | mri | byte_perplexity ↓ | 7.474 |
2091
+ | gsarti/flores_101_msa | msa | byte_perplexity ↓ | 2.571 |
2092
+ | gsarti/flores_101_mya | mya | byte_perplexity ↓ | 2.414 |
2093
+ | gsarti/flores_101_nld | nld | byte_perplexity ↓ | 4.128 |
2094
+ | gsarti/flores_101_nob | nob | byte_perplexity ↓ | 5.403 |
2095
+ | gsarti/flores_101_npi | npi | byte_perplexity ↓ | 5.199 |
2096
+ | gsarti/flores_101_nso | nso | byte_perplexity ↓ | 8.155 |
2097
+ | gsarti/flores_101_nya | nya | byte_perplexity ↓ | 8.18 |
2098
+ | gsarti/flores_101_oci | oci | byte_perplexity ↓ | 4.862 |
2099
+ | gsarti/flores_101_orm | orm | byte_perplexity ↓ | 12.912 |
2100
+ | gsarti/flores_101_ory | ory | byte_perplexity ↓ | 5.189 |
2101
+ | gsarti/flores_101_pan | pan | byte_perplexity ↓ | 4.698 |
2102
+ | gsarti/flores_101_pol | pol | byte_perplexity ↓ | 4.626 |
2103
+ | gsarti/flores_101_por | por | byte_perplexity ↓ | 1.975 |
2104
+ | gsarti/flores_101_pus | pus | byte_perplexity ↓ | 4.496 |
2105
+ | gsarti/flores_101_ron | ron | byte_perplexity ↓ | 4.965 |
2106
+ | gsarti/flores_101_rus | rus | byte_perplexity ↓ | 2.05 |
2107
+ | gsarti/flores_101_slk | slk | byte_perplexity ↓ | 6.451 |
2108
+ | gsarti/flores_101_slv | slv | byte_perplexity ↓ | 6.62 |
2109
+ | gsarti/flores_101_sna | sna | byte_perplexity ↓ | 8.462 |
2110
+ | gsarti/flores_101_snd | snd | byte_perplexity ↓ | 5.466 |
2111
+ | gsarti/flores_101_som | som | byte_perplexity ↓ | 11.959 |
2112
+ | gsarti/flores_101_spa | spa | byte_perplexity ↓ | 1.897 |
2113
+ | gsarti/flores_101_srp | srp | byte_perplexity ↓ | 2.871 |
2114
+ | gsarti/flores_101_swe | swe | byte_perplexity ↓ | 5.055 |
2115
+ | gsarti/flores_101_swh | swh | byte_perplexity ↓ | 3.697 |
2116
+ | gsarti/flores_101_tam | tam | byte_perplexity ↓ | 4.539 |
2117
+ | gsarti/flores_101_tel | tel | byte_perplexity ↓ | 5.807 |
2118
+ | gsarti/flores_101_tgk | tgk | byte_perplexity ↓ | 3.599 |
2119
+ | gsarti/flores_101_tgl | tgl | byte_perplexity ↓ | 5.667 |
2120
+ | gsarti/flores_101_tha | tha | byte_perplexity ↓ | 2.366 |
2121
+ | gsarti/flores_101_tur | tur | byte_perplexity ↓ | 4.885 |
2122
+ | gsarti/flores_101_ukr | ukr | byte_perplexity ↓ | 2.724 |
2123
+ | gsarti/flores_101_umb | umb | byte_perplexity ↓ | 12.767 |
2124
+ | gsarti/flores_101_urd | urd | byte_perplexity ↓ | 1.98 |
2125
+ | gsarti/flores_101_uzb | uzb | byte_perplexity ↓ | 12.002 |
2126
+ | gsarti/flores_101_vie | vie | byte_perplexity ↓ | 1.766 |
2127
+ | gsarti/flores_101_wol | wol | byte_perplexity ↓ | 9.144 |
2128
+ | gsarti/flores_101_xho | xho | byte_perplexity ↓ | 7.403 |
2129
+ | gsarti/flores_101_yor | yor | byte_perplexity ↓ | 5.913 |
2130
+ | gsarti/flores_101_zho_simpl | zho_simpl | byte_perplexity ↓ | 2.277 |
2131
+ | gsarti/flores_101_zho_trad | zho_trad | byte_perplexity ↓ | 2.518 |
2132
+ | gsarti/flores_101_zul | zul | byte_perplexity ↓ | 8.534 |
2133
+ | headqa | esp | acc ↑ | 0.264 |
2134
+ | hellaswag | eng | acc ↑ | 0.412 |
2135
+ | logiqa | eng | acc ↑ | 0.207 |
2136
+ | mathqa | eng | acc ↑ | 0.25 |
2137
+ | mc_taco | eng | em ↑ | 0.119 |
2138
+ | mnli (Median of 15 prompts) | eng | acc ↑ | 0.355 |
2139
+ | mnli_mismatched (Median of 15 prompts) | eng | acc ↑ | 0.352 |
2140
+ | mrpc | eng | acc ↑ | 0.586 |
2141
+ | multirc (Median of 11 prompts) | eng | acc ↑ | 0.538 |
2142
+ | openbookqa | eng | acc ↑ | 0.216 |
2143
+ | piqa | eng | acc ↑ | 0.708 |
2144
+ | prost | eng | acc ↑ | 0.227 |
2145
+ | pubmedqa | eng | acc ↑ | 0.616 |
2146
+ | qnli | eng | acc ↑ | 0.507 |
2147
+ | qqp (Median of 7 prompts) | eng | acc ↑ | 0.384 |
2148
+ | race | eng | acc ↑ | 0.352 |
2149
+ | rte (Median of 6 prompts) | eng | acc ↑ | 0.477 |
2150
+ | sciq | eng | acc ↑ | 0.892 |
2151
+ | sst (Median of 6 prompts) | eng | acc ↑ | 0.518 |
2152
+ | triviaqa | eng | acc ↑ | 0.042 |
2153
+ | tydiqa_primary (Median of 24 prompts) | eng | acc ↑ | 0.301 |
2154
+ | webqs | eng | acc ↑ | 0.017 |
2155
+ | wic (Median of 11 prompts) | eng | acc ↑ | 0.502 |
2156
+ | winogrande | eng | acc ↑ | 0.586 |
2157
+ | wnli (Median of 6 prompts) | eng | acc ↑ | 0.472 |
2158
+ | wsc (Median of 11 prompts) | eng | acc ↑ | 0.442 |
2159
+ | humaneval | python | pass@1 ↑ | 0.155 |
2160
+ | humaneval | python | pass@10 ↑ | 0.322 |
2161
+ | humaneval | python | pass@100 ↑ | 0.555 |
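For reference, the two non-accuracy metric types in the table above can be sketched as follows. This is an illustrative implementation rather than the exact evaluation code: byte perplexity is assumed to use natural-log likelihoods (some harnesses use base 2), pass@k uses the standard unbiased estimator, and all numeric inputs below are hypothetical.

```python
import math

def byte_perplexity(total_neg_log_likelihood: float, total_bytes: int) -> float:
    """Byte-level perplexity: exponentiate the average negative
    log-likelihood (assumed to be in nats) per UTF-8 byte of the text."""
    return math.exp(total_neg_log_likelihood / total_bytes)

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    completions, drawn without replacement from n generated completions of
    which c pass the unit tests, is correct."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing completion
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Hypothetical inputs: 1200 nats of total NLL over 500 bytes of text,
# and 5 passing completions out of 20 generated for one HumanEval problem.
print(round(byte_perplexity(1200.0, 500), 3))  # exp(2.4) ≈ 11.023
print(round(pass_at_k(n=20, c=5, k=1), 4))     # 0.25
```

Note that for k = 1 the estimator reduces to the fraction of passing completions, c / n.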

**Train-time Evaluation:**

As of 25.May.2022, 15:00 PST:

- Training Loss: 2.0
- Validation Loss: 2.2
- Perplexity: 8.9
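Assuming the usual natural-log convention for cross-entropy loss, the reported perplexity follows directly from the loss, so the two figures above can be cross-checked:

```python
import math

# Perplexity is the exponential of the per-token cross-entropy loss.
def perplexity_from_loss(loss: float) -> float:
    return math.exp(loss)

# A validation loss of 2.2 implies a perplexity of about 9.0; the reported
# 8.9 corresponds to a loss just under 2.2 (exp(2.186) ≈ 8.9).
print(round(perplexity_from_loss(2.2), 2))  # 9.03
```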

</details>
<p>&nbsp;</p>

## Recommendations

*This section provides information on warnings and potential mitigations.*

<details>
<summary>Click to expand</summary><br/>

- Indirect users should be made aware when the content they're working with is created by the LLM.
- Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
- Models pretrained with the LLM should include an updated Model Card.
- Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

</details>
<p>&nbsp;</p>

## Glossary and Calculations

*This section defines common terms and how metrics are calculated.*

<details>
<summary>Click to expand</summary><br/>

- <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.

- <a name="perplexity">**Perplexity:**</a> A measure of how well the model estimates the probability of new data. The lower the perplexity, the better. If the model is 100% correct at predicting the next token it will see, then the perplexity is 1. Mathematically, perplexity is the exponential of the cross-entropy between the model's predictions and the data.

- <a name="high-stakes">**High-stakes settings:**</a> Such as those identified as "high-risk AI systems" and "unacceptable risk AI systems" in the European Union's proposed [Artificial Intelligence (AI) Act](https://artificialintelligenceact.eu/annexes/).

- <a name="critical-decisions">**Critical decisions:**</a> Such as those defined in [the United States' proposed Algorithmic Accountability Act](https://www.congress.gov/117/bills/s3572/BILLS-117s3572is.pdf).

- <a name="human-rights">**Human rights:**</a> Includes those rights defined in the [Universal Declaration of Human Rights](https://www.un.org/sites/un2.un.org/files/2021/03/udhr.pdf).

- <a name="personal-data-and-information">**Personal Data and Personal Information:**</a> Personal data and information are defined in multiple data protection regulations, such as "[personal data](https://gdpr-info.eu/issues/personal-data/)" in the [European Union's General Data Protection Regulation](https://gdpr-info.eu), and "personal information" in the Republic of South Africa's [Protection of Personal Information Act](https://www.gov.za/sites/default/files/gcis_document/201409/3706726-11act4of2013popi.pdf) and the People's Republic of China's [Personal Information Protection Law](http://en.npc.gov.cn.cdurl.cn/2021-12/29/c_694559.htm).

- <a name="sensitive-characteristics">**Sensitive characteristics:**</a> This includes specifically protected categories in human rights law (see [UDHR, Article 2](https://www.un.org/sites/un2.un.org/files/2021/03/udhr.pdf)) and personal information regulation (see GDPR, Article 9; [Protection of Personal Information Act, Chapter 1](https://www.gov.za/sites/default/files/gcis_document/201409/3706726-11act4of2013popi.pdf)).

- <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals into believing something that is false, such as creating deadbots or chatbots that pose as real people on social media, or generating text documents without making consumers aware that the text is machine generated.

</details>
<p>&nbsp;</p>

## More Information

<details>
<summary>Click to expand</summary><br/>

### Dataset Creation

Blog post detailing the design choices during the dataset creation: https://bigscience.huggingface.co/blog/building-a-tb-scale-multilingual-dataset-for-language-modeling

### Technical Specifications

Blog post summarizing how the architecture, size, shape, and pre-training duration were selected: https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours

More details on the architecture/optimizer: https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml

Blog post on the hardware/engineering side: https://bigscience.huggingface.co/blog/which-hardware-to-train-a-176b-parameters-model

Details on the distributed setup used for the training: https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml

TensorBoard updated during the training: https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard#scalars&tagFilter=loss

Insights on how to approach training, including negative results: https://github.com/bigscience-workshop/bigscience/blob/master/train/lessons-learned.md

Details on the obstacles overcome during the preparation on the engineering side (instabilities, optimization of training throughput, and many technical tricks and questions): https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/chronicles.md

### Initial Results

Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book

</details>
<p>&nbsp;</p>

## Model Card Authors

*Ordered roughly chronologically and by amount of time spent.*

Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff