Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


bloom-3b - GGUF
- Model creator: https://huggingface.co/bigscience/
- Original model: https://huggingface.co/bigscience/bloom-3b/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [bloom-3b.Q2_K.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q2_K.gguf) | Q2_K | 1.52GB |
| [bloom-3b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.IQ3_XS.gguf) | IQ3_XS | 1.68GB |
| [bloom-3b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.IQ3_S.gguf) | IQ3_S | 1.71GB |
| [bloom-3b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q3_K_S.gguf) | Q3_K_S | 1.71GB |
| [bloom-3b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.IQ3_M.gguf) | IQ3_M | 1.81GB |
| [bloom-3b.Q3_K.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q3_K.gguf) | Q3_K | 1.9GB |
| [bloom-3b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q3_K_M.gguf) | Q3_K_M | 1.9GB |
| [bloom-3b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q3_K_L.gguf) | Q3_K_L | 2.02GB |
| [bloom-3b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.IQ4_XS.gguf) | IQ4_XS | 2.0GB |
| [bloom-3b.Q4_0.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q4_0.gguf) | Q4_0 | 2.08GB |
| [bloom-3b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.IQ4_NL.gguf) | IQ4_NL | 2.09GB |
| [bloom-3b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q4_K_S.gguf) | Q4_K_S | 2.09GB |
| [bloom-3b.Q4_K.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q4_K.gguf) | Q4_K | 2.24GB |
| [bloom-3b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q4_K_M.gguf) | Q4_K_M | 2.24GB |
| [bloom-3b.Q4_1.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q4_1.gguf) | Q4_1 | 2.25GB |
| [bloom-3b.Q5_0.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q5_0.gguf) | Q5_0 | 2.43GB |
| [bloom-3b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q5_K_S.gguf) | Q5_K_S | 2.43GB |
| [bloom-3b.Q5_K.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q5_K.gguf) | Q5_K | 2.55GB |
| [bloom-3b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q5_K_M.gguf) | Q5_K_M | 1.64GB |
| [bloom-3b.Q5_1.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q5_1.gguf) | Q5_1 | 1.58GB |
| [bloom-3b.Q6_K.gguf](https://huggingface.co/RichardErkhov/bigscience_-_bloom-3b-gguf/blob/main/bloom-3b.Q6_K.gguf) | Q6_K | 1.31GB |
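
For a quick local test of any file in the table, a minimal sketch along the following lines should work. It assumes `llama-cpp-python` and `huggingface_hub` are installed; the repo and file names are taken from the links above.

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the quantized files listed above (Q4_K_M as an example).
model_path = hf_hub_download(
    repo_id="RichardErkhov/bigscience_-_bloom-3b-gguf",
    filename="bloom-3b.Q4_K_M.gguf",
)

# BLOOM was trained with 2048-token sequences, so that is a safe context size.
llm = Llama(model_path=model_path, n_ctx=2048)
out = llm("The capital of France is", max_tokens=16)
print(out["choices"][0]["text"])
```

Lower-bit files (Q2_K, IQ3_XS) trade quality for memory; higher-bit files (Q5/Q6) stay closer to the original weights.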


Original model description:
---
license: bigscience-bloom-rail-1.0
language:
- ak
- ar
- as
- bm
- bn
- ca
- code
- en
- es
- eu
- fon
- fr
- gu
- hi
- id
- ig
- ki
- kn
- lg
- ln
- ml
- mr
- ne
- nso
- ny
- or
- pa
- pt
- rn
- rw
- sn
- st
- sw
- ta
- te
- tn
- ts
- tum
- tw
- ur
- vi
- wo
- xh
- yo
- zh
- zhs
- zht
- zu
pipeline_tag: text-generation
model-index:
- name: bloom
  results:
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: arc_challenge
      type: arc_challenge
    metrics:
    - name: acc
      type: acc
      value: 0.27986348122866894
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: arc_easy
      type: arc_easy
    metrics:
    - name: acc
      type: acc
      value: 0.5946969696969697
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: axb
      type: axb
    metrics:
    - name: acc
      type: acc
      value: 0.4433876811594203
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: axg
      type: axg
    metrics:
    - name: acc
      type: acc
      value: 0.5
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: boolq
      type: boolq
    metrics:
    - name: acc
      type: acc
      value: 0.6165137614678899
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: cb
      type: cb
    metrics:
    - name: acc
      type: acc
      value: 0.30357142857142855
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: cola
      type: cola
    metrics:
    - name: acc
      type: acc
      value: 0.610738255033557
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: copa
      type: copa
    metrics:
    - name: acc
      type: acc
      value: 0.63
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: crows_pairs_english
      type: crows_pairs_english
    metrics:
    - name: acc
      type: acc
      value: 0.4973166368515206
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: crows_pairs_french
      type: crows_pairs_french
    metrics:
    - name: acc
      type: acc
      value: 0.5032796660703638
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: diabla
      type: diabla
    metrics:
    - name: acc
      type: acc
      value: 0.28888308977035493
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_afr
      type: gsarti/flores_101_afr
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.500798737976343
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_amh
      type: gsarti/flores_101_amh
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.9726863338897145
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ara
      type: gsarti/flores_101_ara
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.8083841089875814
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_asm
      type: gsarti/flores_101_asm
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.699102962086425
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ast
      type: gsarti/flores_101_ast
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.9252047073429384
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_azj
      type: gsarti/flores_101_azj
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.942805054270002
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_bel
      type: gsarti/flores_101_bel
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.614136245847082
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ben
      type: gsarti/flores_101_ben
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.121491534300969
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_bos
      type: gsarti/flores_101_bos
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.653353469118798
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_bul
      type: gsarti/flores_101_bul
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.7014693938055068
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_cat
      type: gsarti/flores_101_cat
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.305190041967345
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ceb
      type: gsarti/flores_101_ceb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.291000321323428
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ces
      type: gsarti/flores_101_ces
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.447322753586386
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ckb
      type: gsarti/flores_101_ckb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.7255124939234765
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_cym
      type: gsarti/flores_101_cym
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 12.539424151448149
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_dan
      type: gsarti/flores_101_dan
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.183309001005672
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_deu
      type: gsarti/flores_101_deu
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.1180422286591347
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ell
      type: gsarti/flores_101_ell
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.467943456164706
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_eng
      type: gsarti/flores_101_eng
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.018740628193298
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_est
      type: gsarti/flores_101_est
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 9.11654425176368
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_fas
      type: gsarti/flores_101_fas
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.058009097116482
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_fin
      type: gsarti/flores_101_fin
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.847047959628553
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_fra
      type: gsarti/flores_101_fra
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.9975177011840075
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ful
      type: gsarti/flores_101_ful
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.465912731488828
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_gle
      type: gsarti/flores_101_gle
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.681491663539422
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_glg
      type: gsarti/flores_101_glg
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.029991089015508
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_guj
      type: gsarti/flores_101_guj
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.955224230286231
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hau
      type: gsarti/flores_101_hau
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 10.758347356372159
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_heb
      type: gsarti/flores_101_heb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.6004478129801667
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hin
      type: gsarti/flores_101_hin
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.712530650588064
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hrv
      type: gsarti/flores_101_hrv
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.822418943372185
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hun
      type: gsarti/flores_101_hun
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.440482646965992
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_hye
      type: gsarti/flores_101_hye
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.657718918347166
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ibo
      type: gsarti/flores_101_ibo
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.564814003872672
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ind
      type: gsarti/flores_101_ind
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.1597101468869373
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_isl
      type: gsarti/flores_101_isl
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.082349269518136
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ita
      type: gsarti/flores_101_ita
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.9687591414176207
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_jav
      type: gsarti/flores_101_jav
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.0573805415708994
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_jpn
      type: gsarti/flores_101_jpn
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.7758864197116933
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kam
      type: gsarti/flores_101_kam
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.072949642861332
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kan
      type: gsarti/flores_101_kan
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.551730651007082
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kat
      type: gsarti/flores_101_kat
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.522630524283745
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kaz
      type: gsarti/flores_101_kaz
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.3901748516975574
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kea
      type: gsarti/flores_101_kea
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.918534182590863
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kir
      type: gsarti/flores_101_kir
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.729278369847201
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_kor
      type: gsarti/flores_101_kor
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.932884847226212
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lao
      type: gsarti/flores_101_lao
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.9077314760849924
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lav
      type: gsarti/flores_101_lav
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.777221919194806
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lin
      type: gsarti/flores_101_lin
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.524842908050988
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lit
      type: gsarti/flores_101_lit
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.369179434621725
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ltz
      type: gsarti/flores_101_ltz
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.801059747949214
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_lug
      type: gsarti/flores_101_lug
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.483203026364786
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_luo
      type: gsarti/flores_101_luo
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.975963093623681
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mal
      type: gsarti/flores_101_mal
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.615948455160037
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mar
      type: gsarti/flores_101_mar
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.483253482821379
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mkd
      type: gsarti/flores_101_mkd
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.9656732291754087
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mlt
      type: gsarti/flores_101_mlt
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 15.004773437665275
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mon
      type: gsarti/flores_101_mon
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.410598542315402
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mri
      type: gsarti/flores_101_mri
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.474035895661322
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_msa
      type: gsarti/flores_101_msa
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.5710001772665634
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_mya
      type: gsarti/flores_101_mya
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.413577969878331
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nld
      type: gsarti/flores_101_nld
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.127831721885065
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nob
      type: gsarti/flores_101_nob
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.402763169129877
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_npi
      type: gsarti/flores_101_npi
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.199342701937889
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nso
      type: gsarti/flores_101_nso
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.154626800955667
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_nya
      type: gsarti/flores_101_nya
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.179860208369393
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_oci
      type: gsarti/flores_101_oci
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.8617357393685845
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_orm
      type: gsarti/flores_101_orm
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 12.911595421079408
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ory
      type: gsarti/flores_101_ory
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.189421861225964
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_pan
      type: gsarti/flores_101_pan
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.698477289331806
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_pol
      type: gsarti/flores_101_pol
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.625550458479643
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_por
      type: gsarti/flores_101_por
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.9754515986213523
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_pus
      type: gsarti/flores_101_pus
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.4963371422771585
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ron
      type: gsarti/flores_101_ron
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.965456830031304
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_rus
      type: gsarti/flores_101_rus
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.0498020542445303
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_slk
      type: gsarti/flores_101_slk
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.450822127057479
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_slv
      type: gsarti/flores_101_slv
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 6.620252120186232
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_sna
      type: gsarti/flores_101_sna
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.462166771382726
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_snd
      type: gsarti/flores_101_snd
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.466066951221973
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_som
      type: gsarti/flores_101_som
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 11.95918054093392
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_spa
      type: gsarti/flores_101_spa
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.8965140104323535
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_srp
      type: gsarti/flores_101_srp
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.871214785885079
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_swe
      type: gsarti/flores_101_swe
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.054972008155866
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_swh
      type: gsarti/flores_101_swh
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.6973091886730676
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tam
      type: gsarti/flores_101_tam
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.539493400469833
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tel
      type: gsarti/flores_101_tel
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.807499987508966
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tgk
      type: gsarti/flores_101_tgk
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 3.5994818827380426
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tgl
      type: gsarti/flores_101_tgl
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.667053833119858
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tha
      type: gsarti/flores_101_tha
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.365940201944242
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_tur
      type: gsarti/flores_101_tur
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 4.885014749844601
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_ukr
      type: gsarti/flores_101_ukr
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.7240934990288483
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_umb
      type: gsarti/flores_101_umb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 12.766915508610673
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_urd
      type: gsarti/flores_101_urd
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.9797467071381232
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_uzb
      type: gsarti/flores_101_uzb
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 12.002337637722146
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_vie
      type: gsarti/flores_101_vie
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 1.76578415476397
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_wol
      type: gsarti/flores_101_wol
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 9.144285650306488
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_xho
      type: gsarti/flores_101_xho
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 7.403240538286952
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_yor
      type: gsarti/flores_101_yor
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 5.91272037551173
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_zho_simpl
      type: gsarti/flores_101_zho_simpl
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.2769070822768533
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_zho_trad
      type: gsarti/flores_101_zho_trad
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 2.5180582198242383
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: gsarti/flores_101_zul
      type: gsarti/flores_101_zul
    metrics:
    - name: byte_perplexity
      type: byte_perplexity
      value: 8.53353320693145
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: headqa
      type: headqa
    metrics:
    - name: acc
      type: acc
      value: 0.26440554339897887
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: hellaswag
      type: hellaswag
    metrics:
    - name: acc
      type: acc
      value: 0.41236805417247563
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: logiqa
      type: logiqa
    metrics:
    - name: acc
      type: acc
      value: 0.2073732718894009
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: mathqa
      type: mathqa
    metrics:
    - name: acc
      type: acc
      value: 0.24958123953098826
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: mc_taco
      type: mc_taco
    metrics:
    - name: em
      type: em
      value: 0.11936936936936937
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: mnli
      type: mnli
    metrics:
    - name: acc
      type: acc
      value: 0.35496688741721855
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: mnli_mismatched
      type: mnli_mismatched
    metrics:
    - name: acc
      type: acc
      value: 0.35211554109031734
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: mrpc
      type: mrpc
    metrics:
    - name: acc
      type: acc
      value: 0.5857843137254902
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: multirc
      type: multirc
    metrics:
    - name: acc
      type: acc
      value: 0.5375412541254125
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: openbookqa
      type: openbookqa
    metrics:
    - name: acc
      type: acc
      value: 0.216
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: piqa
      type: piqa
    metrics:
    - name: acc
      type: acc
      value: 0.7078346028291621
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: prost
      type: prost
    metrics:
    - name: acc
      type: acc
      value: 0.22683603757472245
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: pubmedqa
      type: pubmedqa
    metrics:
    - name: acc
      type: acc
      value: 0.616
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: qnli
      type: qnli
    metrics:
    - name: acc
      type: acc
      value: 0.5072304594545122
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: qqp
      type: qqp
    metrics:
    - name: acc
      type: acc
      value: 0.3842443729903537
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: race
      type: race
    metrics:
    - name: acc
      type: acc
      value: 0.3521531100478469
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: rte
      type: rte
    metrics:
    - name: acc
      type: acc
      value: 0.47653429602888087
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: sciq
      type: sciq
    metrics:
    - name: acc
      type: acc
      value: 0.892
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: sst
      type: sst
    metrics:
    - name: acc
      type: acc
      value: 0.5177752293577982
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: triviaqa
      type: triviaqa
    metrics:
    - name: acc
      type: acc
      value: 0.041633518960487934
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: tydiqa_primary
      type: tydiqa_primary
    metrics:
    - name: acc
      type: acc
      value: 0.3011337608795236
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: webqs
      type: webqs
    metrics:
    - name: acc
      type: acc
      value: 0.01673228346456693
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: wic
      type: wic
    metrics:
    - name: acc
      type: acc
      value: 0.5015673981191222
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: winogrande
      type: winogrande
    metrics:
    - name: acc
      type: acc
      value: 0.5864246250986582
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: wnli
      type: wnli
    metrics:
    - name: acc
      type: acc
      value: 0.471830985915493
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: wsc
      type: wsc
    metrics:
    - name: acc
      type: acc
      value: 0.4423076923076923
      verified: false
  - task:
      type: text-generation
      name: text generation
    dataset:
      name: humaneval
      type: humaneval
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.15524390243902436
      verified: false
    - name: pass@10
      type: pass@10
      value: 0.3220367632383857
      verified: false
    - name: pass@100
      type: pass@100
      value: 0.5545431515723145
      verified: false
---

<h1 style='text-align: center '>BLOOM LM</h1>
<h2 style='text-align: center '><em>BigScience Large Open-science Open-access Multilingual Language Model</em> </h2>
<h3 style='text-align: center '>Model Card</h3>
<img src="https://s3.amazonaws.com/moonup/production/uploads/1657124309515-5f17f0a0925b9863e28ad517.png" alt="BigScience Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

Version 1.0 / 26.May.2022

## Table of Contents
1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Training Data](#training-data)
4. [Risks and Limitations](#risks-and-limitations)
5. [Evaluation](#evaluation)
6. [Recommendations](#recommendations)
7. [Glossary and Calculations](#glossary-and-calculations)
8. [More Information](#more-information)
9. [Model Card Authors](#model-card-authors)

## Model Details

### Basics
*This section provides information for anyone who wants to know about the model.*

<details>
<summary>Click to expand</summary> <br/>

**Developed by:** BigScience ([website](https://bigscience.huggingface.co))

* All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*

**Model Type:** Transformer-based Language Model

**Version:** 1.0.0

**Languages:** Multiple; see [training data](#training-data)

**License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))

**Release Date Estimate:** Monday, 11.July.2022

**Send Questions to:** bigscience-contact@googlegroups.com

**Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022

**Funded by:**

* The French government.

* Hugging Face ([website](https://huggingface.co)).

* Organizations of contributors. *(Further breakdown of organizations forthcoming.)*

</details>

### Technical Specifications
*This section provides information for people who work on model development.*

<details>
<summary>Click to expand</summary><br/>

Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.

**Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):

* Decoder-only architecture

* Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))

* ALiBi positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions

* 3,002,557,440 parameters:

  * 642,252,800 embedding parameters

  * 30 layers, 32 attention heads

  * Hidden layers are 2560-dimensional

  * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
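
As a rough sanity check on the parameter breakdown above, the embedding count factors cleanly. This is a sketch; the padded vocabulary size of 250,880 is an assumption (the 250,680-entry vocabulary rounded up to a multiple of 128, Megatron-style), not a figure stated in this card.

```python
hidden = 2560           # hidden dimension, from the list above
padded_vocab = 250_880  # assumption: 250,680 rounded up to a multiple of 128

embedding_params = padded_vocab * hidden
print(f"{embedding_params:,}")  # 642,252,800, matching the embedding figure above
```
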
**Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
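
In PyTorch terms, that objective amounts to the following toy sketch (illustrative shapes only, not training code):

```python
import torch

# Token-level cross entropy, averaged over positions ("mean" reduction).
loss_fn = torch.nn.CrossEntropyLoss(reduction="mean")

logits = torch.randn(8, 250_880)           # (positions, vocabulary), toy values
targets = torch.randint(0, 250_880, (8,))  # next-token ids
print(loss_fn(logits, targets))
```
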
**Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).

* Hardware: 384 A100 80GB GPUs (48 nodes):

  * Additional 32 A100 80GB GPUs (4 nodes) in reserve

  * 8 GPUs per node, using NVLink 4 inter-GPU connects, 4 OmniPath links

  * CPU: AMD

  * CPU memory: 512GB per node

  * GPU memory: 640GB per node

  * Inter-node connect: Omni-Path Architecture (OPA)

  * NCCL-communications network: a fully dedicated subnet

  * Disc IO network: shared network with other types of nodes

* Software:

  * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))

  * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))

  * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))

  * apex ([Github link](https://github.com/NVIDIA/apex))

#### **Training**

Training logs: [Tensorboard link](https://huggingface.co/tensorboard/bigscience/tr11c-2B5-logs)

- Number of epochs: 1 (*current target*)

- Dates:

  - Started 11th March, 2022 11:42am PST

  - Ended 5th July, 2022

- Estimated cost of training: Equivalent of $2-5M in cloud computing (including preliminary experiments)

- Server training location: Île-de-France, France

#### **Tokenization**

The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:

- A byte-level Byte Pair Encoding (BPE) algorithm

- A simple pre-tokenization rule, no normalization

- A vocabulary size of 250,680

It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
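
The tokenizer can be inspected directly; a small sketch, assuming the `transformers` library is installed (the hub id is the original model linked at the top of this README):

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with the original checkpoint.
tok = AutoTokenizer.from_pretrained("bigscience/bloom-3b")

print(tok.vocab_size)                         # 250,680, as listed above
print(tok.tokenize("Un modèle multilingue"))  # byte-level BPE pieces
```
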
</details>

### Environmental Impact

<details>
<summary>Click to expand</summary><br/>

The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.

**Estimated carbon emissions:** *(Forthcoming upon completion of training.)*

**Estimated electricity usage:** *(Forthcoming upon completion of training.)*

</details>
<p>&nbsp;</p>

## Uses

*This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
It provides information for anyone considering using the model or who is affected by the model.*

<details>
<summary>Click to expand</summary><br/>

### Intended Use

This model is being created in order to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pretrained base model that can be further fine-tuned for specific tasks. Use cases below are not exhaustive.

#### **Direct Use**

- Text generation

- Exploring characteristics of language generated by a language model

- Examples: Cloze tests, counterfactuals, generations with reframings
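
For plain text generation with the original checkpoint, a minimal sketch (assumes `transformers` is installed and enough memory for the unquantized weights; the GGUF files listed earlier are the lighter route):

```python
from transformers import pipeline

# Sample a continuation from the original bloom-3b checkpoint.
generator = pipeline("text-generation", model="bigscience/bloom-3b")
out = generator("The three primary colors are", max_new_tokens=30, do_sample=True)
print(out[0]["generated_text"])
```
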
+ #### **Downstream Use**
+
+ - Tasks that leverage language models include: Information Extraction, Question Answering, Summarization
+
+ ### Misuse and Out-of-scope Use
+ *This section addresses what users ought not do with the model.*
+
+ See the [BLOOM License](https://huggingface.co/spaces/bigscience/license), Attachment A, for detailed usage restrictions. The list below is non-exhaustive, but covers some easily foreseeable problematic use cases.
+
+ #### **Out-of-scope Uses**
+
+ Using the model in [high-stakes](#high-stakes) settings is out of scope for this model. The model is not designed for [critical decisions](#critical-decisions) nor for uses with any material consequences on an individual's livelihood or wellbeing. The model may output content that appears factual but is not correct.
+
+ ##### Out-of-scope Uses Include:
+
+ - Usage in biomedical domains, political and legal domains, or finance domains
+
+ - Usage for evaluating or scoring individuals, such as for employment, education, or credit
+
+ - Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct
+
+ #### **Misuse**
+
+ Intentionally using the model for harm, violating [human rights](#human-rights), or engaging in other kinds of malicious activities is a misuse of this model. This includes:
+
+ - Spam generation
+
+ - Disinformation and influence operations
+
+ - Disparagement and defamation
+
+ - Harassment and abuse
+
+ - [Deception](#deception)
+
+ - Unconsented impersonation and imitation
+
+ - Unconsented surveillance
+
+ - Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)
+
+ ### Intended Users
+
+ #### **Direct Users**
+
+ - General Public
+
+ - Researchers
+
+ - Students
+
+ - Educators
+
+ - Engineers/developers
+
+ - Non-commercial entities
+
+ - Community advocates, including human and civil rights groups
+
+ #### **Indirect Users**
+
+ - Users of derivatives created by Direct Users, such as those using software with an [intended use](#intended-use)
+
+ - Users of [Derivatives of the Model, as described in the License](https://huggingface.co/spaces/bigscience/license)
+
+ #### **Others Affected (Parties Prenantes, i.e. stakeholders)**
+
+ - People and groups referred to by the LLM
+
+ - People and groups exposed to outputs of, or decisions based on, the LLM
+
+ - People and groups whose original work is included in the LLM
+
+ </details>
+ <p>&nbsp;</p>
+
+ ## Training Data
+ *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*
+
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).
+
+ Training data includes:
+
+ - 45 natural languages
+
+ - 12 programming languages
+
+ - 1.5TB of pre-processed text, converted into 350B unique tokens (see [the tokenizer section](#tokenization) for more, and the small illustration after this list)
+
+
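+ As a quick illustration of how raw text maps to tokens under the model's tokenizer (a sketch; it assumes the `transformers` library and the `bigscience/bloom` tokenizer id on the Hugging Face Hub, which is shared across BLOOM checkpoints):
+
+ ```python
+ from transformers import AutoTokenizer
+
+ # Load the shared BLOOM tokenizer.
+ tok = AutoTokenizer.from_pretrained("bigscience/bloom")
+
+ text = "BigScience is an open research workshop."
+ ids = tok(text)["input_ids"]
+ # Print the token count and the tokens themselves.
+ print(len(ids), tok.convert_ids_to_tokens(ids))
+ ```
+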
+ #### **Languages**
+
+ The pie chart shows the distribution of languages in the training data.
+
+ ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)
+
+
+ The following table shows the further distribution of Niger-Congo and Indic languages in the training data.
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ | Niger-Congo    | Percentage |   | Indic     | Percentage |
+ |----------------|------------|---|-----------|------------|
+ | Chi Tumbuka    | 0.00002    |   | Assamese  | 0.01       |
+ | Kikuyu         | 0.00004    |   | Odia      | 0.04       |
+ | Bambara        | 0.00004    |   | Gujarati  | 0.04       |
+ | Akan           | 0.00007    |   | Marathi   | 0.05       |
+ | Xitsonga       | 0.00007    |   | Punjabi   | 0.05       |
+ | Sesotho        | 0.00007    |   | Kannada   | 0.06       |
+ | Chi Chewa      | 0.0001     |   | Nepali    | 0.07       |
+ | Setswana       | 0.0002     |   | Telugu    | 0.09       |
+ | Northern Sotho | 0.0002     |   | Malayalam | 0.10       |
+ | Fon            | 0.0002     |   | Urdu      | 0.10       |
+ | Kirundi        | 0.0003     |   | Tamil     | 0.20       |
+ | Wolof          | 0.0004     |   | Bengali   | 0.50       |
+ | Kuganda        | 0.0004     |   | Hindi     | 0.70       |
+ | Chi Shona      | 0.001      |   |           |            |
+ | Isi Zulu       | 0.001      |   |           |            |
+ | Igbo           | 0.001      |   |           |            |
+ | Xhosa          | 0.001      |   |           |            |
+ | Kinyarwanda    | 0.003      |   |           |            |
+ | Yoruba         | 0.006      |   |           |            |
+ | Swahili        | 0.02       |   |           |            |
+ </details>
+
+ The following table shows the distribution of programming languages.
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ | Extension | Language   | Number of files |
+ |-----------|------------|-----------------|
+ | java      | Java       | 5,407,724       |
+ | php       | PHP        | 4,942,186       |
+ | cpp       | C++        | 2,503,930       |
+ | py        | Python     | 2,435,072       |
+ | js        | JavaScript | 1,905,518       |
+ | cs        | C#         | 1,577,347       |
+ | rb        | Ruby       | 678,413         |
+ | cc        | C++        | 443,054         |
+ | hpp       | C++        | 391,048         |
+ | lua       | Lua        | 352,317         |
+ | go        | Go         | 227,763         |
+ | ts        | TypeScript | 195,254         |
+ | C         | C          | 134,537         |
+ | scala     | Scala      | 92,052          |
+ | hh        | C++        | 67,161          |
+ | H         | C++        | 55,899          |
+ | tsx       | TypeScript | 33,107          |
+ | rs        | Rust       | 29,693          |
+ | phpt      | PHP        | 9,702           |
+ | c++       | C++        | 1,342           |
+ | h++       | C++        | 791             |
+ | php3      | PHP        | 540             |
+ | phps      | PHP        | 270             |
+ | php5      | PHP        | 166             |
+ | php4      | PHP        | 29              |
+
+ </details>
+ </details>
+ <p>&nbsp;</p>
+
+ ## Risks and Limitations
+ *This section identifies foreseeable harms and misunderstandings.*
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ The model may:
+
+ - Overrepresent some viewpoints and underrepresent others
+
+ - Contain stereotypes
+
+ - Contain [personal information](#personal-data-and-information)
+
+ - Generate:
+
+     - Hateful, abusive, or violent language
+
+     - Discriminatory or prejudicial language
+
+     - Content that may not be appropriate for all settings, including sexual content
+
+ - Make errors, including producing incorrect information as if it were factual
+
+ - Generate irrelevant or repetitive outputs
+ </details>
+ <p>&nbsp;</p>
+
+ ## Evaluation
+ *This section describes the evaluation protocols and provides the results.*
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ ### Metrics
+ *This section describes the different ways performance is calculated and why.*
+
+ Includes:
+
+ | Metric                      | Why chosen                                                         |
+ |-----------------------------|--------------------------------------------------------------------|
+ | [Perplexity](#perplexity)   | Standard metric for quantifying model improvements during training |
+ | Cross Entropy [Loss](#loss) | Standard objective for language models                             |
+
+ And multiple different metrics for specific tasks. _(More evaluation metrics forthcoming upon completion of the evaluation protocol.)_
+
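+ For reference, the two metrics above are directly linked: with the cross-entropy loss measured in nats over N tokens, perplexity is simply its exponential (a standard identity, not specific to this model):
+
+ ```latex
+ \mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta(x_i \mid x_{<i}),
+ \qquad
+ \mathrm{PPL} = \exp(\mathcal{L})
+ ```
+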
+ ### Factors
+ *This section lists some different aspects of BLOOM models. Its focus is on aspects that are likely to give rise to high variance in model behavior.*
+
+ - Language, such as English or Yoruba
+
+ - Domain, such as newswire or stories
+
+ - Demographic characteristics, such as gender or nationality
+
+ ### Results
+ *Results are based on the [Factors](#factors) and [Metrics](#metrics).*
+
+ **Zero-shot evaluations:**
+
+ See this repository for JSON files: https://github.com/bigscience-workshop/evaluation-results
+
+ In the FLORES-101 rows below, byte perplexity normalizes the model's log-likelihood by the number of bytes rather than the number of tokens, which makes scores more comparable across languages with very different tokenizations (lower is better).
+
+ | Task | Language | Metric | BLOOM-2B5 |
+ |:----|:----|:----|:----:|
+ | arc_challenge | eng | acc ↑ | 0.28 |
+ | arc_easy | eng | acc ↑ | 0.595 |
+ | axb (Median of 10 prompts) | eng | acc ↑ | 0.443 |
+ | axg (Median of 10 prompts) | eng | acc ↑ | 0.5 |
+ | boolq (Median of 11 prompts) | eng | acc ↑ | 0.617 |
+ | cb (Median of 15 prompts) | eng | acc ↑ | 0.304 |
+ | cola (Median of 5 prompts) | eng | acc ↑ | 0.611 |
+ | copa (Median of 9 prompts) | eng | acc ↑ | 0.63 |
+ | crows_pairs_english (Median of 6 prompts) | eng | acc ↑ | 0.497 |
+ | crows_pairs_french (Median of 7 prompts) | fra | acc ↑ | 0.503 |
+ | diabla (Median of 2 prompts) | eng | acc ↑ | 0.289 |
+ | gsarti/flores_101_afr | afr | byte_perplexity ↓ | 6.501 |
+ | gsarti/flores_101_amh | amh | byte_perplexity ↓ | 3.973 |
+ | gsarti/flores_101_ara | ara | byte_perplexity ↓ | 1.808 |
+ | gsarti/flores_101_asm | asm | byte_perplexity ↓ | 5.699 |
+ | gsarti/flores_101_ast | ast | byte_perplexity ↓ | 3.925 |
+ | gsarti/flores_101_azj | azj | byte_perplexity ↓ | 6.943 |
+ | gsarti/flores_101_bel | bel | byte_perplexity ↓ | 3.614 |
+ | gsarti/flores_101_ben | ben | byte_perplexity ↓ | 5.121 |
+ | gsarti/flores_101_bos | bos | byte_perplexity ↓ | 5.653 |
+ | gsarti/flores_101_bul | bul | byte_perplexity ↓ | 2.701 |
+ | gsarti/flores_101_cat | cat | byte_perplexity ↓ | 2.305 |
+ | gsarti/flores_101_ceb | ceb | byte_perplexity ↓ | 6.291 |
+ | gsarti/flores_101_ces | ces | byte_perplexity ↓ | 5.447 |
+ | gsarti/flores_101_ckb | ckb | byte_perplexity ↓ | 3.726 |
+ | gsarti/flores_101_cym | cym | byte_perplexity ↓ | 12.539 |
+ | gsarti/flores_101_dan | dan | byte_perplexity ↓ | 5.183 |
+ | gsarti/flores_101_deu | deu | byte_perplexity ↓ | 3.118 |
+ | gsarti/flores_101_ell | ell | byte_perplexity ↓ | 2.468 |
+ | gsarti/flores_101_eng | eng | byte_perplexity ↓ | 2.019 |
+ | gsarti/flores_101_est | est | byte_perplexity ↓ | 9.117 |
+ | gsarti/flores_101_fas | fas | byte_perplexity ↓ | 3.058 |
+ | gsarti/flores_101_fin | fin | byte_perplexity ↓ | 6.847 |
+ | gsarti/flores_101_fra | fra | byte_perplexity ↓ | 1.998 |
+ | gsarti/flores_101_ful | ful | byte_perplexity ↓ | 11.466 |
+ | gsarti/flores_101_gle | gle | byte_perplexity ↓ | 8.681 |
+ | gsarti/flores_101_glg | glg | byte_perplexity ↓ | 3.03 |
+ | gsarti/flores_101_guj | guj | byte_perplexity ↓ | 4.955 |
+ | gsarti/flores_101_hau | hau | byte_perplexity ↓ | 10.758 |
+ | gsarti/flores_101_heb | heb | byte_perplexity ↓ | 3.6 |
+ | gsarti/flores_101_hin | hin | byte_perplexity ↓ | 4.713 |
+ | gsarti/flores_101_hrv | hrv | byte_perplexity ↓ | 5.822 |
+ | gsarti/flores_101_hun | hun | byte_perplexity ↓ | 6.44 |
+ | gsarti/flores_101_hye | hye | byte_perplexity ↓ | 3.658 |
+ | gsarti/flores_101_ibo | ibo | byte_perplexity ↓ | 5.565 |
+ | gsarti/flores_101_ind | ind | byte_perplexity ↓ | 2.16 |
+ | gsarti/flores_101_isl | isl | byte_perplexity ↓ | 8.082 |
+ | gsarti/flores_101_ita | ita | byte_perplexity ↓ | 2.969 |
+ | gsarti/flores_101_jav | jav | byte_perplexity ↓ | 7.057 |
+ | gsarti/flores_101_jpn | jpn | byte_perplexity ↓ | 2.776 |
+ | gsarti/flores_101_kam | kam | byte_perplexity ↓ | 11.073 |
+ | gsarti/flores_101_kan | kan | byte_perplexity ↓ | 5.552 |
+ | gsarti/flores_101_kat | kat | byte_perplexity ↓ | 2.523 |
+ | gsarti/flores_101_kaz | kaz | byte_perplexity ↓ | 3.39 |
+ | gsarti/flores_101_kea | kea | byte_perplexity ↓ | 8.919 |
+ | gsarti/flores_101_kir | kir | byte_perplexity ↓ | 3.729 |
+ | gsarti/flores_101_kor | kor | byte_perplexity ↓ | 3.933 |
+ | gsarti/flores_101_lao | lao | byte_perplexity ↓ | 2.908 |
+ | gsarti/flores_101_lav | lav | byte_perplexity ↓ | 7.777 |
+ | gsarti/flores_101_lin | lin | byte_perplexity ↓ | 7.525 |
+ | gsarti/flores_101_lit | lit | byte_perplexity ↓ | 7.369 |
+ | gsarti/flores_101_ltz | ltz | byte_perplexity ↓ | 8.801 |
+ | gsarti/flores_101_lug | lug | byte_perplexity ↓ | 8.483 |
+ | gsarti/flores_101_luo | luo | byte_perplexity ↓ | 11.976 |
+ | gsarti/flores_101_mal | mal | byte_perplexity ↓ | 4.616 |
+ | gsarti/flores_101_mar | mar | byte_perplexity ↓ | 5.483 |
+ | gsarti/flores_101_mkd | mkd | byte_perplexity ↓ | 2.966 |
+ | gsarti/flores_101_mlt | mlt | byte_perplexity ↓ | 15.005 |
+ | gsarti/flores_101_mon | mon | byte_perplexity ↓ | 3.411 |
+ | gsarti/flores_101_mri | mri | byte_perplexity ↓ | 7.474 |
+ | gsarti/flores_101_msa | msa | byte_perplexity ↓ | 2.571 |
+ | gsarti/flores_101_mya | mya | byte_perplexity ↓ | 2.414 |
+ | gsarti/flores_101_nld | nld | byte_perplexity ↓ | 4.128 |
+ | gsarti/flores_101_nob | nob | byte_perplexity ↓ | 5.403 |
+ | gsarti/flores_101_npi | npi | byte_perplexity ↓ | 5.199 |
+ | gsarti/flores_101_nso | nso | byte_perplexity ↓ | 8.155 |
+ | gsarti/flores_101_nya | nya | byte_perplexity ↓ | 8.18 |
+ | gsarti/flores_101_oci | oci | byte_perplexity ↓ | 4.862 |
+ | gsarti/flores_101_orm | orm | byte_perplexity ↓ | 12.912 |
+ | gsarti/flores_101_ory | ory | byte_perplexity ↓ | 5.189 |
+ | gsarti/flores_101_pan | pan | byte_perplexity ↓ | 4.698 |
+ | gsarti/flores_101_pol | pol | byte_perplexity ↓ | 4.626 |
+ | gsarti/flores_101_por | por | byte_perplexity ↓ | 1.975 |
+ | gsarti/flores_101_pus | pus | byte_perplexity ↓ | 4.496 |
+ | gsarti/flores_101_ron | ron | byte_perplexity ↓ | 4.965 |
+ | gsarti/flores_101_rus | rus | byte_perplexity ↓ | 2.05 |
+ | gsarti/flores_101_slk | slk | byte_perplexity ↓ | 6.451 |
+ | gsarti/flores_101_slv | slv | byte_perplexity ↓ | 6.62 |
+ | gsarti/flores_101_sna | sna | byte_perplexity ↓ | 8.462 |
+ | gsarti/flores_101_snd | snd | byte_perplexity ↓ | 5.466 |
+ | gsarti/flores_101_som | som | byte_perplexity ↓ | 11.959 |
+ | gsarti/flores_101_spa | spa | byte_perplexity ↓ | 1.897 |
+ | gsarti/flores_101_srp | srp | byte_perplexity ↓ | 2.871 |
+ | gsarti/flores_101_swe | swe | byte_perplexity ↓ | 5.055 |
+ | gsarti/flores_101_swh | swh | byte_perplexity ↓ | 3.697 |
+ | gsarti/flores_101_tam | tam | byte_perplexity ↓ | 4.539 |
+ | gsarti/flores_101_tel | tel | byte_perplexity ↓ | 5.807 |
+ | gsarti/flores_101_tgk | tgk | byte_perplexity ↓ | 3.599 |
+ | gsarti/flores_101_tgl | tgl | byte_perplexity ↓ | 5.667 |
+ | gsarti/flores_101_tha | tha | byte_perplexity ↓ | 2.366 |
+ | gsarti/flores_101_tur | tur | byte_perplexity ↓ | 4.885 |
+ | gsarti/flores_101_ukr | ukr | byte_perplexity ↓ | 2.724 |
+ | gsarti/flores_101_umb | umb | byte_perplexity ↓ | 12.767 |
+ | gsarti/flores_101_urd | urd | byte_perplexity ↓ | 1.98 |
+ | gsarti/flores_101_uzb | uzb | byte_perplexity ↓ | 12.002 |
+ | gsarti/flores_101_vie | vie | byte_perplexity ↓ | 1.766 |
+ | gsarti/flores_101_wol | wol | byte_perplexity ↓ | 9.144 |
+ | gsarti/flores_101_xho | xho | byte_perplexity ↓ | 7.403 |
+ | gsarti/flores_101_yor | yor | byte_perplexity ↓ | 5.913 |
+ | gsarti/flores_101_zho_simpl | zho_simpl | byte_perplexity ↓ | 2.277 |
+ | gsarti/flores_101_zho_trad | zho_trad | byte_perplexity ↓ | 2.518 |
+ | gsarti/flores_101_zul | zul | byte_perplexity ↓ | 8.534 |
+ | headqa | esp | acc ↑ | 0.264 |
+ | hellaswag | eng | acc ↑ | 0.412 |
+ | logiqa | eng | acc ↑ | 0.207 |
+ | mathqa | eng | acc ↑ | 0.25 |
+ | mc_taco | eng | em ↑ | 0.119 |
+ | mnli (Median of 15 prompts) | eng | acc ↑ | 0.355 |
+ | mnli_mismatched (Median of 15 prompts) | eng | acc ↑ | 0.352 |
+ | mrpc | eng | acc ↑ | 0.586 |
+ | multirc (Median of 11 prompts) | eng | acc ↑ | 0.538 |
+ | openbookqa | eng | acc ↑ | 0.216 |
+ | piqa | eng | acc ↑ | 0.708 |
+ | prost | eng | acc ↑ | 0.227 |
+ | pubmedqa | eng | acc ↑ | 0.616 |
+ | qnli | eng | acc ↑ | 0.507 |
+ | qqp (Median of 7 prompts) | eng | acc ↑ | 0.384 |
+ | race | eng | acc ↑ | 0.352 |
+ | rte (Median of 6 prompts) | eng | acc ↑ | 0.477 |
+ | sciq | eng | acc ↑ | 0.892 |
+ | sst (Median of 6 prompts) | eng | acc ↑ | 0.518 |
+ | triviaqa | eng | acc ↑ | 0.042 |
+ | tydiqa_primary (Median of 24 prompts) | eng | acc ↑ | 0.301 |
+ | webqs | eng | acc ↑ | 0.017 |
+ | wic (Median of 11 prompts) | eng | acc ↑ | 0.502 |
+ | winogrande | eng | acc ↑ | 0.586 |
+ | wnli (Median of 6 prompts) | eng | acc ↑ | 0.472 |
+ | wsc (Median of 11 prompts) | eng | acc ↑ | 0.442 |
+ | humaneval | python | pass@1 ↑ | 0.155 |
+ | humaneval | python | pass@10 ↑ | 0.322 |
+ | humaneval | python | pass@100 ↑ | 0.555 |
+
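+ The HumanEval pass@k scores above are conventionally computed with the unbiased estimator from Chen et al. (2021), 1 - C(n-c, k)/C(n, k), rather than by literally drawing k samples; a minimal sketch (the sample counts below are illustrative, not taken from this evaluation):
+
+ ```python
+ from math import comb
+
+ def pass_at_k(n: int, c: int, k: int) -> float:
+     """Unbiased pass@k: probability that at least one of k samples,
+     drawn without replacement from n generations with c correct, passes."""
+     if n - c < k:
+         return 1.0  # every size-k draw must contain a correct sample
+     return 1.0 - comb(n - c, k) / comb(n, k)
+
+ # Illustrative numbers only: 200 samples per problem, 40 of which pass the tests.
+ for k in (1, 10, 100):
+     print(k, round(pass_at_k(200, 40, k), 3))
+ ```
+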
+ **Train-time Evaluation:**
+
+ As of 25 May 2022, 15:00 PST:
+
+ - Training Loss: 2.0
+
+ - Validation Loss: 2.2
+
+ - Perplexity: 8.9 (see the quick check after this list)
+
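+ These figures are mutually consistent: perplexity is the exponential of the cross-entropy loss in nats, so the reported validation loss implies a perplexity of about 9 (the small gap to 8.9 would come from the exact token set or measurement point):
+
+ ```python
+ import math
+
+ validation_loss = 2.2               # nats per token, from the list above
+ print(math.exp(validation_loss))    # ~9.03, in line with the reported 8.9
+ ```
+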
+ </details>
+ <p>&nbsp;</p>
+
+ ## Recommendations
+
+ *This section provides information on warnings and potential mitigations.*
+
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ - Indirect users should be made aware when the content they're working with is created by the LLM.
+
+ - Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
+
+ - Models pretrained with the LLM should include an updated Model Card.
+
+ - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
+
+ </details>
+ <p>&nbsp;</p>
+
+ ## Glossary and Calculations
+
+ *This section defines common terms and how metrics are calculated.*
+
+
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ - <a name="loss">**Loss:**</a> A measure of the difference between what the model has learned and what the data shows ("ground truth"). The lower the loss, the better. The training process aims to minimize the loss.
+
+ - <a name="perplexity">**Perplexity:**</a> A measure of how well the model predicts new data, based on the probability it assigns to that data. The lower the perplexity, the better. If the model is 100% correct at predicting the next token it will see, the perplexity is 1; a model that chooses uniformly among k equally likely tokens has perplexity k. Mathematically, perplexity is the exponentiated cross-entropy.
+
+ - <a name="high-stakes">**High-stakes settings:**</a> Such as those identified as "high-risk AI systems" and "unacceptable risk AI systems" in the European Union's proposed [Artificial Intelligence (AI) Act](https://artificialintelligenceact.eu/annexes/).
+
+ - <a name="critical-decisions">**Critical decisions:**</a> Such as those defined in [the United States' proposed Algorithmic Accountability Act](https://www.congress.gov/117/bills/s3572/BILLS-117s3572is.pdf).
+
+ - <a name="human-rights">**Human rights:**</a> Includes those rights defined in the [Universal Declaration of Human Rights](https://www.un.org/sites/un2.un.org/files/2021/03/udhr.pdf).
+
+ - <a name="personal-data-and-information">**Personal Data and Personal Information:**</a> Personal data and information are defined in multiple data protection regulations, such as "[personal data](https://gdpr-info.eu/issues/personal-data/)" in the [European Union's General Data Protection Regulation](https://gdpr-info.eu), "personal information" in the Republic of South Africa's [Protection of Personal Information Act](https://www.gov.za/sites/default/files/gcis_document/201409/3706726-11act4of2013popi.pdf), and the People's Republic of China's [Personal Information Protection Law](http://en.npc.gov.cn.cdurl.cn/2021-12/29/c_694559.htm).
+
+ - <a name="sensitive-characteristics">**Sensitive characteristics:**</a> This includes specifically protected categories in human rights law (see [UDHR, Article 2](https://www.un.org/sites/un2.un.org/files/2021/03/udhr.pdf)) and personal information regulation (see GDPR, [Article 9; Protection of Personal Information Act, Chapter 1](https://www.gov.za/sites/default/files/gcis_document/201409/3706726-11act4of2013popi.pdf)).
+
+ - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals into believing something that is false, such as by creating deadbots or chatbots posing as real people on social media, or generating text documents without making consumers aware that the text is machine generated.
+
+ </details>
+ <p>&nbsp;</p>
+
+ ## More Information
+
+ <details>
+ <summary>Click to expand</summary><br/>
+
+ ### Dataset Creation
+
+ Blog post detailing the design choices made during dataset creation: https://bigscience.huggingface.co/blog/building-a-tb-scale-multilingual-dataset-for-language-modeling
+
+ ### Technical Specifications
+
+ Blog post summarizing how the architecture, size, shape, and pre-training duration were selected: https://bigscience.huggingface.co/blog/what-language-model-to-train-if-you-have-two-million-gpu-hours
+
+ More details on the architecture/optimizer: https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml
+
+ Blog post on the hardware/engineering side: https://bigscience.huggingface.co/blog/which-hardware-to-train-a-176b-parameters-model
+
+ Details on the distributed setup used for the training: https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml
+
+ Tensorboard, updated during training: https://huggingface.co/bigscience/tr11-176B-ml-logs/tensorboard#scalars&tagFilter=loss
+
+ Insights on how to approach training, including negative results: https://github.com/bigscience-workshop/bigscience/blob/master/train/lessons-learned.md
+
+ Details on the engineering obstacles overcome during preparation (instabilities, optimization of training throughput, and many technical tricks and questions): https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/chronicles.md
+
+ ### Initial Results
+
+ Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
+
+ </details>
+ <p>&nbsp;</p>
+
+ ## Model Card Authors
+ *Ordered roughly chronologically and by amount of time spent.*
+
+ Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
+