Ezi committed on
Commit c54f29f
1 Parent(s): 258ce6c

Model Card Restructure & edits


Some restructuring of this BLOOM model card, based on the format we are using as part of our effort to standardise model cards ([doc of an annotated model card](https://huggingface.co/docs/hub/model-card-annotated)). I also took out the *Click to expand* sections to allow for Ctrl+F searches (cc'ing @meg).

Files changed (1)
  1. README.md +174 -214
README.md CHANGED
@@ -60,172 +60,51 @@ pipeline_tag: text-generation

  Version 1.0 / 26.May.2022

  ## Table of Contents
  1. [Model Details](#model-details)
  2. [Uses](#uses)
- 3. [Training Data](#training-data)
- 4. [Risks and Limitations](#risks-and-limitations)
- 5. [Evaluation](#evaluation)
- 6. [Recommendations](#recommendations)
- 7. [Glossary and Calculations](#glossary-and-calculations)
- 8. [More Information](#more-information)
- 9. [Model Card Authors](#model-card-authors)

  ## Model Details

- ### Basics
  *This section provides information for anyone who wants to know about the model.*
-
- <details>
- <summary>Click to expand</summary> <br/>
-
- **Developed by:** BigScience ([website](https://bigscience.huggingface.co))
-
- * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*
-
- **Model Type:** Transformer-based Language Model
-
- **Version:** 1.0.0
-
- **Languages:** Multiple; see [training data](#training-data)
-
- **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
-
- **Release Date Estimate:** Monday, 11.July.2022
-
- **Send Questions to:** bigscience-contact@googlegroups.com
-
- **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
-
- **Funded by:**
-
- * The French government.
-
- * Hugging Face ([website](https://huggingface.co)).
-
- * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*
-
- </details>
-
- ### Technical Specifications
- *This section provides information for people who work on model development.*
-
- <details>
- <summary>Click to expand</summary><br/>
-
- Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.
-
- **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):
-
- * Decoder-only architecture
-
- * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
-
- * ALiBI positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
-
- * 1,722,408,960 parameters:
-
- * 513,802,240 embedding parameters
-
- * 24 layers, 16 attention heads
-
- * Hidden layers are 2048-dimensional
-
- * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
-
- **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).

- **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
-
- * Hardware: 64 V100 16/32GB GPUs (16 nodes):

- * 4 GPUs per node

- * 40 CPUs per task

- * 1 task per node
-
- * CPU: AMD

- * CPU memory: 160GB per node
-
- * GPU memory: 64GB or 128GB (depending on node availability during training) per node
-
- * Inter-node connect: Omni-Path Architecture (OPA)
-
- * NCCL-communications network: a fully dedicated subnet
-
- * Disc IO network: shared network with other types of nodes
-
- * Software:
-
- * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
-
- * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
-
- * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
-
- * apex ([Github link](https://github.com/NVIDIA/apex))

-
- #### **Training**
-
- - Checkpoint size:
-
- - Fp16 weights: 2.6GB (# params * 2)
-
- - Full checkpoint with optimizer states: --
-
- - Training throughput: --
-
- - Number of epochs: 1
-
- - Dates:
-
- - Start: 11th March, 2022 11:42am PST
-
- - End: 20 May, 2022
-
- - Server training location: Île-de-France, France
-
- #### **Tokenization**
-
- The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
-
- - A byte-level Byte Pair Encoding (BPE) algorithm
-
- - A simple pre-tokenization rule, no normalization
-
- - A vocabulary size of 250,680
-
- It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
-
- </details>
-
-
- ### Environmental Impact
-
- <details>
- <summary>Click to expand</summary><br/>
-
- The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
-
- **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
-
- **Estimated electricity usage:** *(Forthcoming upon completion of training.)*
-
-
- </details>
- <p>&nbsp;</p>

  ## Uses

  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
  It provides information for anyone considering using the model or who is affected by the model.*
-
-
- <details>
- <summary>Click to expand</summary><br/>

  ### Intended Use

@@ -311,16 +190,54 @@ Intentionally using the model for harm, violating [human rights](#human-rights),
  - People and groups exposed to outputs of, or decisions based on, the LLM

  - People and groups whose original work is included in the LLM

- </details>
- <p>&nbsp;</p>

  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*


- <details>
- <summary>Click to expand</summary><br/>

  Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).

@@ -340,9 +257,8 @@ The pie chart shows the distribution of languages in training data.
  ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)


- The following table shows the further distribution of Niger-Congo and Indic languages in the training data.
- <details>
- <summary>Click to expand</summary><br/>

  | Niger Congo | Percentage | | Indic | Percentage |
  |----------------|------------ |------ |-----------|------------|
@@ -368,9 +284,8 @@ The following table shows the further distribution of Niger-Congo and Indic lang
  | Swahili | 0.02 |
  </details>

- The following table shows the distribution of programming languages.
- <details>
- <summary>Click to expand</summary><br/>

  | Extension | Language | Number of files |
  |----------------|------------|-----------------|
@@ -400,43 +315,10 @@ The following table shows the distribution of programming languages.
  | php5 | PHP | 166 |
  | php4 | PHP | 29 |

- </details>
- </details>
- <p>&nbsp;</p>
-
- ## Risks and Limitations
- *This section identifies foreseeable harms and misunderstandings.*
-
- <details>
- <summary>Click to expand</summary><br/>
-
- Model may:
-
- - Overrepresent some viewpoints and underrepresent others
-
- - Contain stereotypes
-
- - Contain [personal information](#personal-data-and-information)
-
- - Generate:
-
- - Hateful, abusive, or violent language
-
- - Discriminatory or prejudicial language
-
- - Content that may not be appropriate for all settings, including sexual content
-
- - Make errors, including producing incorrect information as if it were factual
-
- - Generate irrelevant or repetitive outputs
- </details>
- <p>&nbsp;</p>

  ## Evaluation
  *This section describes the evaluation protocols and provides the results.*

- <details>
- <summary>Click to expand</summary><br/>

  ### Metrics
  *This section describes the different ways performance is calculated and why.*
@@ -476,36 +358,117 @@ As of 25.May.2022, 15:00 PST:

  - [BLOOM Book](https://huggingface.co/spaces/bigscience/bloom-book): Read generations from BLOOM based on prompts provided by the community

- </details>
- <p>&nbsp;</p>

- ## Recommendations

- *This section provides information on warnings and potential mitigations.*


- <details>
- <summary>Click to expand</summary><br/>

- - Indirect users should be made aware when the content they're working with is created by the LLM.

- - Users should be aware of [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.

- - Models pretrained with the LLM should include an updated Model Card.

- - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

- </details>
- <p>&nbsp;</p>

- ## Glossary and Calculations

- *This section defines common terms and how metrics are calculated.*


- <details>
- <summary>Click to expand</summary><br/>

  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.

@@ -523,13 +486,9 @@ As of 25.May.2022, 15:00 PST:

  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.

- </details>
- <p>&nbsp;</p>

  ## More Information

- <details>
- <summary>Click to expand</summary><br/>

  ### Dataset Creation

@@ -554,11 +513,12 @@ Details on the obstacles overcome during the preparation on the engineering side
  ### Initial Results

  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book
-
- </details>
- <p>&nbsp;</p>

  ## Model Card Authors
  *Ordered roughly chronologically and by amount of time spent.*

  Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff

  Version 1.0 / 26.May.2022

+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
  ## Table of Contents
  1. [Model Details](#model-details)
  2. [Uses](#uses)
+ 3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
+ 4. [Recommendations](#recommendations)
+ 5. [Training Data](#training-data)
+ 6. [Evaluation](#evaluation)
+ 7. [Technical Specifications](#technical-specifications)
+ 8. [Citation](#citation)
+ 9. [Glossary and Calculations](#glossary-and-calculations)
+ 10. [More Information](#more-information)
+ 11. [Model Card Authors](#model-card-authors)
+ 12. [Model Card Contact](#model-card-contact)

  ## Model Details

+ ### Model Description
  *This section provides information for anyone who wants to know about the model.*

+ - **Developed by:** BigScience ([website](https://bigscience.huggingface.co))

+ * All collaborators are either volunteers or have an agreement with their employer. *(Further breakdown of participants forthcoming.)*

+ - **Model Type:** Transformer-based Language Model
+ - **Version:** 1.0.0
+ - **Languages:** Multiple; see [training data](#training-data)
+ - **License:** RAIL License v1.0 ([link](https://huggingface.co/spaces/bigscience/license))
+ - **Release Date Estimate:** Monday, 11.July.2022
+ - **Funded by:**

+ * The French government.

+ * Hugging Face ([website](https://huggingface.co)).

+ * Organizations of contributors. *(Further breakdown of organizations forthcoming.)*

  ## Uses

  *This section addresses questions around how the model is intended to be used, discusses the foreseeable users of the model (including those affected by the model), and describes uses that are considered out of scope or misuse of the model.
  It provides information for anyone considering using the model or who is affected by the model.*

  ### Intended Use

  - People and groups exposed to outputs of, or decisions based on, the LLM

  - People and groups whose original work is included in the LLM
+
+
+
+ ## Bias, Risks, and Limitations
+ *This section identifies foreseeable harms and misunderstandings.*

+ Model may:
+
+ - Overrepresent some viewpoints and underrepresent others
+
+ - Contain stereotypes
+
+ - Contain [personal information](#personal-data-and-information)
+
+ - Generate:
+
+ - Hateful, abusive, or violent language
+
+ - Discriminatory or prejudicial language
+
+ - Content that may not be appropriate for all settings, including sexual content
+
+ - Make errors, including producing incorrect information as if it were factual
+
+ - Generate irrelevant or repetitive outputs
+
+
+ ### Recommendations
+
+
+ *This section provides information on warnings and potential mitigations.*
+
+ - Indirect users should be made aware when the content they're working with is created by the LLM.
+
+ - Users should be aware of [Bias, Risks, and Limitations](#bias-risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
+
+ - Models pretrained with the LLM should include an updated Model Card.
+
+ - Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.
+
+
+

  ## Training Data
  *This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*


+

  Details for each dataset are provided in individual [Data Cards](https://huggingface.co/spaces/bigscience/BigScienceCorpus).

  ![pie chart showing the distribution of languages in training data](https://github.com/bigscience-workshop/model_card/blob/main/assets/data/pie_chart.svg?raw=true)


+ **The following table shows the further distribution of Niger-Congo and Indic languages in the training data.**
+

  | Niger Congo | Percentage | | Indic | Percentage |
  |----------------|------------ |------ |-----------|------------|
  | Swahili | 0.02 |
  </details>

+ **The following table shows the distribution of programming languages.**
+

  | Extension | Language | Number of files |
  |----------------|------------|-----------------|
  | php5 | PHP | 166 |
  | php4 | PHP | 29 |

  ## Evaluation
  *This section describes the evaluation protocols and provides the results.*

  ### Metrics
  *This section describes the different ways performance is calculated and why.*

  - [BLOOM Book](https://huggingface.co/spaces/bigscience/bloom-book): Read generations from BLOOM based on prompts provided by the community

+ ## Environmental Impact

+ The training supercomputer, Jean Zay ([website](http://www.idris.fr/eng/jean-zay/jean-zay-presentation-eng.html)), uses mostly nuclear energy. The heat generated by it is reused for heating campus housing.
+
+ **Estimated carbon emissions:** *(Forthcoming upon completion of training.)*
+
+ **Estimated electricity usage:** *(Forthcoming upon completion of training.)*

+ ## Technical Specifications
+ *This section provides information for people who work on model development.*

+ Please see [the BLOOM training README](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#readme) for full details on replicating training.

+ **Model Architecture:** Modified from Megatron-LM GPT2 (see [paper](https://arxiv.org/abs/1909.08053), [BLOOM Megatron code](https://github.com/bigscience-workshop/Megatron-DeepSpeed)):

+ * Decoder-only architecture

+ * Layer normalization applied to word embeddings layer (`StableEmbedding`; see [code](https://github.com/facebookresearch/bitsandbytes), [paper](https://arxiv.org/pdf/2110.02861.pdf))
+
+ * ALiBi positional encodings (see [paper](https://arxiv.org/pdf/2108.12409.pdf)), with GeLU activation functions
+
+ * 1,722,408,960 parameters:
+
+ * 513,802,240 embedding parameters

+ * 24 layers, 16 attention heads

+ * Hidden layers are 2048-dimensional

+ * Sequence length of 2048 tokens used (see [BLOOM tokenizer](https://huggingface.co/bigscience/tokenizer), [tokenizer description](#tokenization))
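The parameter figures listed above can be cross-checked against each other. The sketch below is a back-of-the-envelope count, not the Megatron-DeepSpeed code. It rests on assumptions the card does not state: each of the 24 decoder blocks has a fused QKV projection, an attention output projection, a 4× MLP and two LayerNorms, all with biases; there is one LayerNorm on the word embeddings and one final LayerNorm; and the output head is tied to the input embeddings. Under those assumptions it reproduces the 1,722,408,960 total exactly.

```python
# Back-of-the-envelope parameter count for the configuration listed in the card.
# Assumptions (not stated in the card): GPT-style blocks with biases,
# a 4x-wide MLP, and an output head tied to the input embeddings.

HIDDEN = 2048                    # "Hidden layers are 2048-dimensional"
N_LAYERS = 24                    # "24 layers"
EMBEDDING_PARAMS = 513_802_240   # "513,802,240 embedding parameters"


def linear_params(d_in: int, d_out: int) -> int:
    """Weight matrix plus bias vector."""
    return d_in * d_out + d_out


def layernorm_params(h: int) -> int:
    """Scale (gamma) plus shift (beta)."""
    return 2 * h


def block_params(h: int) -> int:
    """One decoder block: attention, MLP and two LayerNorms."""
    attention = linear_params(h, 3 * h) + linear_params(h, h)  # fused QKV + output projection
    mlp = linear_params(h, 4 * h) + linear_params(4 * h, h)    # up- and down-projection
    return attention + mlp + 2 * layernorm_params(h)


def total_params(h: int, n_layers: int, embedding_params: int) -> int:
    return (
        embedding_params
        + layernorm_params(h)          # LayerNorm on the word embeddings
        + n_layers * block_params(h)
        + layernorm_params(h)          # final LayerNorm; a tied output head adds nothing
    )


if __name__ == "__main__":
    print("embedding rows implied:", EMBEDDING_PARAMS // HIDDEN)
    print("per-block params:", block_params(HIDDEN))
    print("total params:", total_params(HIDDEN, N_LAYERS, EMBEDDING_PARAMS))
```

The implied 250,880 embedding rows are 200 more than the 250,680-token vocabulary given under Tokenization. Padding the embedding matrix would explain the gap, but the card does not say so.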
+
+ **Objective Function:** Cross Entropy with mean reduction (see [API documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html#torch.nn.CrossEntropyLoss)).
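Cross-entropy with mean reduction is the average, over predicted tokens, of the negative log-probability the model assigns to the correct next token. Below is a minimal pure-Python sketch, assuming unweighted classes and no ignored padding positions. Under those assumptions it should agree with `torch.nn.CrossEntropyLoss()` at its default `reduction='mean'` applied to raw logits, though it is only an illustration and not the training code.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy_mean(rows: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Average of -log p(target) over all positions (reduction='mean')."""
    if not rows or len(rows) != len(targets):
        raise ValueError("need one target index per non-empty row of logits")
    per_token = [-log_softmax(row)[t] for row, t in zip(rows, targets)]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Two next-token predictions over a toy 3-token vocabulary.
    logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
    targets = [0, 2]
    print(f"mean cross-entropy: {cross_entropy_mean(logits, targets):.4f}")
```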
+
+ **Compute infrastructure:** Jean Zay Public Supercomputer, provided by the French government (see [announcement](https://www.enseignementsup-recherche.gouv.fr/fr/signature-du-marche-d-acquisition-de-l-un-des-supercalculateurs-les-plus-puissants-d-europe-46733)).
+
+ * Hardware: 64 V100 16/32GB GPUs (16 nodes):
+
+ * 4 GPUs per node
+
+ * 40 CPUs per task
+
+ * 1 task per node
+
+ * CPU: AMD
+
+ * CPU memory: 160GB per node
+
+ * GPU memory: 64GB or 128GB (depending on node availability during training) per node
+
+ * Inter-node connect: Omni-Path Architecture (OPA)
+
+ * NCCL-communications network: a fully dedicated subnet
+
+ * Disk IO network: shared network with other types of nodes
+
+ * Software:
+
+ * Megatron-DeepSpeed ([Github link](https://github.com/bigscience-workshop/Megatron-DeepSpeed))
+
+ * DeepSpeed ([Github link](https://github.com/microsoft/DeepSpeed))
+
+ * PyTorch (pytorch-1.11 w/ CUDA-11.5; see [Github link](https://github.com/pytorch/pytorch))
+
+ * apex ([Github link](https://github.com/NVIDIA/apex))
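The hardware figures above are consistent with one another:

- 16 nodes × 4 GPUs gives the 64 V100s.
- 1 task × 40 CPUs gives 40 CPUs per node.
- 4 GPUs of either 16GB or 32GB give the 64GB or 128GB of GPU memory per node.

A trivial sketch making the arithmetic explicit; the constant and function names are illustrative only.

```python
# Illustrative restatement of the hardware figures listed in the card.
NODES = 16
GPUS_PER_NODE = 4
TASKS_PER_NODE = 1
CPUS_PER_TASK = 40
V100_MEMORY_OPTIONS_GB = (16, 32)  # "V100 16/32GB"


def total_gpus(nodes: int, gpus_per_node: int) -> int:
    return nodes * gpus_per_node


def cpus_per_node(tasks_per_node: int, cpus_per_task: int) -> int:
    return tasks_per_node * cpus_per_task


def gpu_memory_per_node_gb(gpus_per_node: int, gb_per_gpu: int) -> int:
    return gpus_per_node * gb_per_gpu


if __name__ == "__main__":
    print("total GPUs:", total_gpus(NODES, GPUS_PER_NODE))
    print("CPUs per node:", cpus_per_node(TASKS_PER_NODE, CPUS_PER_TASK))
    for gb in V100_MEMORY_OPTIONS_GB:
        print(f"{gb}GB V100s -> {gpu_memory_per_node_gb(GPUS_PER_NODE, gb)}GB GPU memory per node")
```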
+
+ ### **Training**
+
+ - Checkpoint size:
+
+ - Fp16 weights: 3.4GB (# params * 2)
+
+ - Full checkpoint with optimizer states: --
+
+ - Training throughput: --
+
+ - Number of epochs: 1
+
+ - Dates:
+
+ - Start: 11th March, 2022 11:42am PST
+
+ - End: 20 May, 2022
+
+ - Server training location: Île-de-France, France
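The card sizes the fp16 weights as `# params * 2`, that is, two bytes per half-precision value. Applying that formula to the 1,722,408,960 parameters listed under Model Architecture gives about 3.4 GB (decimal), or about 3.2 GiB. The sketch below shows the calculation. It counts raw weights only and assumes no optimizer states or serialization overhead.

```python
N_PARAMS = 1_722_408_960  # from the Model Architecture list
BYTES_PER_FP16 = 2


def fp16_weight_bytes(n_params: int) -> int:
    # Raw weight storage only; ignores optimizer states and file-format overhead.
    return n_params * BYTES_PER_FP16


if __name__ == "__main__":
    size = fp16_weight_bytes(N_PARAMS)
    print(f"{size:,} bytes")
    print(f"{size / 1e9:.2f} GB ({size / 2**30:.2f} GiB)")
```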
+
+ ### **Tokenization**
+
+ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a learned subword tokenizer trained using:
+
+ - A byte-level Byte Pair Encoding (BPE) algorithm
+
+ - A simple pre-tokenization rule, no normalization
+
+ - A vocabulary size of 250,680
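The byte-level BPE idea can be illustrated with a toy learner. This is a minimal sketch, not the BLOOM tokenizer's training code. It works on a single string with no pre-tokenization, and its tie-breaking rule and stopping condition (stop once no adjacent pair repeats) are arbitrary choices for the example. Because it starts from the 256 byte values, any UTF-8 text can be encoded without an unknown token, and each merge adds one new token id.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def most_frequent_pair(ids: List[int]) -> Optional[Pair]:
    """Most frequent adjacent pair, or None if no pair occurs at least twice."""
    counts = Counter(zip(ids, ids[1:]))
    if not counts:
        return None
    # Highest count wins; ties go to the numerically smallest pair.
    pair = max(counts, key=lambda p: (counts[p], -p[0], -p[1]))
    return pair if counts[pair] >= 2 else None


def merge(ids: List[int], pair: Pair, new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair`, left to right, with `new_id`."""
    out: List[int] = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_byte_bpe(text: str, num_merges: int) -> Tuple[List[int], Dict[Pair, int]]:
    """Learn up to `num_merges` merges; ids 0-255 are the raw UTF-8 bytes."""
    ids = list(text.encode("utf-8"))
    merges: Dict[Pair, int] = {}
    for step in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + step
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


def decode(ids: List[int], merges: Dict[Pair, int]) -> str:
    """Map token ids back to text, expanding merges in the order they were learned."""
    vocab: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():  # dicts preserve insertion order
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    sample = "low lower lowest"
    ids, merges = train_byte_bpe(sample, num_merges=3)
    print("tokens:", ids)
    print("merges:", merges)
    print("round trip ok:", decode(ids, merges) == sample)
```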
+
+ It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
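The card does not define its alpha-weighting. One common formulation, popularised by cross-lingual models such as XLM, is sketched here as an assumption only. It samples language i with probability q_i = p_i^α / Σ_j p_j^α, where p_i is that language's share of the data. Setting α < 1 flattens the distribution in favour of low-resource languages. The α values and corpus sizes below are hypothetical.

```python
from typing import Dict


def alpha_weighted_probs(sizes: Dict[str, float], alpha: float) -> Dict[str, float]:
    """Return q_i = p_i**alpha / sum_j p_j**alpha, with p_i = size_i / total size."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha is expected in (0, 1]")
    total = sum(sizes.values())
    weights = {lang: (n / total) ** alpha for lang, n in sizes.items()}
    z = sum(weights.values())
    return {lang: w / z for lang, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical corpus sizes (e.g. in GB); not BLOOM's actual numbers.
    sizes = {"high_resource": 900.0, "mid_resource": 90.0, "low_resource": 10.0}
    for alpha in (1.0, 0.5, 0.3):
        probs = alpha_weighted_probs(sizes, alpha)
        print(alpha, {k: round(v, 3) for k, v in probs.items()})
```

With α = 1 the sampling matches the raw data shares. Smaller α moves probability mass toward the smaller corpora.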
+
+
+
+ ## Citation
+
+ **Cite as:** BigScience, _BigScience Language Open-science Open-access Multilingual (BLOOM) Language Model_. International, May 2021-May 2022
+
+ ## Glossary and Calculations
+
+ *This section defines common terms and how metrics are calculated.*

  - <a name="loss">**Loss:**</a> A calculation of the difference between what the model has learned and what the data shows ("groundtruth"). The lower the loss, the better. The training process aims to minimize the loss.

  - <a name="deception">**Deception:**</a> Doing something to intentionally mislead individuals to believe something that is false, such as by creating deadbots or chatbots on social media posing as real people, or generating text documents without making consumers aware that the text is machine generated.

  ## More Information

  ### Dataset Creation

  ### Initial Results

  Initial prompting experiments using interim checkpoints: https://huggingface.co/spaces/bigscience/bloom-book

  ## Model Card Authors
  *Ordered roughly chronologically and by amount of time spent.*

  Margaret Mitchell, Giada Pistilli, Yacine Jernite, Ezinwanne Ozoani, Marissa Gerchick, Nazneen Rajani, Sasha Luccioni, Irene Solaiman, Maraim Masoud, Somaieh Nikpoor, Carlos Muñoz Ferrandis, Stas Bekman, Christopher Akiki, Danish Contractor, David Lansky, Angelina McMillan-Major, Tristan Thrush, Suzana Ilić, Gérard Dupont, Shayne Longpre, Manan Dey, Stella Biderman, Douwe Kiela, Emi Baylor, Teven Le Scao, Aaron Gokaslan, Julien Launay, Niklas Muennighoff
+
+ ## Model Card Contact
+
+ **Send Questions to:** bigscience-contact@googlegroups.com