Training dataset shuffling/mixing

#180 · opened by wish

To make the question a bit simpler, assume BLOOM was trained only on Wikipedia and GitHub: how were the two datasets combined for training? Was it a simple concatenation, so that during training the model would first see Wikipedia documents and only later GitHub, or were the datasets mixed in some way? I could not find this in the paper; sorry if it was already answered or if I overlooked the information there.
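
For illustration, here is a minimal sketch contrasting the two strategies the question describes, using the Hugging Face `datasets` library. This is not BLOOM's actual training pipeline; the toy documents and the 0.7/0.3 sampling weights are made up for the example.

```python
from datasets import Dataset, concatenate_datasets, interleave_datasets

# Two toy source datasets standing in for Wikipedia and GitHub.
wiki = Dataset.from_dict({"text": [f"wiki doc {i}" for i in range(5)]})
code = Dataset.from_dict({"text": [f"github doc {i}" for i in range(5)]})

# Strategy 1: plain concatenation -- all Wikipedia documents come first,
# then all GitHub documents (unless the training loop shuffles afterwards).
concatenated = concatenate_datasets([wiki, code])

# Strategy 2: probabilistic interleaving -- each example is drawn from one
# source according to sampling weights, so the corpora mix throughout.
mixed = interleave_datasets([wiki, code], probabilities=[0.7, 0.3], seed=42)

print([row["text"] for row in concatenated])
print([row["text"] for row in mixed])
```

In practice, large-scale pretraining pipelines typically use some form of the second strategy (or shuffle after concatenation), since feeding the corpora strictly one after another would bias late-training updates toward the last corpus seen.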
