Training dataset shuffling/mixing

#180 · opened by wish

To make the question a bit simpler, assume BLOOM was trained only on Wikipedia and GitHub: how were the two datasets combined for training? Was it a simple concatenation, so that during training the model would first see Wikipedia documents and only later GitHub, or were the datasets mixed in some way? I could not find this in the paper; sorry if it was already answered or if I overlooked the information there.
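
For illustration, here is a minimal sketch contrasting the two strategies the question describes, using the Hugging Face `datasets` library. This is not BLOOM's actual training pipeline; the toy documents and the 0.7/0.3 sampling weights are made up for the example.

```python
from datasets import Dataset, concatenate_datasets, interleave_datasets

# Two toy source datasets standing in for Wikipedia and GitHub.
wiki = Dataset.from_dict({"text": [f"wiki doc {i}" for i in range(5)]})
code = Dataset.from_dict({"text": [f"github doc {i}" for i in range(5)]})

# Strategy 1: plain concatenation -- all Wikipedia documents come first,
# then all GitHub documents (unless the training loop shuffles afterwards).
concatenated = concatenate_datasets([wiki, code])

# Strategy 2: probabilistic interleaving -- each example is drawn from one
# source according to sampling weights, so the corpora mix throughout.
mixed = interleave_datasets([wiki, code], probabilities=[0.7, 0.3], seed=42)

print([row["text"] for row in concatenated])
print([row["text"] for row in mixed])
```

In practice, large-scale pretraining pipelines typically use some form of the second strategy (or shuffle after concatenation), since feeding the corpora strictly one after another would bias late-training updates toward the last corpus seen.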
