The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset • Paper • 2303.03915 • Published Mar 7, 2023
BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling • Paper • 2207.06814 • Published Jul 14, 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model • Paper • 2211.05100 • Published Nov 9, 2022