BLOOM: A 176B-Parameter Open-Access Multilingual Language Model Paper • 2211.05100 • Published Nov 9, 2022
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper • 2303.03915 • Published Mar 7, 2023
BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling Paper • 2207.06814 • Published Jul 14, 2022