Croissant: A Metadata Format for ML-Ready Datasets Paper β’ 2403.19546 β’ Published Mar 28, 2024 β’ 1
Distributed Deep Learning in Open Collaborations Paper β’ 2106.10207 β’ Published Jun 18, 2021 β’ 2
Evaluate & Evaluation on the Hub: Better Best Practices for Data and Model Measurements Paper β’ 2210.01970 β’ Published Sep 30, 2022 β’ 11
Datasets: A Community Library for Natural Language Processing Paper β’ 2109.02846 β’ Published Sep 7, 2021 β’ 11
The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset Paper β’ 2303.03915 β’ Published Mar 7, 2023 β’ 6
AfroDigits: A Community-Driven Spoken Digit Dataset for African Languages Paper β’ 2303.12582 β’ Published Mar 22, 2023 β’ 20
HuggingFace's Transformers: State-of-the-art Natural Language Processing Paper β’ 1910.03771 β’ Published Oct 9, 2019 β’ 16