Pretrain Data

HuggingFaceTB/smollm-corpus • Updated Sep 6, 2024 • 237M • 14.3k • 323
HuggingFaceFW/fineweb-edu-classifier • Text Classification • Updated Nov 17, 2024 • 273k • 173
HuggingFaceFW/fineweb • Updated Jan 31 • 25B • 193k • 2.11k
togethercomputer/RedPajama-Data-V2 • Updated Nov 21, 2024 • 4.21k • 359