🧠 Reasoning datasets • Collection • Datasets with reasoning traces for math and code released by the community • 21 items
Scaling Pre-training to One Hundred Billion Data for Vision Language Models • Paper • 2502.07617 • Published Feb 11
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion • Paper • 2502.04235 • Published Feb 6