Model Bench Leaderboard Evaluating Models

LMArena Leaderboard 🏆 Display LMArena Leaderboard • Running • 4.62k
MTEB Leaderboard 🥇 Embedding Leaderboard • Running on CPU Upgrade • 6.43k
Reward Bench Leaderboard 📐 Display and analyze reward model evaluation results • Running • 394
Reasoning Datasets
Datasets with reasoning traces across various domains released by the community.

bespokelabs/Bespoke-Stratos-35k • Updated Jan 22 • 35k • 21 • 5
open-thoughts/OpenThoughts-114k • Updated 21 days ago • 228k • 35.1k • 755
open-r1/OpenThoughts-114k-math • Updated Jan 30 • 89.1k • 412 • 84
PrimeIntellect/NuminaMath-QwQ-CoT-5M • Updated Jan 22 • 5.14M • 760 • 51