Model Bench Leaderboard: Evaluating Models

Arena Leaderboard 🏆 (4.92k likes): View the LMArena leaderboard in full-screen.
MTEB Leaderboard 📊 (7.48k likes): Embedding Leaderboard.
Reward Bench Leaderboard 📐 (432 likes): Explore and compare model scores on RewardBench benchmarks.

Reasoning Datasets: Datasets with reasoning traces across various domains, released by the community.

bespokelabs/Bespoke-Stratos-35k (updated Jan 22, 2025 • 35k rows • 87 downloads • 5 likes)
open-thoughts/OpenThoughts-114k (updated Aug 31, 2025 • 228k rows • 92.2k downloads • 863 likes)
open-r1/OpenThoughts-114k-math (updated Jan 30, 2025 • 89.1k rows • 937 downloads • 94 likes)
PrimeIntellect/NuminaMath-QwQ-CoT-5M (updated Jan 22, 2025 • 5.14M rows • 542 downloads • 62 likes)