Running Agents 432 Reward Bench Leaderboard 📐 432 Explore and compare model scores on RewardBench benchmarks
RewardBench: Evaluating Reward Models for Language Modeling Paper • 2403.13787 • Published Mar 20, 2024 • 22