CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives • Paper • 2504.10823 • Published 7 days ago • 10
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? • Paper • 2504.09702 • Published 8 days ago • 16