How to Get Your LLM to Generate Challenging Problems for Evaluation Paper • 2502.14678 • Published 5 days ago • 14
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published 5 days ago • 91
Diverse Inference and Verification for Advanced Reasoning Paper • 2502.09955 • Published 11 days ago • 16
DarwinLM: Evolutionary Structured Pruning of Large Language Models Paper • 2502.07780 • Published 14 days ago • 17 • 7
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 15 days ago • 124
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging Paper • 2502.09056 • Published 12 days ago • 30