Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published 19 days ago • 44
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20 • 70
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 120
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs Paper • 2503.11751 • Published Mar 14 • 16
🧠Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 21 items • Updated 7 days ago • 128
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 53