Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models • Paper • 2503.16419 • Published 13 days ago • 65
DAPO: An Open-Source LLM Reinforcement Learning System at Scale • Paper • 2503.14476 • Published 15 days ago • 112
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment • Paper • 2408.06266 • Published Aug 12, 2024 • 10