Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding Paper • 2605.29707 • Published 12 days ago • 142
LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents Paper • 2605.29559 • Published 12 days ago • 16
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published May 4 • 348
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment Paper • 2605.19577 • Published 21 days ago • 58
Draft Less, Retrieve More: Hybrid Tree Construction for Speculative Decoding Paper • 2605.20104 • Published 21 days ago • 7
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling Paper • 2605.13301 • Published 27 days ago • 159
RigidFormer: Learning Rigid Dynamics using Transformers Paper • 2605.09196 • Published about 1 month ago • 14
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? Paper • 2603.24472 • Published Mar 25 • 57
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published Mar 19 • 69
CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs Paper • 2602.03048 • Published Feb 3 • 32
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 45