Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published 7 days ago • 17
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published 10 days ago • 43
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild Paper • 2503.18892 • Published 13 days ago • 28
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 17 days ago • 46
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 57
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 190
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 151
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 99
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published Dec 23, 2024 • 48
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published Dec 23, 2024 • 44
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? Paper • 2408.13257 • Published Aug 23, 2024 • 26
Learning to Move Like Professional Counter-Strike Players Paper • 2408.13934 • Published Aug 25, 2024 • 23
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 60
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 100
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 71