Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents Paper • 2504.00906 • Published 2 days ago • 18
Expanding RL with Verifiable Rewards Across Diverse Domains Paper • 2503.23829 • Published 3 days ago • 16
From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub Article • Feb 12 • 55
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 17 days ago • 27
Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs Paper • 2503.12303 • Published 19 days ago • 7
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks Paper • 2503.15478 • Published 15 days ago • 9
STEVE: A Step Verification Pipeline for Computer-use Agent Training Paper • 2503.12532 • Published 18 days ago • 14
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published 17 days ago • 48
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't Paper • 2503.16219 • Published 14 days ago • 46
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published 14 days ago • 65
Modifying Large Language Model Post-Training for Diverse Creative Writing Paper • 2503.17126 • Published 13 days ago • 33
Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models Paper • 2503.17811 • Published 12 days ago • 13