Towards General-Purpose Model-Free Reinforcement Learning • Paper • 2501.16142 • Published 11 days ago • 24
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published 10 days ago • 100
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization • Paper • 2412.12098 • Published Dec 16, 2024 • 4
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning • Paper • 2412.09858 • Published Dec 13, 2024 • 1
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs • Paper • 2501.18585 • Published 8 days ago • 51
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles • Paper • 2502.01081 • Published 4 days ago • 9
Improving Transformer World Models for Data-Efficient RL • Paper • 2502.01591 • Published 4 days ago • 8
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search • Paper • 2502.02508 • Published 3 days ago • 16
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking • Paper • 2502.02339 • Published 3 days ago • 11
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods • Paper • 2502.01618 • Published 4 days ago • 5