Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling • Paper • 2502.06703 • Published Feb 10
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness • Paper • 2309.16973 • Published Sep 29, 2023