Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models Paper • 2502.15499 • Published 9 days ago • 12
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published 6 days ago • 14
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO Paper • 2502.14669 • Published 10 days ago • 11
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? Paper • 2502.12215 • Published 13 days ago • 16
You Do Not Fully Utilize Transformer's Representation Capacity Paper • 2502.09245 • Published 17 days ago • 33
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity Paper • 2502.13063 • Published 12 days ago • 63
REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation Paper • 2502.13270 • Published 12 days ago • 6
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization Paper • 2502.13922 • Published 11 days ago • 25
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering Paper • 2502.13962 • Published 11 days ago • 27
MoM: Linear Sequence Modeling with Mixture-of-Memories Paper • 2502.13685 • Published 11 days ago • 31
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning Paper • 2502.12853 • Published 12 days ago • 27
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning Paper • 2502.14768 • Published 10 days ago • 43