DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 16 days ago • 112
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published 16 days ago • 130
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 22 days ago • 67
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published 27 days ago • 74
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 28 days ago • 88
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published about 1 month ago • 80
Predictive Data Selection: The Data That Predicts Is the Data That Teaches Paper • 2503.00808 • Published Mar 2 • 56
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 188
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 13 • 34
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 147
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 132
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 216
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 118
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 57