Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining (arXiv:2503.04715, published Mar 6, 2025)
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs -- No Silver Bullet for LC or RAG Routing (arXiv:2502.09977, published Feb 14, 2025)
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? (arXiv:2502.12115, published Feb 17, 2025)
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (arXiv:2502.11089, published Feb 16, 2025)