The Ultra-Scale Playbook 🌌 Space • Running • The ultimate guide to training LLMs on large GPU clusters • 2.5k likes
Kimi-VL-A3B Collection • Moonshot's efficient MoE VLMs, exceptional at agentic, long-context, and thinking tasks • 6 items • Updated 11 days ago • 61 likes
nvidia/Llama-Nemotron-Post-Training-Dataset Viewer • Updated 7 days ago • 3.91M rows • 7.19k downloads • 423 likes
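To inspect this dataset locally, here is a minimal sketch using the datasets library; the "train" split name is an assumption, so check the dataset card on the Hub for the actual configs and splits:

```python
# pip install datasets
from datasets import load_dataset

# Stream rows instead of downloading all ~3.91M up front.
# If the repo defines multiple configs, load_dataset will raise and
# list them; pass the config name as the second argument.
ds = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset",
    split="train",        # assumption: a "train" split exists
    streaming=True,
)

print(next(iter(ds)))     # peek at the first record's schema
```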
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 143 upvotes
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 103 upvotes
Tulu 3 Datasets Collection • All datasets released with Tulu 3, state-of-the-art open post-training recipes • 33 items • Updated Mar 13 • 78 likes
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 38 upvotes
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • Updated 10 days ago • 28.2k downloads • 2.03k likes
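For quick orientation on this model card, a minimal generation sketch with transformers (the chat template comes from the tokenizer; the prompt is just an illustration, and a 70B model needs multiple high-memory GPUs, which device_map="auto" shards across):

```python
# pip install transformers accelerate torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; 70B weights still need ~140 GB
    device_map="auto",           # shard layers across all visible GPUs
)

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```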
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 55 upvotes