view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 2 days ago • 225
Running 2.24k 2.24k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 8 items • Updated 18 days ago • 396
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 203
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14 • 276
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 263
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 57
Running 535 535 Scaling test-time compute 📈 Enhance math problem solving by scaling test-time compute
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 141
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 22
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding Paper • 2410.17434 • Published Oct 22, 2024 • 28
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published Dec 4, 2024 • 129