Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned variants, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28 • 449
No More Adam: Learning Rate Scaling at Initialization is All You Need Paper • 2412.11768 • Published 14 days ago • 41
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 12 days ago • 112
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 11 days ago • 104
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published 17 days ago • 78
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain Paper • 2412.13018 • Published 13 days ago • 40
Falcon3 Collection The Falcon3 family of open foundation models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters. • 40 items • Updated 11 days ago • 74