ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 6 days ago • 91
Granite 3.1 Language Models Collection A series of language models with 128K context length, trained by IBM and released under the Apache 2.0 license. • 8 items • Updated 7 days ago • 30
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 19 days ago • 548
Gemma Scope Release Collection A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B. • 10 items • Updated 12 days ago • 13
Llama 3.1 Evals Collection This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.1 models, including the configurations, • 6 items • Updated 19 days ago • 16
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 12 items • Updated 13 days ago • 59
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated 3 days ago • 204
MUSCLE: A Model Update Strategy for Compatible LLM Evolution Paper • 2407.09435 • Published Jul 12 • 20
Article BM25 for Python: Achieving high performance while simplifying dependencies with *BM25S*⚡ By xhluca • Jul 9 • 41
AgentInstruct: Toward Generative Teaching with Agentic Flows Paper • 2407.03502 • Published Jul 3 • 49
Step-DPO Collection Resources for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" • 11 items • Updated Jul 1 • 5
SpeechVerse: A Large-scale Generalizable Audio Language Model Paper • 2405.08295 • Published May 14 • 14
TaskMeAnything Collection A collection of TaskMeAnything resources [https://github.com/JieyuZ2/TaskMeAnything] • 12 items • Updated Aug 4 • 3
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models • Jun 24 • 180
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering Paper • 2406.10208 • Published Jun 14 • 21