Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 4 days ago • 90
Hymba Collection A series of Hybrid Small Language Models. • 2 items • Updated about 1 month ago • 24
Tulu 3 Models Collection All models released with Tulu 3 -- state-of-the-art open post-training recipes. • 7 items • Updated 24 days ago • 29
On the Power of Decision Trees in Auto-Regressive Language Modeling Paper • 2409.19150 • Published Sep 27 • 4
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published Oct 21 • 58
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Paper • 2409.06820 • Published Sep 10 • 63
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 123
LLM Pruning and Distillation in Practice: The Minitron Approach Paper • 2408.11796 • Published Aug 21 • 57
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 41
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design Paper • 2408.12503 • Published Aug 22 • 23
ShieldGemma: Generative AI Content Moderation Based on Gemma Paper • 2407.21772 • Published Jul 31 • 14
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23 • 68
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle Paper • 2407.13833 • Published Jul 18 • 11