EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published 4 days ago
Article Introducing EuroBERT: A High-Performance Multilingual Encoder Model • By EuroBERT and 3 others • 1 day ago
Article HuggingFace, IISc partner to supercharge model building on India's diverse languages • 13 days ago
rank1 Collection rank1 is the first test-time compute reasoning model in IR • 15 items • Updated 12 days ago
OWLS: Scaling Laws for Speech Recognition and Translation Collection 🦉 A suite of Whisper-style models from 250M to 18B parameters, trained on up to 360K hours of data at a 16 kHz sampling rate. • 7 items • Updated 1 day ago
Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models Paper • 2502.15964 • Published 18 days ago
"Actionable Help" in Crises: A Novel Dataset and Resource-Efficient Models for Identifying Request and Offer Social Media Posts Paper • 2502.16839 • Published 16 days ago
Slam Collection All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokeniser, LM, and datasets. • 6 items • Updated 15 days ago
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Paper • 2502.17387 • Published 15 days ago
KB-Whisper Collection Whisper models trained on over 50,000 hours of Swedish speech data. • 5 items • Updated 25 days ago
Open Image Preferences Collection Contains all artifacts for the Stable Diffusion 3.5L vs. Flux Dev image preference community sprint. • 14 items • Updated Dec 19, 2024