Mistral Small 3 is SUPER fast and posts the highest score among 20B+ models, but it's still 11 points below Qwen 2.5 Coder 32B.
I believe specialty models are the future. The more you know what you want from a model, the better bang you get for your buck. If Mistral scopes this small model to coding only, I'm confident they can beat Qwen.
One day my leaderboard will be dominated by smol models excellent at one thing, not monolithic ones costing $$$. And I'm looking forward to that.
Yes, DeepSeek R1's release is impressive. But the real story is what happened in just the 7 days after:
- Original release: 8 models, 540K downloads. Just the beginning...
- The community turned those open-weight models into 550+ NEW models on Hugging Face. Total downloads? 2.5M, nearly 5X the originals.
The reason? DeepSeek's models are open-weight, so anyone can build on top of them. Interestingly, the community focused on quantized versions for better efficiency and accessibility: models that use less memory, run faster, and are more energy-efficient.
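A back-of-envelope sketch of why those quantized versions matter for accessibility. The numbers below are illustrative assumptions (weights-only, ignoring KV cache and overhead), not measured footprints of any specific release:

```python
# Rough weights-only memory estimate for a 32B-parameter model at
# different quantization levels. Illustrative math, not a benchmark.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GB needed just to hold the weights."""
    return n_params * bits_per_weight / 8 / 1e9

n = 32e9  # e.g. a DeepSeek-R1-Distill-Qwen-32B-class model
for label, bits in [("FP16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{label}: ~{weight_memory_gb(n, bits):.0f} GB")
# FP16 needs ~64 GB just for weights; a 4-bit quant fits in ~16 GB,
# which is why quantized GGUF builds run on consumer hardware.
```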
When you empower builders, innovation explodes. For everyone.
The most popular community model? @bartowski's DeepSeek-R1-Distill-Qwen-32B-GGUF, with 1M downloads alone.