reach-vb
posted an update Jul 16
What an eventful day in Open Source LLMs today:

Mistral released Codestral Mamba 🐍
> Beats DeepSeek QwenCode, best model < 10B, competitive with Codestral 22B
> Mamba 2 architecture - supports up to 256K context
> Apache 2.0 licensed, perfect for local code assistant
> Transformers & llama.cpp integration upcoming!

Model checkpoint: https://huggingface.co/mistralai/mamba-codestral-7B-v0.1

Hugging Face dropped SmolLM 🤏
> Beats MobileLLM, Qwen 0.5B, Phi 1.5B and more!
> 135M, 360M, and 1.7B param model checkpoints
> Trained on 600B high-quality synthetic + FineWeb Edu tokens
> Architecture: Llama + GQA + 2048 ctx length
> Ripe for fine-tuning and on-device deployments.
> Works out of the box with Transformers!

Model checkpoints: HuggingFaceTB/smollm-6695016cad7167254ce15966
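
For anyone who wants to try the "works out of the box with Transformers" claim, here is a minimal sketch. The three repo IDs are assumptions inferred from the parameter sizes listed above (the post only links the collection), and `generate` is a hypothetical helper name, so check the collection page for the exact checkpoint names before relying on them.

```python
# Repo IDs are assumed from the sizes named in the post; verify them on the Hub.
SMOLLM_CHECKPOINTS = {
    "135M": "HuggingFaceTB/SmolLM-135M",
    "360M": "HuggingFaceTB/SmolLM-360M",
    "1.7B": "HuggingFaceTB/SmolLM-1.7B",
}


def generate(prompt: str, size: str = "135M", max_new_tokens: int = 40) -> str:
    """Load one SmolLM checkpoint and greedily continue `prompt`."""
    # Resolve the repo ID first so an unknown size fails fast with KeyError.
    repo_id = SMOLLM_CHECKPOINTS[size]

    # Imported lazily so the lookup table is usable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)

    # Tokenize to PyTorch tensors and decode greedily (no sampling).
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("def fibonacci(n):")` downloads the smallest checkpoint on first use and returns the prompt plus its continuation. The 135M model should be small enough to run on CPU.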

Mistral released Mathstral 7B ∑
> 56.6% on MATH and 63.47% on MMLU
> Same architecture as Mistral 7B
> Works out of the box with Transformers & llama.cpp
> Released under Apache 2.0 license

Model checkpoint: https://huggingface.co/mistralai/mathstral-7B-v0.1

Pretty dope day for open source ML. Can't wait to see what the community builds with it and to support them further! 🤗

What's your favourite from today's releases?

Thanks for the update!