singhsidhukuldeep posted an update 1 day ago
When @MistralAI drops a blog post labelled "Large Enough," you know things are about to get serious! 🚀💡

- Mistral-Large-Instruct-2407 (just call it Mistral Large 2) is a 123B-parameter Instruct model with a 128k context window 🌐📚

- Multilingual in 11 languages: English 🇬🇧, French 🇫🇷, German 🇩🇪, Spanish 🇪🇸, Italian 🇮🇹, Chinese 🇨🇳, Japanese 🇯🇵, Korean 🇰🇷, Portuguese 🇵🇹, Dutch 🇳🇱, and Polish 🇵🇱. 🗣️🗺️

- Also heavily focused on coding, trained on 80+ programming languages such as Python, Java, C, C++, JavaScript, and Bash 💻🔧

- Supports native function calling and structured output (see the sketch after this list). 🛠️📊

- Released under the Mistral Research License (non-commercial, research use only 😔)

- Open weights only 🔓, no data or code released 🔒📝
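
For a feel of the function-calling support, here's a minimal sketch assuming you serve the model behind an OpenAI-compatible endpoint (e.g., via vLLM); the base_url and the get_weather tool are illustrative assumptions, not part of the release:

```python
# Hypothetical sketch: function calling against Mistral-Large-Instruct-2407
# served behind an OpenAI-compatible API (e.g., vLLM). The endpoint URL and
# the get_weather tool are made-up examples.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not from the release
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mistralai/Mistral-Large-Instruct-2407",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decides to call the tool, the structured call lands here:
print(response.choices[0].message.tool_calls)
```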

Definitely firing shots at @Meta Llama3.1: 🎯🔥
MMLU - 84.0% (ML2) vs 79.3% (L3.1-70B) vs 85.2% (L3.1-405B)
GSM8K - 93% (ML2) vs 95.5% (L3.1-70B-Ins) vs 96.8% (L3.1-405B-Ins)

Also, it's kinda chunky! 📦💪
fp16/bf16 - ~250GB VRAM
fp8/int8 - ~125GB VRAM
int4 - ~60GB VRAM
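
Those numbers are roughly parameter count × bytes per weight, plus some overhead; a quick back-of-the-envelope check:

```python
# Back-of-the-envelope VRAM needed just for the weights of a ~123B model
# (ignores KV cache, activations, and framework overhead).
PARAMS = 123e9

for precision, bytes_per_param in [("fp16/bf16", 2), ("fp8/int8", 1), ("int4", 0.5)]:
    print(f"{precision}: ~{PARAMS * bytes_per_param / 1e9:.0f} GB")

# fp16/bf16: ~246 GB, fp8/int8: ~123 GB, int4: ~62 GB -- in line with the figures above
```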

I tried quantising it to AWQ and GPTQ, but couldn't fit the job in 30GB of VRAM. ❌🖥️

Also calling out AWQ and GPTQ for not supporting multi-GPU quantisation! 🖥️⚡
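
For reference, this is roughly what the attempt looks like with AutoAWQ (a sketch using the library's usual recipe; the quant_config values are the common defaults, and the output path is arbitrary — calibrating a 123B model needs far more than 30GB of VRAM):

```python
# Rough sketch of AWQ quantisation with AutoAWQ.
# Assumes the standard AutoAWQ workflow; the config values are the usual defaults.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-Large-Instruct-2407"
quant_path = "mistral-large-2407-awq-int4"  # arbitrary output directory

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)  # calibration happens here
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```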

Godsend @casperhansen has posted an AWQ-quantised INT4 model (68.68 GB) with a perplexity of 2.889: casperhansen/mistral-large-instruct-2407-awq 🔥👏
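
If you want to try that checkpoint, here's a minimal serving sketch with vLLM (tensor_parallel_size=4 is just an illustrative choice; size it to your GPUs, since the weights alone are ~69 GB):

```python
# Minimal sketch: running the community AWQ checkpoint with vLLM.
# tensor_parallel_size=4 is an assumption for this example, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/mistral-large-instruct-2407-awq",
    quantization="awq",
    tensor_parallel_size=4,
)

outputs = llm.generate(["Explain AWQ quantisation in one sentence."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```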

Looks like open AI is going to beat OpenAI! 🏆🤖

Blog post: https://mistral.ai/news/mistral-large-2407/

Models: mistralai/Mistral-Large-Instruct-2407