
Kuldeep Singh Sidhu (singhsidhukuldeep)


Posts (40)

Post
Hello, HuggingFace community,

To all the amazing people quantising LLMs to AWQ and GPTQ:

Could you please mention the perplexity you achieved, or any other metric that quantifies the quality of your quantisation?

The GGUF community follows this practice really well!

And if it is not too much to ask, sharing the script used for quantisation would be amazing!

Thanks for the quants for the GPU-poor!
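
For anyone wondering what to share, here is a minimal sketch of the kind of thing I have in mind, assuming the AutoAWQ library and a rough wikitext-2 perplexity check; the model name, quant settings, and window size below are illustrative placeholders, not anyone's actual recipe:

```python
# Minimal sketch: AWQ quantisation + a rough wikitext-2 perplexity report.
# Assumes `pip install autoawq transformers datasets`; model name, quant
# settings, and window size are placeholders.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from awq import AutoAWQForCausalLM

model_path = "mistralai/Mistral-7B-Instruct-v0.2"   # placeholder model
quant_path = "mistral-7b-instruct-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# 1) Quantise with AutoAWQ and save the checkpoint.
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# 2) Rough perplexity on wikitext-2 with non-overlapping 2k-token windows.
eval_model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids
window, nll, n_tokens = 2048, 0.0, 0
for start in range(0, ids.size(1) - 1, window):
    chunk = ids[:, start : start + window].to(eval_model.device)
    with torch.no_grad():
        loss = eval_model(chunk, labels=chunk).loss   # mean NLL over this window
    nll += loss.item() * chunk.size(1)
    n_tokens += chunk.size(1)
print(f"perplexity: {torch.exp(torch.tensor(nll / n_tokens)).item():.3f}")
```

Reporting the command or script alongside the number makes the result reproducible, which is exactly what the GGUF folks do so well.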
Post
When @MistralAI drops a blog post labelled "Large Enough," you know it's going to get serious!

- Mistral-Large-Instruct-2407 (just call it Mistral-Large 2) is a 123B-parameter instruct model with a 128k context window

- Multilingual in 11 languages: English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish

- Also highly focused on programming, trained on 80+ coding languages such as Python, Java, C, C++, JavaScript, and Bash

- Supports native function calling and structured output

- Released under the Mistral Research License (non-commercial, research use only)

- Open weights only; no training data or code released

Definitely firing shots at @Meta Llama 3.1:
MMLU - 84.0% (ML2) vs 79.3% (L3.1-70B) vs 85.2% (L3.1-405B)
GSM8K - 93% (ML2) vs 95.5% (L3.1-70B-Ins) vs 96.8% (L3.1-405B-Ins)

Also, it's kinda chunky!
fp16/bf16 - ~250 GB VRAM
fp8/int8 - ~125 GB VRAM
int4 - ~60 GB VRAM
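
(Rough weights-only math behind those numbers; this ignores KV cache and activation overhead, so real usage is higher.)

```python
# Back-of-the-envelope, weights-only memory for a 123B-parameter model.
params = 123e9
for name, bytes_per_weight in [("fp16/bf16", 2), ("fp8/int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{params * bytes_per_weight / 1e9:.0f} GB")
# fp16/bf16: ~246 GB | fp8/int8: ~123 GB | int4: ~62 GB
```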

I tried quantising it to AWQ and GPTQ, but couldn't with 30 GB of VRAM.

Also calling out AWQ and GPTQ for not supporting multi-GPU quantisation!

Godsend @casperhansen has posted an AWQ-quantised INT4 model (68.68 GB) with a perplexity of 2.889: casperhansen/mistral-large-instruct-2407-awq
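
If you want to try that checkpoint, something like this should work with vLLM; a sketch, where tensor_parallel_size=4 is my assumption for ~80 GB cards, since 68.68 GB of weights plus KV cache won't fit on a single one:

```python
# Sketch: serving the AWQ INT4 checkpoint with vLLM across multiple GPUs.
# tensor_parallel_size is an assumption; tune it to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="casperhansen/mistral-large-instruct-2407-awq",
    quantization="awq",
    tensor_parallel_size=4,
)
outputs = llm.generate(
    ["Explain AWQ quantisation in one short paragraph."],
    SamplingParams(max_tokens=128, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```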

Looks like open AI is going to beat OpenAI!

Blog post: https://mistral.ai/news/mistral-large-2407/

Models: mistralai/Mistral-Large-Instruct-2407
