AWQ Quantized version (for use with vllm etc)

#4
by diwank - opened

We quantized dolphin-2.9-llama3-70b using AutoAWQ (version 0.2.5) and uploaded it to HF, in case anyone finds it useful:

https://huggingface.co/julep-ai/dolphin-2.9-llama3-70b-awq
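
For anyone who wants to reproduce this, the quantization is roughly the standard AutoAWQ flow. This is only a minimal sketch: the source model path and the quant_config values below (4-bit, group size 128, GEMM kernel) are assumed defaults, not the confirmed settings used for the upload.

```python
# Minimal AutoAWQ (0.2.5) quantization sketch; paths and config values are assumptions.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "cognitivecomputations/dolphin-2.9-llama3-70b"  # assumed source repo
quant_path = "dolphin-2.9-llama3-70b-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ calibration/quantization, then save the quantized checkpoint
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

The resulting checkpoint can then be served with vLLM, for example (tensor_parallel_size here is just an illustrative value, pick one that fits your GPUs):

```python
from vllm import LLM

llm = LLM(model="julep-ai/dolphin-2.9-llama3-70b-awq",
          quantization="awq",
          tensor_parallel_size=4)
print(llm.generate("Hello")[0].outputs[0].text)
```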

This comment has been hidden
Cognitive Computations org
edited May 4

No, @Kearm, actually they've supported it for a long time. As a matter of fact, llama3 uses the same format as llama2 (absolutely nothing needed to be done to support it). We are not slow to support new models; it's just that often we don't see the need to support some model types (like qwen), and keeping up with model changes is super annoying (there is an example issue in AutoAWQ, where you can add support for gpt2, to see the process).

Let's not promote stigmas like this. I just don't quant the 70b as it takes some dedicated effort.

Well done @diwank, and thank you for doing this.

@Suparious

I entirely admit my mistake and acknowledge I was spreading misinformation. I had limited knowledge of the AutoAWQ project, and that was my fault for a spurious comment. I have hidden the post so as not to spread the misinformation further, but for transparency's sake: I had commented that they seemed slow to support new models, which is false. There is enough negativity in HF comments already, and I apologize to casper-hansen for my misinformed comment.
