Qwen 1.5 series pls

Opened by Pevernow

@bartowski Thanks @turboderp for the great work. Now that exllamav2 supports the Qwen1.5 models, please help quantize them, thank you. In addition, since Qwen1.5 was trained on a large amount of Chinese as well as English text, using a mixed Chinese and English calibration dataset when quantizing may reduce the loss of the model's multilingual capabilities during the quantization process.

It is worth noting that Qwen overemphasizes model safety during the alignment process, which may cause the model to incorrectly refuse to answer some normal questions.

Here is a recommended Chinese dataset that can alleviate the problems caused by over-alignment; I suggest using it for calibration during quantization: https://huggingface.co/datasets/tastypear/unalignment-toxic-dpo-v0.2-zh_cn/tree/main
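
In case it is useful, here is a rough sketch of how such a mixed Chinese/English calibration file could be put together. The column names for the linked dataset (`prompt`, `chosen`) are a guess on my part, and I am assuming exllamav2's convert.py accepts a parquet file of raw text rows via its -c option; both are worth double-checking before starting a long quantization run.

```python
# Sketch: build a mixed Chinese/English calibration parquet for exllamav2.
# Assumptions to verify: the zh_cn DPO dataset exposes "prompt"/"chosen" text
# columns, and convert.py's -c option accepts a parquet file with a "text" column.
from datasets import load_dataset
import pandas as pd

rows = []

# Chinese portion: the dataset linked above (column names are assumed).
zh = load_dataset("tastypear/unalignment-toxic-dpo-v0.2-zh_cn", split="train")
for ex in zh.select(range(min(500, len(zh)))):
    text = " ".join(str(ex.get(k, "")) for k in ("prompt", "chosen"))
    if text.strip():
        rows.append(text)

# English portion: wikitext as a generic English source.
en = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
for ex in en.select(range(2000)):
    if ex["text"].strip():
        rows.append(ex["text"])

pd.DataFrame({"text": rows}).to_parquet("mixed_zh_en_calibration.parquet")
# Then, roughly: python convert.py -i <model_dir> -o <work_dir> -cf <out_dir> \
#                  -b 5.0 -c mixed_zh_en_calibration.parquet
```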

I've successfully quantized Qwen1.5-72B-chat, so I don't think there should be any issues converting more Qwen models. I will do some more versions of the 72B chat model and upload at some point, I believe.

As for calibration data, the default calibration dataset contains a fair amount of Chinese (along with many other languages), so it should be well suited for multilingual models like Qwen.

@bartowski @turboderp Thanks a lot.

Unfortunately, the Qwen1.5 models are not friendly to consumer-grade PCs, due to the lack of GQA support and the resulting large memory usage.
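
For a rough sense of how bad it gets, here is a back-of-envelope sketch of the FP16 KV-cache size when every attention head keeps its own K/V (no GQA). The Qwen1.5-72B config values below (80 layers, hidden size 8192) are from memory, so check them against the model's config.json.

```python
# Back-of-envelope KV-cache size for a model WITHOUT GQA.
# Config values are assumptions for Qwen1.5-72B; verify against config.json.
layers = 80          # assumed num_hidden_layers
hidden = 8192        # assumed hidden_size (num_heads * head_dim)
bytes_per_elem = 2   # FP16 cache

kv_per_token = 2 * layers * hidden * bytes_per_elem   # factor 2 for K and V
print(f"KV cache per token: {kv_per_token / 2**20:.2f} MiB")   # ~2.5 MiB

for ctx in (4096, 8192, 32768):
    print(f"context {ctx:>6}: {ctx * kv_per_token / 2**30:.1f} GiB")

# With GQA (say 8 KV heads instead of 64) the same cache would be roughly 8x smaller.
```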

Also, would you be interested in making a quantized version of exl2 for https://huggingface.co/CausalLM/14B-DPO-alpha?

I noticed that this model is from December of last year, but I couldn't find a corresponding exl2 quantization. The only related one is of the non-DPO version, which is of poorer quality.
Could you please quantize it? Thank you.

Yeah, the lack of GQA is rough.

I'll make that 14B DPO alpha once my current one is done

@Pevernow seems that model is missing the tokenizer config :(

Isn't tokenizer_config.json the file you need?

No, I think it needs tokenizer.model or tokenizer.json.

According to the official description, this model is fully compatible with Llama 2 and uses the Llama 2 architecture. Maybe the tokenizer is GPT2Tokenizer and was omitted because it is so common?

https://huggingface.co/cgus/CausalLM-14B-exl2/tree/main

In addition, at this link I found an exl2 quantization of the non-DPO version that includes a tokenizer.model, even though the original model page does not provide that file.

Another possible guess: maybe if you run it with transformers, the missing tokenizer file will be generated or downloaded automatically?
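
One way to test that guess (a sketch, not something I have verified for this particular repo): load the tokenizer with transformers and re-save it. If a fast tokenizer can be built from whatever the repo does provide, save_pretrained should write out a tokenizer.json that could be dropped next to the exl2 files.

```python
# Sketch: try to materialize a tokenizer.json for CausalLM/14B-DPO-alpha.
# Whether this works depends on what the repo's tokenizer_config.json points at,
# so treat it as an experiment rather than a guaranteed fix.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CausalLM/14B-DPO-alpha", trust_remote_code=True)
print(type(tok).__name__)
tok.save_pretrained("causallm-14b-dpo-tokenizer")  # writes tokenizer.json for fast tokenizers
```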

@bartowski Any updates?
