Introducing AutoRound INT4 Algorithm
#12
by
wenhuach
- opened
Hello, first and foremost, I would like to express my gratitude for your exceptional work and for sharing your model with the community.
We have recently applied one of the algorithms in Intel Neural Compressor, namely AutoRound, to your model and achieved remarkable results. Below are the zero-shot accuracies, all measured with real quantized models in the same environment (batch size 16).
| Qwen1.5-7b-Chat | CEVAL | CMMLU | MMLU | GSM8K | Average |
|---|---|---|---|---|---|
| BF16 | 0.6887 | 0.6959 | 0.6020 | 0.5057 | 0.6231 |
| GPTQ-INT4, sym | 0.6679 | 0.6831 | 0.5902 | 0.4867 | 0.6070 |
| AutoRound-INT4, sym | 0.6761 | 0.6870 | 0.5974 | 0.5216 | 0.6205 |
Unfortunately, we are unable to upload the quantized model due to licensing constraints. We would therefore appreciate it if you could generate it yourself by following the recipe links, and we are happy to assist. Our model is calibrated with 512 samples from pile-10k, an English dataset.
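As background on the "INT4, sym" scheme in the table above, here is a minimal, self-contained sketch of plain symmetric INT4 round-to-nearest quantization. This is not the AutoRound recipe itself (AutoRound additionally tunes the rounding during calibration, which is where the accuracy gain over plain rounding comes from), and the helper name below is ours, for illustration only:

```python
# Sketch of symmetric INT4 round-to-nearest (RTN) quantization.
# Pure Python for illustration; real kernels operate on whole tensors
# and typically use per-group scales rather than one scale per list.

def quantize_sym_int4(weights):
    """Quantize a list of floats to symmetric INT4 ([-8, 7]) and dequantize back."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7  # symmetric scheme: the zero-point is fixed at 0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    dequant = [qi * scale for qi in q]  # values the INT4 model actually computes with
    return q, dequant

q, dq = quantize_sym_int4([0.12, -0.5, 0.33, 0.7])
```

AutoRound keeps this same INT4 grid but learns whether each weight should round up or down, instead of always taking the nearest point.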