Introducing AutoRound INT4 Algorithm
#12
by
wenhuach
- opened
Hello, first and foremost, I would like to express my gratitude for your exceptional work and for sharing your model with the community.
We have recently applied one of the algorithms in Intel Neural Compressor, namely AutoRound, to your model and achieved remarkable results. Below are the zero-shot accuracies, all measured with real quantized models in the same environment (batch size 16).
| Qwen1.5-7b-Chat | CEVAL | CMMLU | MMLU | GSM8K | Average |
|---|---|---|---|---|---|
| BF16 | 0.6887 | 0.6959 | 0.6020 | 0.5057 | 0.6231 |
| GPTQ-INT4, sym | 0.6679 | 0.6831 | 0.5902 | 0.4867 | 0.6070 |
| AutoRound-INT4, sym | 0.6761 | 0.6870 | 0.5974 | 0.5216 | 0.6205 |
Unfortunately, we are unable to upload the quantized model due to licensing constraints. We would therefore appreciate it if you could generate it yourself by following the recipe links, and we are happy to assist. Our model is calibrated with 512 samples from pile-10k, an English dataset.
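As background on the "INT4, sym" scheme in the table above, here is a minimal, self-contained sketch of plain symmetric INT4 round-to-nearest quantization. This is not the AutoRound recipe itself (AutoRound additionally tunes the rounding during calibration, which is where the accuracy gain over plain rounding comes from), and the helper name below is ours, for illustration only:

```python
# Sketch of symmetric INT4 round-to-nearest (RTN) quantization.
# Pure Python for illustration; real kernels operate on whole tensors
# and typically use per-group scales rather than one scale per list.

def quantize_sym_int4(weights):
    """Quantize a list of floats to symmetric INT4 ([-8, 7]) and dequantize back."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7  # symmetric scheme: the zero-point is fixed at 0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    dequant = [qi * scale for qi in q]  # values the INT4 model actually computes with
    return q, dequant

q, dq = quantize_sym_int4([0.12, -0.5, 0.33, 0.7])
```

AutoRound keeps this same INT4 grid but learns whether each weight should round up or down, instead of always taking the nearest point.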