
Official AQLM quantization of mistralai/Mixtral-8x7B-v0.1.

This quantization uses the 1x16 scheme: 1 codebook with 16-bit codes.
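As a rough illustration of how a scheme name relates to the "2Bit" in the repository name, the sketch below computes the code bits spent per weight. The group size of 8 weights per code is an assumption (it is not stated in this README), and `bits_per_weight` is a hypothetical helper, not part of the AQLM library. Storage for codebooks and scales is ignored.

```python
def bits_per_weight(num_codebooks: int, code_bits: int, group_size: int) -> float:
    """Code bits spent per weight, ignoring codebook and scale storage."""
    return num_codebooks * code_bits / group_size


# 1x16 scheme: 1 codebook, 16-bit codes.
# group_size=8 is an assumption, chosen to match the "2Bit" in the repo name.
print(bits_per_weight(num_codebooks=1, code_bits=16, group_size=8))  # 2.0
```

Other schemes in the table may use a different group size, so the same group size should not be assumed for them.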

Selected evaluation results for this and other models:

| Model | AQLM scheme | WikiText 2 PPL | Model size, GB | Hub link |
|---|---|---|---|---|
| Llama-2-7b† | 1x16 | 5.92 | 2.4 | Link |
| Llama-2-7b† | 2x8 | 6.69 | 2.2 | Link |
| Llama-2-7b† | 8x8 | 6.61 | 2.2 | Link |
| Llama-2-13b | 1x16 | 5.41 | 4.1 | Link |
| Llama-2-70b | 1x16 | 3.96 | 18.8 | Link |
| Llama-2-70b | 2x8 | 4.83 | 18.2 | Link |
| Mixtral-8x7b (THIS) | 1x16 | 3.35 | 12.6 | Link |
| Mixtral-8x7b-Instruct | 1x16 | - | 12.6 | Link |

To learn more about inference, and for instructions on quantizing models yourself, please refer to the official GitHub repo.
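For orientation, here is a minimal loading sketch using the standard `transformers` API. It assumes that the `aqlm`, `transformers`, `accelerate`, and `torch` packages are installed, that a suitable GPU is available, and that this repository ships tokenizer files (if it does not, the tokenizer can be loaded from `mistralai/Mixtral-8x7B-v0.1` instead). The helper names `load_quantized_model` and `generate_sample` are hypothetical; the official GitHub repo remains the authoritative source for inference instructions.

```python
MODEL_ID = "ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf"


def load_quantized_model(model_id: str = MODEL_ID):
    """Load the quantized checkpoint and its tokenizer (downloads ~12.6 GB)."""
    # Imported lazily so that importing this module stays cheap.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the dtype stored in the checkpoint
        device_map="auto",    # needs `accelerate`; places layers on available devices
    )
    # Assumption: the repo includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return model, tokenizer


def generate_sample(prompt: str, max_new_tokens: int = 16) -> str:
    """Generate a short continuation for `prompt`."""
    model, tokenizer = load_quantized_model()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Calling `generate_sample("The capital of France is")` would download the weights on first use and return the decoded text.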