ISTA-DASLab/Llama-2-13b-AQLM-2Bit-1x16-hf

	Official [AQLM](https://arxiv.org/abs/2401.06118) quantization of `meta-llama/Llama-2-13b-hf`.

	This quantization uses 1 codebook of 16 bits (the `1x16` scheme in the table below).
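As a rough illustration of where the "2Bit" in the model name comes from: AQLM stores one codebook index per group of weights, so the code cost per weight is the total index bits divided by the group size. The sketch below assumes a group size of 8 weights, which is not stated in this card (it is the setting described for 1x16 in the AQLM paper), and it ignores the overhead of the codebooks and scales themselves. The function name `code_bits_per_weight` is hypothetical.

```python
def code_bits_per_weight(num_codebooks: int, codebook_bits: int, group_size: int) -> float:
    """Bits spent on codebook indices per weight, ignoring codebook and scale storage."""
    # Each group of `group_size` weights is encoded by one index per codebook,
    # and each index takes `codebook_bits` bits.
    return num_codebooks * codebook_bits / group_size


# 1 codebook of 16 bits, ASSUMED group size of 8 weights (not stated in this card)
print(code_bits_per_weight(num_codebooks=1, codebook_bits=16, group_size=8))  # 2.0
```

The model sizes in the results table are larger than this figure alone would suggest, since it counts only the index bits.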

	Selected evaluation results for this and other models:

	| Model | AQLM scheme | WikiText 2 PPL | Model size, GB | Hub link |
	|------------|-------------|----------------|----------------|--------------------------------------------------------------------------|
	| Llama-2-7b† | 1x16 | 5.92 | 2.4 | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf) |
	| Llama-2-7b† | 2x8 | 6.69 | 2.2 | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-2x8-hf) |
	| Llama-2-7b† | 8x8 | 6.61 | 2.2 | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-8x8-hf) |
	| Llama-2-13b (THIS) | 1x16 | 5.41 | 4.1 | [Link](https://huggingface.co/BlackSamorez/Llama-2-13b-AQLM-2Bit-1x16-hf) |
	| Llama-2-70b | 1x16 | 3.96 | 18.8 | [Link](https://huggingface.co/BlackSamorez/Llama-2-70b-AQLM-2Bit-1x16-hf) |
	| Llama-2-70b | 2x8 | 4.83 | 18.2 | [Link](https://huggingface.co/BlackSamorez/Llama-2-70b-AQLM-2Bit-2x8-hf) |
	| Mixtral-8x7b | 1x16 | 4.37 | 12.6 | [Link](https://huggingface.co/BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf) |
	| Mixtral-8x7b-Instruct | 1x16 | - | 12.6 | [Link](https://huggingface.co/BlackSamorez/Mixtral-8x7B-Instruct-v0_1-AQLM-2Bit-1x16-hf) |

	To learn more about inference and how to quantize models yourself, see the [official GitHub repo](https://github.com/Vahe1994/AQLM).
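For orientation, a minimal inference sketch using the standard `transformers` loading API might look like the following. It rests on several assumptions not stated in this card: that the `aqlm`, `transformers`, and `accelerate` packages are installed in versions that support AQLM checkpoints, that a CUDA GPU is available, and that the repository ID matches the one shown on this page. The helper name `generate` is hypothetical; the GitHub repo remains the authoritative source for setup instructions.

```python
# Assumed repository ID, taken from this page's header.
MODEL_ID = "ISTA-DASLab/Llama-2-13b-AQLM-2Bit-1x16-hf"


def generate(prompt: str, max_new_tokens: int = 32) -> str:
    """Load the quantized model and complete `prompt` (downloads weights on first use)."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",  # use the dtype stored in the checkpoint config
        device_map="auto",   # place weights on available devices (needs `accelerate`)
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The capital of France is")` would then return the prompt followed by the model's continuation. The model is reloaded on every call here for brevity; a real script would load it once and reuse it.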