Official AQLM quantization of google/gemma-2b.

For this quantization, we used 1 codebook of 16 bits.
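As a rough illustration of what "1 codebook of 16 bits" means for storage, the sketch below computes the index bits spent per weight. The group size of 8 weights per code is an assumption (it is not stated in this card), and the helper `codebook_bits_per_weight` is a hypothetical name used only for this example; the estimate ignores the codebook itself and any scales.

```python
def codebook_bits_per_weight(num_codebooks: int, codebook_bits: int, group_size: int) -> float:
    """Index bits per weight; ignores the codebooks themselves and per-channel scales."""
    return num_codebooks * codebook_bits / group_size


# A 16-bit index can address 2**16 distinct codebook vectors.
num_entries = 2 ** 16

# ASSUMPTION: each code represents a group of 8 weights (not stated in this card).
bits = codebook_bits_per_weight(num_codebooks=1, codebook_bits=16, group_size=8)

print(f"codebook entries: {num_entries}")
print(f"index bits per weight: {bits}")
```

Under that assumed group size, each weight costs about 2 index bits, compared with 16 bits per weight in fp16.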

Results (0-shot acc):

| Model | Quantization | WinoGrande | PiQA | HellaSwag | ArcE | ArcC | Model size, GB |
|---|---|---|---|---|---|---|---|
| gemma-2b | None | 0.6472 | 0.7715 | 0.5279 | 0.7403 | 0.4053 | 5.0 |
| | 1x16 | 0.6275 | 0.7318 | 0.4582 | 0.6923 | 0.3259 | 1.7 |
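To make the trade-off in the results above easier to read, here is a minimal sketch that derives the size reduction and the per-task accuracy drop from the reported numbers. All values are copied from the results; the variable names are illustrative only.

```python
# 0-shot accuracies copied from the results above.
fp16 = {"WinoGrande": 0.6472, "PiQA": 0.7715, "HellaSwag": 0.5279, "ArcE": 0.7403, "ArcC": 0.4053}
aqlm = {"WinoGrande": 0.6275, "PiQA": 0.7318, "HellaSwag": 0.4582, "ArcE": 0.6923, "ArcC": 0.3259}

# Reported model sizes.
fp16_size, aqlm_size = 5.0, 1.7

# Absolute accuracy drop per task, and the mean drop across the five tasks.
drops = {task: fp16[task] - aqlm[task] for task in fp16}
mean_drop = sum(drops.values()) / len(drops)

# How many times smaller the quantized checkpoint is.
compression = fp16_size / aqlm_size

for task, drop in drops.items():
    print(f"{task:<11} drop: {drop:.4f}")
print(f"mean drop: {mean_drop:.4f}")
print(f"size reduction: {compression:.2f}x")
```

By these numbers, the 1x16 model is roughly 2.9x smaller than the fp16 baseline, with an average accuracy drop of about 0.05; the largest drop is on ArcC.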

To learn more about inference, and for instructions on quantizing models yourself, please refer to the official GitHub repo.