---
library_name: transformers
tags:
  - aqlm
  - llama
  - facebook
  - meta
  - llama-3
  - conversational
  - text-generation-inference
base_model: meta-llama/Meta-Llama-3.1-70B
---

Official AQLM quantization of meta-llama/Meta-Llama-3.1-70B, fine-tuned with PV-Tuning.

For this quantization, we used one codebook of 16 bits and a group size of 8.
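The scheme name `1x16g8` encodes exactly this: one 16-bit code per group of 8 weights. A quick sketch of the implied bitrate (the 70B parameter count is approximate, and the reported model size additionally includes codebooks, scales, and unquantized layers, which this estimate leaves out):

```python
# AQLM 1x16g8: one 16-bit codebook index covers a group of 8 weights,
# so the codes alone cost 16 / 8 = 2 bits per weight.
n_params = 70e9            # approximate parameter count of Meta-Llama-3.1-70B
code_bits, group_size = 16, 8

bits_per_weight = code_bits / group_size
code_gb = n_params * bits_per_weight / 8 / 1e9   # bits -> bytes -> GB

print(bits_per_weight)   # 2.0 bits per weight for the codes
print(code_gb)           # 17.5 GB of codes; overhead brings the checkpoint to ~21.9 GB
```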

Results:

| Model | Quantization | MMLU (5-shot) | ArcC | ArcE | Hellaswag | PiQA | Winogrande | Model size, GB |
|---|---|---|---|---|---|---|---|---|
| Meta-Llama-3.1-70B | fp16 | 0.7839 | 0.6058 | 0.8729 | 0.6650 | 0.8292 | 0.7964 | 141 |
| | 1x16g8 | 0.7353 | 0.4556 | 0.7731 | 0.6335 | 0.7590 | 0.7703 | 21.9 |
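A quick read of the numbers above: the quantized checkpoint is roughly 6.4x smaller than fp16 at the cost of about 4.9 MMLU points.

```python
# Derived figures from the results table: compression ratio and MMLU cost.
fp16_gb, quant_gb = 141, 21.9
mmlu_fp16, mmlu_quant = 0.7839, 0.7353

ratio = fp16_gb / quant_gb          # size compression from quantization
mmlu_drop = mmlu_fp16 - mmlu_quant  # accuracy paid for the compression

print(round(ratio, 1))      # 6.4 (times smaller)
print(round(mmlu_drop, 4))  # 0.0486 (MMLU 5-shot drop)
```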