Official AQLM quantization of meta-llama/Meta-Llama-3.1-70B, fine-tuned with PV-Tuning.
For this quantization we used 1 codebook of 16 bits and a group size of 8, i.e. 2 bits per weight.
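The "2Bit" in the repository name follows directly from the configuration: one 16-bit codebook index shared by every group of 8 weights gives 16 / 8 = 2 bits per weight. A quick back-of-the-envelope check (the 70B parameter count is a round figure; codebooks and unquantized layers such as embeddings add overhead on top of this lower bound):

```python
# Effective AQLM bit-rate: 1 codebook of 16 bits per group of 8 weights.
num_codebooks = 1
codebook_size_bits = 16
group_size = 8

bits_per_weight = num_codebooks * codebook_size_bits / group_size  # 2.0

# Lower-bound size estimate for ~70B parameters, in GB.
params = 70e9
est_size_gb = params * bits_per_weight / 8 / 1e9  # 17.5 GB

print(bits_per_weight, est_size_gb)
```

The actual checkpoint is 21.9 GB (see the table below); the gap over the 17.5 GB lower bound comes from codebooks, scales, and layers kept in higher precision.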
Results:
| Quantization | MMLU (5-shot) | ArcC | ArcE | HellaSwag | PiQA | WinoGrande | Model size, GB |
|---|---|---|---|---|---|---|---|
| fp16 | 0.7839 | 0.6058 | 0.8729 | 0.6650 | 0.8292 | 0.7964 | 141 |
| 1x16g8 | 0.7353 | 0.4556 | 0.7731 | 0.6335 | 0.7590 | 0.7703 | 21.9 |
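A loading sketch for the quantized checkpoint — untested here, and assuming a recent `transformers` with the `aqlm` inference kernels installed (e.g. `pip install aqlm[gpu]`) plus enough GPU memory for the ~22 GB of weights:

```python
MODEL_ID = "ISTA-DASLab/Meta-Llama-3.1-70B-AQLM-PV-2Bit-1x16"

def load_model(model_id: str = MODEL_ID):
    """Load the AQLM-quantized checkpoint with Hugging Face transformers.

    Imports are deferred so this sketch can be read without `transformers`
    or `aqlm` installed; both are required to actually run it.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the dtypes stored in the checkpoint
        device_map="auto",    # spread layers across available GPUs
    )
    return model, tokenizer

if __name__ == "__main__":
    model, tokenizer = load_model()
    inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=16)[0]))
```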
Model tree for ISTA-DASLab/Meta-Llama-3.1-70B-AQLM-PV-2Bit-1x16
Base model: meta-llama/Llama-3.1-70B