EfficientQAT (w/o E2E-FT)
This collection provides the quantized checkpoints produced by the Block-AP stage of EfficientQAT, i.e., without end-to-end fine-tuning (E2E-FT).
EfficientQAT involves two consecutive training phases: Block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP).
In this collection, we provide the quantized checkpoints produced by Block-AP. Anyone can use them to reproduce our results or to carry out follow-up research. In the table below, a configuration wXgY denotes X-bit weight quantization with a group size of Y (e.g., w2g64 means 2-bit weights quantized in groups of 64).
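To make the wXgY notation concrete, here is a minimal sketch of group-wise uniform (asymmetric) weight quantization. It only illustrates what the configuration names mean; it is not the EfficientQAT training code, and the layer shape in the usage example is arbitrary:

```python
import torch

def quantize_group_wise(w: torch.Tensor, n_bits: int, group_size: int) -> torch.Tensor:
    """Uniform asymmetric quantization with one scale/zero-point per group.

    Illustrative sketch of a wXgY config (e.g. w2g64 = 2-bit, group size 64).
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(-1, group_size)                     # one row per group
    w_min = groups.amin(dim=1, keepdim=True)
    w_max = groups.amax(dim=1, keepdim=True)
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-8) / qmax         # per-group step size
    zero = torch.round(-w_min / scale)                     # per-group zero-point
    q = torch.clamp(torch.round(groups / scale) + zero, 0, qmax)
    w_deq = (q - zero) * scale                             # dequantize back to float
    return w_deq.reshape(out_features, in_features)

# Example: "w2g64" applied to a random weight matrix
w = torch.randn(4096, 4096)
w_q = quantize_group_wise(w, n_bits=2, group_size=64)
print((w - w_q).abs().mean())  # mean quantization error
```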
Model | Quantization | WikiText2 PPL | Avg. Accuracy (%) | Model Size (GB) | Hub link |
---|---|---|---|---|---|
Llama-2-7B | fp16 | 5.47 | 64.86 | 13.2 | - |
Llama-2-7B | w4g128 | 5.56 | 64.07 | 3.7 | Link |
Llama-2-7B | w3g128 | 5.89 | 63.96 | 3.1 | Link |
Llama-2-7B | w2g64 | 7.65 | 59.54 | 2.3 | Link |
Llama-2-7B | w2g128 | 7.94 | 58.72 | 2.2 | Link |
Llama-2-13B | fp16 | 4.88 | 67.81 | 25.4 | - |
Llama-2-13B | w4g128 | 4.96 | 67.27 | 6.8 | Link |
Llama-2-13B | w3g128 | 5.20 | 67.30 | 5.6 | Link |
Llama-2-13B | w2g64 | 6.55 | 63.10 | 4.0 | Link |
Llama-2-13B | w2g128 | 6.68 | 63.49 | 3.8 | Link |
Llama-2-70B | fp16 | 3.32 | 72.41 | 131.6 | - |
Llama-2-70B | w4g128 | 3.41 | 72.54 | 35.8 | Link |
Llama-2-70B | w3g128 | 3.65 | 71.88 | 29.1 | Link |
Llama-2-70B | w2g64 | 4.96 | 69.44 | 20.1 | Link |
Llama-2-70B | w2g128 | 5.26 | 68.73 | 18.9 | Link |
Llama-3-8B | fp16 | 6.14 | 68.58 | 13.0 | - |
Llama-3-8B | w4g128 | 6.50 | 68.43 | 5.4 | Link |
Llama-3-8B | w3g128 | 7.34 | 66.72 | 4.7 | Link |
Llama-3-8B | w2g64 | 12.47 | 58.65 | 3.9 | Link |
Llama-3-8B | w2g128 | 13.25 | 58.23 | 3.8 | Link |
Llama-3-70B | fp16 | 2.85 | 75.33 | 137.8 | - |
Llama-3-70B | w4g128 | 3.18 | 74.50 | 38.9 | Link |
Llama-3-70B | w3g128 | 4.88 | 71.90 | 32.2 | Link |
Llama-3-70B | w2g64 | 13.75 | 66.70 | 23.2 | Link |
Llama-3-70B | w2g128 | 16.79 | 65.06 | 22.0 | Link |
Llama-3-8B-Instruct | fp16 | 8.29 | 68.43 | 13.0 | - |
Llama-3-8B-Instruct | w4g128 | 8.76 | 67.80 | 5.4 | Link |
Llama-3-8B-Instruct | w3g128 | 9.83 | 66.54 | 4.7 | Link |
Llama-3-8B-Instruct | w2g64 | 16.77 | 58.62 | 3.9 | Link |
Llama-3-8B-Instruct | w2g128 | 18.02 | 57.19 | 3.8 | Link |
Llama-3-70B-Instruct | fp16 | 5.33 | 73.78 | 137.8 | - |
Llama-3-70B-Instruct | w4g128 | 5.77 | 73.52 | 38.9 | Link |
Llama-3-70B-Instruct | w3g128 | 7.25 | 69.80 | 32.2 | Link |
Llama-3-70B-Instruct | w2g64 | 12.48 | 65.60 | 23.2 | Link |
Llama-3-70B-Instruct | w2g128 | 13.48 | 61.75 | 22.0 | Link |
Please refer to https://github.com/OpenGVLab/EfficientQAT for details. These checkpoints can serve as the starting point for the subsequent E2E-QP stage, or be loaded directly for inference.
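Below is a minimal inference sketch, assuming the checkpoint has been downloaded locally and, if necessary, converted with the tools in the EfficientQAT repo into a layout loadable by Hugging Face Transformers; the model path is a hypothetical placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical local path to a downloaded/converted Block-AP checkpoint
model_path = "path/to/EfficientQAT-Llama-2-7b-w2g64"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # activations and dequantized weights in fp16
    device_map="auto",          # requires the `accelerate` package
)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```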