# yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64
This model applies AutoAWQ to meta-llama/Meta-Llama-3-8B-Instruct.

- 4-bit asymmetric weight-only quantization
- group_size=64
- calibration set: pileval
## Accuracy

| model | precision | wikitext ppl (↓) |
|---|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 10.842 |
| yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64 | w4g64 | 10.943 |
Notes:

- Evaluated with the lm-evaluation-harness "wikitext" task (a reproduction sketch is shown below).
- Wikitext perplexity does not guarantee downstream accuracy, but it is a useful check of the distortion introduced by quantization.
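For reference, a minimal sketch of how such an evaluation could be reproduced with the lm-evaluation-harness Python API; this assumes lm-eval v0.4+ and that transformers can load the AWQ checkpoint (autoawq installed):

```python
import lm_eval

# Evaluate wikitext perplexity of the quantized model via the HF backend.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```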
## Usage

```python
from awq import AutoAWQForCausalLM

model = AutoAWQForCausalLM.from_quantized('yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64')
```
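For context, a minimal chat-generation sketch built on the loading call above. This is only a sketch: it assumes the tokenizer is also shipped in this repo, that the quantized model sits on a CUDA GPU, and that the AutoAWQ wrapper forwards `generate()` to the underlying model.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64"
model = AutoAWQForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a Llama 3 chat prompt and generate a short reply (assumes a CUDA GPU).
messages = [{"role": "user", "content": "What is AWQ quantization?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```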
## Codes

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"
# 4-bit, asymmetric (zero_point=True), group size 64, GEMM kernel
quant_config = {"zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration (pileval by default) and quantize the weights in place
model.quantize(tokenizer, quant_config=quant_config)
```
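The snippet above quantizes in place but does not save anything. A possible follow-up, continuing from that snippet and assuming AutoAWQ's `save_quantized` API, with `quant_path` as a hypothetical output directory:

```python
# Hypothetical output directory for the quantized checkpoint
quant_path = "./Meta-Llama-3-8B-Instruct-awq-w4g64"

# Persist quantized weights/config and the tokenizer (continues the snippet above)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```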