yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64

This model applies AutoAWQ to meta-llama/Meta-Llama-3-8B-Instruct.

  • 4-bit asymmetric weight-only quantization (sketched below)
  • group_size=64
  • calibration set: pileval
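
For intuition, here is a minimal sketch of what asymmetric 4-bit quantization with group_size=64 does to each group of 64 weights. This is an illustration only, not AutoAWQ's implementation (AWQ additionally rescales salient channels using activation statistics before quantizing):

import torch

# Illustrative fake-quantization of one weight group (hypothetical helper,
# not part of AutoAWQ): map weights to integers in [0, 15] using a per-group
# scale and zero point, then map them back to float.
def fake_quant_asym4(w: torch.Tensor) -> torch.Tensor:
    scale = (w.max() - w.min()) / 15      # 4 bits -> 16 levels
    zero = torch.round(-w.min() / scale)  # asymmetric zero point
    q = torch.clamp(torch.round(w / scale) + zero, 0, 15)
    return (q - zero) * scale             # dequantized weights

w = torch.randn(4096)                     # one weight row
w_q = torch.cat([fake_quant_asym4(g) for g in w.split(64)])  # group_size=64
print((w - w_q).abs().max())              # worst-case elementwise error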

Accuracy

model                                          precision   wikitext ppl (↓)
meta-llama/Meta-Llama-3-8B-Instruct            FP16        10.842
yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64    w4g64       10.943

Note:

  • Evaluated on the lm-evaluation-harness "wikitext" task (reproduction sketch below)
  • Wikitext perplexity does not guarantee downstream task accuracy, but it is a quick way to check how much distortion quantization introduced.
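
For reference, a sketch of how such numbers can be reproduced with lm-evaluation-harness (assuming lm-eval >= 0.4, with autoawq installed so the checkpoint loads through transformers; the exact metric key may vary between versions):

import lm_eval

# Score the quantized checkpoint on the wikitext perplexity task.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"]["word_perplexity,none"])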

Usage

from awq import AutoAWQForCausalLM
model = AutoAWQForCausalLM.from_quantized('yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64')
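
A fuller inference sketch (assumes a CUDA device; the prompt is formatted with the Llama 3 chat template shipped with the tokenizer):

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "yujiepan/Meta-Llama-3-8B-Instruct-awq-w4g64"
model = AutoAWQForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain AWQ quantization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))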

Quantization code

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B-Instruct"
quant_path = "Meta-Llama-3-8B-Instruct-awq-w4g64"  # local output directory (illustrative)
quant_config = {"zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM"}

# Load the FP16 model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize; AutoAWQ calibrates on the "pileval" dataset by default
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized checkpoint
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)