
# yujiepan/Meta-Llama-3-8B-awq-w4g64

This model applies AutoAWQ quantization to meta-llama/Meta-Llama-3-8B:

- 4-bit, asymmetric, weight-only quantization
- `group_size=64`
- calibration set: `pileval`

## Accuracy

| model | precision | wikitext ppl (↓) |
|---|---|---|
| meta-llama/Meta-Llama-3-8B | FP16 | 9.179 |
| yujiepan/Meta-Llama-3-8B-awq-w4g64 | w4g64 | 9.219 |

Note:

- Evaluated with the lm-evaluation-harness "wikitext" task (a reproduction sketch follows below).
- Wikitext perplexity does not guarantee downstream accuracy, but it is a quick way to check the distortion introduced by quantization.
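
For reference, a minimal sketch of how the evaluation might be reproduced with lm-evaluation-harness (the Python entry point below follows the v0.4 API; function and argument names may differ in other versions):

```python
import lm_eval

# Evaluate the quantized checkpoint on the "wikitext" task;
# the model_args string follows the harness's "hf" backend convention.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-awq-w4g64",
    tasks=["wikitext"],
)
print(results["results"]["wikitext"])
```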

## Usage

```python
from awq import AutoAWQForCausalLM

# Load the quantized checkpoint (AWQ kernels require a CUDA-capable GPU)
model = AutoAWQForCausalLM.from_quantized("yujiepan/Meta-Llama-3-8B-awq-w4g64")
```
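
A slightly fuller end-to-end generation sketch, in case it is useful (the prompt and generation settings here are illustrative, not from the original card):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "yujiepan/Meta-Llama-3-8B-awq-w4g64"
model = AutoAWQForCausalLM.from_quantized(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Illustrative prompt; a short greedy generation as a quick smoke test
inputs = tokenizer("The capital of France is", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```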

## Quantization code

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Meta-Llama-3-8B"

# w_bit=4: 4-bit weights; zero_point=True: asymmetric quantization;
# q_group_size=64: per-group scales over groups of 64 weights
quant_config = {"zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrates on AutoAWQ's default "pileval" dataset
model.quantize(tokenizer, quant_config=quant_config)
```
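
To persist the result, AutoAWQ's `save_quantized` can be used; a short sketch (the output directory name is illustrative):

```python
# Write the quantized weights and config, plus the tokenizer alongside them
quant_path = "Meta-Llama-3-8B-awq-w4g64"
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```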