
yujiepan/Meta-Llama-3-8B-gptq-w8asym

This model applies GPTQ quantization (via AutoGPTQ) to meta-llama/Meta-Llama-3-8B.

  • 8-bit asymmetric weight-only quantization
  • group_size=-1
  • calibration set: c4-new

Accuracy

model                                    precision    wikitext ppl (↓)
meta-llama/Meta-Llama-3-8B               FP16         9.179
yujiepan/Meta-Llama-3-8B-gptq-w8asym     int8-asym    9.356

Note:

  • Evaluated with the lm-evaluation-harness "wikitext" task (a reproduction sketch follows below)
  • Wikitext PPL does not guarantee downstream accuracy, but it helps check the distortion introduced by quantization.
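
For reference, a minimal sketch of how the wikitext perplexity could be reproduced with lm-evaluation-harness (assuming version ≥ 0.4 and its "hf" backend; exact numbers may vary with harness version, batch size, and hardware):

import lm_eval

# Evaluate the quantized checkpoint on the "wikitext" task.
# model_args and the task name follow lm-evaluation-harness conventions;
# adjust batch_size / device to your hardware.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-gptq-w8asym",
    tasks=["wikitext"],
    batch_size=1,
)
print(results["results"]["wikitext"])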

Usage

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('yujiepan/Meta-Llama-3-8B-gptq-w8asym', device_map="auto")
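
A quick generation smoke test (a minimal sketch; the prompt is arbitrary, and loading the GPTQ weights assumes auto-gptq and optimum are installed):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('yujiepan/Meta-Llama-3-8B-gptq-w8asym')
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))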

Code

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit asymmetric, weight-only GPTQ config:
# group_size=-1 means per-channel quantization (no grouping);
# calibration samples are drawn from the "c4-new" dataset.
quantization_config = GPTQConfig(
    bits=8, group_size=-1,
    dataset="c4-new",
    sym=False,
    tokenizer=tokenizer,
    use_cuda_fp16=True,
)
# Passing a GPTQConfig to from_pretrained runs calibration and quantizes the weights on load.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
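
Once quantization finishes, the resulting weights can be saved locally and optionally uploaded; a minimal sketch (the output directory below is a placeholder):

# Save the quantized model and tokenizer (placeholder path),
# then optionally push them to the Hugging Face Hub.
save_dir = "Meta-Llama-3-8B-gptq-w8asym"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
# model.push_to_hub("yujiepan/Meta-Llama-3-8B-gptq-w8asym")  # optional upload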