Note: this repo has lower accuracy than expected and is under investigation.

yujiepan/Meta-Llama-3-8B-gptq-w4g64

This model is obtained by applying AutoGPTQ quantization to meta-llama/Meta-Llama-3-8B.

  • 4-bit symmetric weight-only quantization
  • group_size=64
  • calibration set: c4-new

Accuracy

model                                 precision   wikitext ppl (↓)
meta-llama/Meta-Llama-3-8B            FP16        9.179
yujiepan/Meta-Llama-3-8B-gptq-w4g64   w4g64       10.097

Note:

  • Evaluated with the lm-evaluation-harness "wikitext" task; a reproduction sketch follows this list.
  • Wikitext perplexity does not guarantee downstream task accuracy, but it is a quick check of the distortion introduced by quantization.
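A minimal sketch of reproducing this evaluation with the lm-evaluation-harness Python API (v0.4+); the exact metric key may vary across harness versions:

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-gptq-w4g64",
    tasks=["wikitext"],
)
# "word_perplexity,none" is the metric key in recent harness versions.
print(results["results"]["wikitext"]["word_perplexity,none"])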

Code

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit, group_size=64; symmetric quantization is the GPTQConfig default.
# Calibration samples are drawn from the "c4-new" dataset.
quantization_config = GPTQConfig(
    bits=4,
    group_size=64,
    dataset="c4-new",
    tokenizer=tokenizer,
)

# Passing a GPTQConfig quantizes the model while it loads.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
model.push_to_hub("yujiepan/Meta-Llama-3-8B-gptq-w4g64")
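
Once pushed, the quantized checkpoint can be loaded back for inference. A minimal sketch, assuming the auto-gptq backend is installed; the tokenizer is taken from the base repo, since only the model weights are pushed above:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = AutoModelForCausalLM.from_pretrained(
    "yujiepan/Meta-Llama-3-8B-gptq-w4g64",
    device_map="auto",
)

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))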