yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym

This model applies AutoGPTQ quantization to meta-llama/Meta-Llama-3-8B-Instruct.

  • 8-bit asymmetric weight-only quantization
  • group_size=-1 (no weight grouping)
  • calibration set: c4-new

Accuracy

model                                           precision   wikitext ppl (↓)
meta-llama/Meta-Llama-3-8B-Instruct             FP16        10.842
yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym   int8-asym   10.854

Note:

  • Evaluated with the lm-evaluation-harness "wikitext" task; a sketch of the evaluation call is shown below.
  • Wikitext perplexity does not guarantee downstream accuracy, but it helps gauge the distortion introduced by quantization.
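
A minimal sketch of how such an evaluation can be run, assuming lm-evaluation-harness (>= 0.4) installed via `pip install lm-eval`; the batch size and the exact settings here are illustrative, not taken from the original run:

# Sketch: measuring wikitext perplexity with lm-evaluation-harness (assumed >= 0.4).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym",
    tasks=["wikitext"],
    batch_size=8,  # illustrative; pick what fits your GPU
)
print(results["results"]["wikitext"])  # word/byte perplexity and bits-per-byte metrics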

Usage

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym', device_map="auto")
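
A short generation example following the load above. This is only a sketch: the prompt and generation settings are illustrative, and it assumes the tokenizer files are included in this repo (otherwise load them from the base model id).

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym')
messages = [{"role": "user", "content": "What is GPTQ quantization?"}]  # illustrative prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))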

Quantization code

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit asymmetric, weight-only GPTQ; group_size=-1 disables weight grouping.
quantization_config = GPTQConfig(
    bits=8,
    group_size=-1,
    dataset="c4-new",   # calibration set
    sym=False,          # asymmetric quantization
    tokenizer=tokenizer,
    use_cuda_fp16=True,
)

# Passing a GPTQConfig with a dataset runs GPTQ calibration during loading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
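
The quantized model can then be serialized for reuse. A minimal sketch; the local directory and Hub repo id below are illustrative:

# Save the quantized weights and tokenizer locally (path is an example).
model.save_pretrained("Meta-Llama-3-8B-Instruct-gptq-w8asym")
tokenizer.save_pretrained("Meta-Llama-3-8B-Instruct-gptq-w8asym")
# Optionally push to the Hub:
# model.push_to_hub("yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym")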