yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym
This model applies GPTQ quantization (AutoGPTQ via transformers) to meta-llama/Meta-Llama-3-8B-Instruct.
- 8-bit asymmetric weight-only quantization
- group_size=-1
- calibration set: c4-new
Accuracy
| model | precision | wikitext ppl (↓) |
|---|---|---|
| meta-llama/Meta-Llama-3-8B-Instruct | FP16 | 10.842 |
| yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym | int8-asym | 10.854 |
Note:
- Evaluated with lm-evaluation-harness on the "wikitext" task (a reproduction sketch follows below)
- Wikitext perplexity does not guarantee downstream task accuracy, but it helps check the distortion introduced by quantization.
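A minimal sketch of how this evaluation could be reproduced with the lm-evaluation-harness Python API (assuming lm-eval >= 0.4; result keys and exact numbers may vary by version):

```python
# Hypothetical reproduction sketch using lm-evaluation-harness (lm-eval >= 0.4 API assumed).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym",
    tasks=["wikitext"],
)
# The wikitext entry holds word/byte perplexity metrics; exact key names depend on the version.
print(results["results"]["wikitext"])
```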
Usage
```python
from transformers import AutoModelForCausalLM

# The GPTQ quantization config is stored in the checkpoint, so a plain from_pretrained call loads it.
model = AutoModelForCausalLM.from_pretrained(
    "yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym", device_map="auto"
)
```
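A short generation sketch with the loaded model (the prompt text is illustrative, not from the model card; Llama 3 Instruct expects its chat template):

```python
# Illustrative generation example with the quantized checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("yujiepan/Meta-Llama-3-8B-Instruct-gptq-w8asym")
messages = [{"role": "user", "content": "Briefly explain GPTQ quantization."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```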
Quantization code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit asymmetric weight-only GPTQ, per-channel (group_size=-1), calibrated on c4-new
quantization_config = GPTQConfig(
    bits=8,
    group_size=-1,
    dataset="c4-new",
    sym=False,
    tokenizer=tokenizer,
    use_cuda_fp16=True,
)

# Quantization runs inside from_pretrained when a GPTQConfig is passed
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)
```
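Once quantization finishes, the model can be saved like any other transformers checkpoint; a sketch (the output directory below is illustrative):

```python
# Sketch: persist the quantized weights and tokenizer (local path is illustrative).
save_dir = "Meta-Llama-3-8B-Instruct-gptq-w8asym"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```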