Note: this repo has low accuracy and is under investigation.
# yujiepan/Meta-Llama-3-8B-gptq-w4g64
This model applies AutoGPTQ quantization to meta-llama/Meta-Llama-3-8B.
- 4-bit symmetric weight-only quantization
- group_size=64
- calibration set: c4-new
## Accuracy
| model | precision | wikitext ppl (↓) |
|---|---|---|
| meta-llama/Meta-Llama-3-8B | FP16 | 9.179 |
| yujiepan/Meta-Llama-3-8B-gptq-w4g64 | w4g64 | 10.097 |
Note:
- Evaluated with the lm-evaluation-harness "wikitext" task.
- Wikitext perplexity does not guarantee downstream task accuracy, but it helps to gauge the distortion introduced by quantization.
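A check like the one above can be run through the lm-evaluation-harness Python API. The snippet below is a minimal sketch; the batch size is an assumption, not necessarily the setting used for the numbers in the table.

```python
# Minimal sketch: measuring wikitext perplexity with lm-evaluation-harness (>= 0.4).
# The batch size here is an assumption, not the exact setting used for the table above.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=yujiepan/Meta-Llama-3-8B-gptq-w4g64",
    tasks=["wikitext"],
    batch_size=8,
)
print(results["results"]["wikitext"])
```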
## Code
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# GPTQ quantization through transformers requires the optimum and auto-gptq packages.
model_id = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit symmetric weight-only quantization, group_size=64, calibrated on "c4-new".
quantization_config = GPTQConfig(
    bits=4,
    group_size=64,
    dataset="c4-new",
    tokenizer=tokenizer,
)

# Passing quantization_config triggers GPTQ quantization while loading the model.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=quantization_config,
)

model.push_to_hub("yujiepan/Meta-Llama-3-8B-gptq-w4g64")
```
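Once pushed, the quantized checkpoint can be loaded back like any other transformers model. The snippet below is a minimal usage sketch; the prompt and generation settings are illustrative.

```python
# Minimal sketch: loading the quantized checkpoint from the Hub and generating text.
# The prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "yujiepan/Meta-Llama-3-8B-gptq-w4g64"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```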