---
license: mit
datasets:
- wikitext
---
gpt2-xl quantized to 4-bit using AutoGPTQ.
To use, first install AutoGPTQ:
```bash
pip install auto-gptq
```
Then load the model from the hub:
```python
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/gpt2-xl-AutoGPTQ-4bit-128g"

# Note: although this model was quantized with grouping only (desc_act=False),
# Triton still seems to be required to load it.
model = AutoGPTQForCausalLM.from_quantized(model_name, use_triton=True)
```
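Once the model is loaded, generating text might look like the sketch below. The helper name `generate_text` and the `max_new_tokens` default are illustrative and not part of this repository. The sketch assumes the tokenizer returns a batch object with a `.to(device)` method and that the model exposes `.device` and a keyword-only `.generate(...)`.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Return a continuation of `prompt` from an already-loaded model.

    Hypothetical helper for illustration; not part of AutoGPTQ or this repo.
    """
    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Pass everything as keyword arguments to generate().
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode the first (and only) sequence in the batch back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

To try it, one option (an assumption, since this card does not say which tokenizer to use) is to load the base tokenizer with `AutoTokenizer.from_pretrained("gpt2-xl")` and call `generate_text(model, tokenizer, "Hello, my name is")`.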
| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/gpt2-AutoGPTQ-4bit-128g | 26.5000 | 25.1875 | 1.3125 |
| smpanaro/gpt2-medium-AutoGPTQ-4bit-128g | 19.1719 | 18.4739 | 0.6980 |
| smpanaro/gpt2-large-AutoGPTQ-4bit-128g | 16.6875 | 16.4541 | 0.2334 |
| smpanaro/gpt2-xl-AutoGPTQ-4bit-128g | 14.9297 | 14.7951 | 0.1346 |

Wikitext perplexity measured as in the Hugging Face docs; lower is better.
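To put the Delta column in context, the sketch below recomputes it and expresses it as a percentage of the 16-bit perplexity. It also shows the usual definition of perplexity as the exponential of the mean per-token negative log-likelihood. The function names are illustrative, and the numbers are copied from the table above rather than re-measured.

```python
import math

def perplexity(token_nlls):
    """Perplexity as exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def relative_increase(ppl_4bit, ppl_16bit):
    """Perplexity increase from quantization, as a percentage of the 16-bit value."""
    return 100.0 * (ppl_4bit - ppl_16bit) / ppl_16bit

# (4-bit, 16-bit) perplexities copied from the table above.
results = {
    "gpt2": (26.5000, 25.1875),
    "gpt2-medium": (19.1719, 18.4739),
    "gpt2-large": (16.6875, 16.4541),
    "gpt2-xl": (14.9297, 14.7951),
}

for name, (ppl_4bit, ppl_16bit) in results.items():
    delta = ppl_4bit - ppl_16bit
    pct = relative_increase(ppl_4bit, ppl_16bit)
    print(f"{name}: delta={delta:.4f}, relative increase={pct:.2f}%")
```

In both absolute and relative terms, the gap shrinks as model size grows.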