pythia-410m quantized to 4-bit using AutoGPTQ.

To use, first install AutoGPTQ:

pip install auto-gptq

Then load the model from the hub:

from auto_gptq import AutoGPTQForCausalLM

# Hub repository that holds the 4-bit GPTQ checkpoint
model_name = "smpanaro/pythia-410m-AutoGPTQ-4bit-128g"
# Load the already-quantized weights
model = AutoGPTQForCausalLM.from_quantized(model_name)
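
Once the model is loaded, text generation might look like the sketch below. It is illustrative only: the card does not say which tokenizer to use, so loading it from the base `EleutherAI/pythia-410m` repository is an assumption, as are the `generate_text` helper name, the `cuda:0` default device, and the token budget.

```python
MODEL_NAME = "smpanaro/pythia-410m-AutoGPTQ-4bit-128g"
# Assumption: the quantized model shares the base model's tokenizer.
TOKENIZER_NAME = "EleutherAI/pythia-410m"


def generate_text(prompt, max_new_tokens=32, device="cuda:0"):
    """Hypothetical helper: load the quantized model and complete `prompt`."""
    # Imported here so the heavy dependencies are only needed when generating.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
    model = AutoGPTQForCausalLM.from_quantized(MODEL_NAME, device=device)

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # AutoGPTQ's generate() expects keyword arguments, hence the ** unpacking.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate_text("The quick brown fox"))` would then print the prompt followed by the model's continuation.
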

| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|---|---|---|---|
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |
| smpanaro/pythia-410m-AutoGPTQ-4bit-128g | 21.4688 | 13.9838 | 7.4850 |

Perplexity is measured on Wikitext following the procedure in the Hugging Face docs; lower is better.
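
For readers unfamiliar with that procedure, here is a minimal sketch of the strided sliding-window bookkeeping it relies on, written in plain Python so it runs without a model. The function names are hypothetical, the window length and stride used for the numbers above are not stated in this card, and this is not necessarily the exact script that produced them.

```python
import math


def sliding_windows(seq_len, max_length, stride):
    """Yield (begin, end, target_len) for each evaluation window.

    Each window spans at most `max_length` tokens; only the last
    `target_len` tokens are scored, the rest serve as context.
    """
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        target_len = end - prev_end  # tokens not scored by an earlier window
        yield begin, end, target_len
        prev_end = end
        if end == seq_len:
            break


def perplexity(window_nlls, window_target_lens):
    """Return exp of the token-weighted mean negative log-likelihood.

    `window_nlls` holds the mean per-token NLL of each window, which in a
    real run would come from the model's loss on that window.
    """
    total_nll = sum(nll * n for nll, n in zip(window_nlls, window_target_lens))
    return math.exp(total_nll / sum(window_target_lens))
```

In a full evaluation, each `(begin, end)` slice of the tokenized test set would be passed to the model with the context tokens masked out of the loss, and the resulting per-window losses fed to `perplexity`.
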
