---
license: mit
datasets:
- wikitext
---

[pythia-70m](https://huggingface.co/EleutherAI/pythia-70m) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name = "smpanaro/pythia-70m-AutoGPTQ-4bit-128g"
# Download the 4-bit GPTQ checkpoint from the hub and load it.
model = AutoGPTQForCausalLM.from_quantized(model_name)
```

|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|smpanaro/pythia-70m-AutoGPTQ-4bit-128g|49.125|-|-|
|[smpanaro/pythia-160m-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-160m-AutoGPTQ-4bit-128g)|33.4375|23.3024|10.1351|
|[smpanaro/pythia-410m-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-410m-AutoGPTQ-4bit-128g)|21.4688|13.9838|7.485|
|[smpanaro/pythia-1b-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-1b-AutoGPTQ-4bit-128g)|12.0391|11.6178|0.4213|
|[smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-1.4b-AutoGPTQ-4bit-128g)|10.9609|10.4391|0.5218|
|[smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-2.8b-AutoGPTQ-4bit-128g)|9.8281|9.0028|0.8253|

Wikitext perplexity is measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better. Delta is the 4-bit perplexity minus the 16-bit perplexity.
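
For readers unfamiliar with the metric, the arithmetic behind a perplexity score can be sketched as follows. This is a minimal illustration, not the script used to produce the numbers above: perplexity is the exponential of the mean per-token negative log-likelihood. The function name `perplexity_from_nlls` and the toy values are assumptions made up for this example.

```python
import math


def perplexity_from_nlls(token_nlls):
    """Return exp(mean NLL) for per-token negative log-likelihoods (in nats)."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Toy input (hypothetical): a model that assigns probability 0.25 to every token.
toy_nlls = [-math.log(0.25)] * 8
toy_ppl = perplexity_from_nlls(toy_nlls)
print(round(toy_ppl, 4))
```

A model that is uniformly unsure among four choices at every step scores a perplexity of about 4, which is why lower values indicate a better fit to the text.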