---
license: mit
datasets:
- wikitext
---

[pythia-1b](https://huggingface.co/EleutherAI/pythia-1b) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:

```python
from auto_gptq import AutoGPTQForCausalLM

# Hub repository that holds the 4-bit quantized weights.
model_name = "smpanaro/pythia-1b-AutoGPTQ-4bit-128g"

# Fetch the quantized checkpoint from the hub and load it.
model = AutoGPTQForCausalLM.from_quantized(model_name)
```
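
Once the model is loaded, text generation might look like the sketch below. It is illustrative only and rests on several assumptions not stated above: the tokenizer is taken from the base `EleutherAI/pythia-1b` repository, a CUDA device is available as `cuda:0`, and `generate_text` is a hypothetical helper name.

```python
MODEL_NAME = "smpanaro/pythia-1b-AutoGPTQ-4bit-128g"
# Assumption: the base model's tokenizer is used for the quantized weights.
TOKENIZER_NAME = "EleutherAI/pythia-1b"


def generate_text(prompt: str, max_new_tokens: int = 32, device: str = "cuda:0") -> str:
    """Hypothetical helper: load the quantized model and complete `prompt`."""
    # Imports are deferred so this module can be imported without the heavy dependencies.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
    # Assumption: a CUDA device is available under the name passed as `device`.
    model = AutoGPTQForCausalLM.from_quantized(MODEL_NAME, device=device)

    # Tokenize the prompt and move the tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The capital of France is")` would download the weights on first use and return the prompt followed by the model's continuation.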

|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|[smpanaro/pythia-160m-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-160m-AutoGPTQ-4bit-128g)|33.4375|23.3024|10.1351|
|[smpanaro/pythia-410m-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/pythia-410m-AutoGPTQ-4bit-128g)|21.4688|13.9838|7.485|
|smpanaro/pythia-1b-AutoGPTQ-4bit-128g|12.0391|11.6178|0.4213|

<sub>Wikitext perplexity measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.</sub>
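
For readers unfamiliar with the metric, perplexity is the exponential of the mean per-token negative log-likelihood, and the Delta column is the 4-bit value minus the 16-bit value. The plain-Python sketch below illustrates both. The helper names `perplexity` and `delta` are hypothetical, and the sketch does not reproduce the sliding-window evaluation described in the linked docs.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def delta(ppl_4bit, ppl_16bit):
    """Perplexity increase from quantization, rounded to four decimals like the table."""
    return round(ppl_4bit - ppl_16bit, 4)
```

A model that assigns every token probability 1/4 has a perplexity of 4 under this definition, and `delta(12.0391, 11.6178)` matches the 0.4213 reported in the pythia-1b row.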