---
license: mit
datasets:
- wikitext
---
[pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).
To use, first install AutoGPTQ:
```shell
pip install auto-gptq
```
Then load the model from the hub:
```python
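from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/pythia-160m-AutoGPTQ-4bit-128g"
# Load the 4-bit quantized weights from the hub.
model = AutoGPTQForCausalLM.from_quantized(model_name)
# Assumption: quantization leaves the tokenizer unchanged, so the base model's is loaded.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")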
```
|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|smpanaro/pythia-160m-AutoGPTQ-4bit-128g|33.4375|23.3024|10.1351|
<sub>Wikitext perplexity, measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity). Lower is better.</sub>
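
For reference, perplexity is the exponential of the mean per-token negative log-likelihood. The snippet below is a minimal sketch of that definition and of how the Delta column follows from the two measured values. The `perplexity` helper and the `example_nlls` values are hypothetical and for illustration only; this is not the evaluation script used for the table.

```python
import math

def perplexity(token_nlls):
    """Return exp(mean negative log-likelihood), with NLLs given in nats."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token NLLs, for illustration only (not from this model).
example_nlls = [3.2, 2.9, 3.5, 3.1]
example_ppl = perplexity(example_nlls)

# Values copied from the table above.
ppl_4bit = 33.4375
ppl_16bit = 23.3024

# Delta is the absolute perplexity increase caused by quantization.
delta = round(ppl_4bit - ppl_16bit, 4)
# The same increase expressed relative to the 16-bit baseline.
relative_increase = delta / ppl_16bit

print(f"delta={delta}, relative increase={relative_increase:.1%}")
```

The relative increase is not in the table. It is included only to put the absolute delta in context against the 16-bit baseline.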