---
license: mit
datasets:
- wikitext
---

[gpt2-large](https://huggingface.co/openai-community/gpt2-large) quantized to 4-bit using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:
```python
from auto_gptq import AutoGPTQForCausalLM

# Download the quantized weights from the Hugging Face Hub and load them.
model_name = "smpanaro/gpt2-large-AutoGPTQ-4bit-128g"
model = AutoGPTQForCausalLM.from_quantized(model_name)
```
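
Once the model loads, generating text might look like the sketch below. The helper name `generate_text`, the `max_new_tokens` default, and the choice to take the tokenizer from the base `openai-community/gpt2-large` repository are assumptions for illustration, not something this model card specifies.

```python
MODEL_NAME = "smpanaro/gpt2-large-AutoGPTQ-4bit-128g"
# Assumption: the quantized model shares the tokenizer of the base gpt2-large model.
BASE_MODEL_NAME = "openai-community/gpt2-large"


def generate_text(prompt, max_new_tokens=30):
    """Hypothetical helper: load the quantized model and continue `prompt`."""
    # Imported inside the function so the heavy libraries load only when it is called.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME)
    model = AutoGPTQForCausalLM.from_quantized(MODEL_NAME)

    # Tokenize the prompt and move the tensors to the device the model is on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Decode the first (and only) sequence back into a string.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("The quick brown fox")` should return the prompt followed by the model's continuation. In real use you would load the model and tokenizer once and reuse them across calls.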


|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|[smpanaro/gpt2-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-AutoGPTQ-4bit-128g)|26.5000|25.1875|1.3125|
|[smpanaro/gpt2-medium-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-medium-AutoGPTQ-4bit-128g)|19.1719|18.4739|0.6980|
|smpanaro/gpt2-large-AutoGPTQ-4bit-128g|16.6875|16.4541|0.2334|
|[smpanaro/gpt2-xl-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-xl-AutoGPTQ-4bit-128g)|14.9297|14.7951|0.1346|
<sub>Wikitext perplexity measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.</sub>
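
For reference, the numbers in the table relate as follows: perplexity is the exponential of the mean per-token negative log-likelihood, and Delta is the 4-bit perplexity minus the 16-bit perplexity. The sketch below shows only that final arithmetic, not the sliding-window evaluation from the linked docs. The function names are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def perplexity_delta(ppl_4bit, ppl_16bit):
    """Perplexity increase from quantization (lower is better)."""
    return ppl_4bit - ppl_16bit


# Delta for this model, using the values from the table above.
delta = perplexity_delta(16.6875, 16.4541)
print(f"{delta:.4f}")
```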