---
license: mit
datasets:
- wikitext
---

[gpt2-xl](https://huggingface.co/openai-community/gpt2-xl) quantized to 4-bit (group size 128) using [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ).
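The exact quantization script isn't part of this repo, but a comparable run with AutoGPTQ might look roughly like the sketch below. Only the 4-bit / group-size-128 / desc_act=False settings are implied by the model name and the note further down; the calibration data (wikitext here), sample count, and sequence handling are assumptions.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from datasets import load_dataset

base_model = "openai-community/gpt2-xl"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# 4-bit weights, group size 128, no activation reordering (desc_act=False).
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# Calibration text from wikitext; the config and sample count are guesses.
train = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
texts = [t for t in train["text"] if len(t.split()) > 10][:128]
examples = [tokenizer(t, return_tensors="pt") for t in texts]

model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)
model.quantize(examples)
model.save_quantized("gpt2-xl-AutoGPTQ-4bit-128g")
```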
To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```
Then load the model from the hub:

```python
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/gpt2-xl-AutoGPTQ-4bit-128g"
model = AutoGPTQForCausalLM.from_quantized(model_name, use_triton=True)
# Note: even though this model was quantized with groups only (desc_act=False), Triton still seems to be required.
```
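As a quick check that the quantized weights load correctly, the model can be used for generation like any other causal LM. The tokenizer below is loaded from the base gpt2-xl repo, on the assumption that it matches the one used for quantization:

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer is shared across model sizes, so the base repo is used here.
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2-xl")

inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```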
|Model|4-Bit Perplexity|16-Bit Perplexity|Delta|
|--|--|--|--|
|[smpanaro/gpt2-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-AutoGPTQ-4bit-128g)|26.5000|25.1875|1.3125|
|[smpanaro/gpt2-medium-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-medium-AutoGPTQ-4bit-128g)|19.1719|18.4739|0.6980|
|[smpanaro/gpt2-large-AutoGPTQ-4bit-128g](https://huggingface.co/smpanaro/gpt2-large-AutoGPTQ-4bit-128g)|16.6875|16.4541|0.2334|
|smpanaro/gpt2-xl-AutoGPTQ-4bit-128g|14.9297|14.7951|0.1346|

<sub>Wikitext perplexity measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.</sub>
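
The evaluation script isn't included here either; the figures above follow the sliding-window recipe from the linked guide. A rough sketch of that measurement, reusing `model` and `tokenizer` from the snippets above (the wikitext config, context length, and stride are assumptions):

```python
import torch
from datasets import load_dataset

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

max_length, stride = 1024, 512  # GPT-2 context window; stride as in the guide
seq_len = encodings.input_ids.size(1)

nlls, prev_end = [], 0
for begin in range(0, seq_len, stride):
    end = min(begin + max_length, seq_len)
    trg_len = end - prev_end  # only score tokens not covered by a previous window
    input_ids = encodings.input_ids[:, begin:end].to(model.device)
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100  # ignore the overlapping prefix
    with torch.no_grad():
        loss = model(input_ids, labels=target_ids).loss
    nlls.append(loss * trg_len)
    prev_end = end
    if end == seq_len:
        break

print(torch.exp(torch.stack(nlls).sum() / prev_end))  # perplexity, lower is better
```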