Update README.md
README.md (changed):

````diff
@@ -17,7 +17,7 @@ Then load the model from the hub:
 from transformers import AutoModelForCausalLM, AutoTokenizer
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 
-model_name = "smpanaro/gpt2-AutoGPTQ-4bit-128g"
+model_name = "smpanaro/gpt2-xl-AutoGPTQ-4bit-128g"
 model = AutoGPTQForCausalLM.from_quantized(model_name, use_triton=True)
 # Note: despite this model being quantized only using groups and desc_act=False, Triton still seems to be required.
 ```
````
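For context, a minimal sketch of loading the renamed checkpoint and generating text with it. The tokenizer is loaded from the base `gpt2-xl` repo on the assumption that the quantized checkpoint shares it; the prompt and generation settings are illustrative, not from the commit.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/gpt2-xl-AutoGPTQ-4bit-128g"

# Assumption: the quantized repo reuses the base gpt2-xl tokenizer,
# since GPTQ quantization only changes the weights.
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")

# Per the README, Triton is still required even with desc_act=False.
model = AutoGPTQForCausalLM.from_quantized(model_name, use_triton=True)

# Tokenize an illustrative prompt and generate a short continuation.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```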