Please tell me about GPTQ quantisation.
Thanks for your great work. I would like to ask about your process for quantising models with GPTQ. I used AutoGPTQ for quantisation, but after quantisation I can't load the model with AutoModelForCausalLM.from_pretrained(); it can only be loaded with AutoGPTQForCausalLM.from_quantized. Can you tell me what modification you made?
Answered in the other thread. I suggest using my AutoGPTQ wrapper script for easy quantisations, which produces quants that can be loaded with Transformers: https://github.com/TheBlokeAI/AIScripts/blob/main/quant_autogptq.py
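For reference, here's a rough sketch of loading such a quant straight through Transformers (the directory path is a placeholder, and it assumes transformers >= 4.32 with optimum and auto-gptq installed, with the weights saved as model.safetensors):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/content/llama-1B-GPTQ"  # hypothetical path to the quantised output

# A GPTQ quant whose weights are named model.safetensors loads like any other model
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
```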
Thank you very much for your suggestion, but I still don't understand where to change the "model_basename". I tried the AutoGPTQ wrapper script you provided, but the output is still gptq-4bit-128g.safetensors. I quantised with the following command:
python quant_autogptq.py "/content/llama-1B" "777" "c4" .
My guess is to rename the safetensors file and change "model_file_base_name": "gptq_model-4bit-128g" to "model_file_base_name": "model" in quantize_config.json. I hope to get your guidance.
Oh sorry, I never updated the public quant_autogptq.py for that.
Add model_file_base_name='model' into the BaseQuantizeConfig() definition starting on line 194 of my quant_autogptq.py script.
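Roughly, the BaseQuantizeConfig() would then look something like this (a sketch rather than the exact contents of my script; the other values are just typical 4-bit/128g settings):

```python
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,                        # 4-bit quantisation
    group_size=128,                # 128g grouping
    desc_act=False,                # example value; set to whatever you normally use
    damp_percent=0.01,
    model_file_base_name='model',  # the added line: weights get saved as model.safetensors
)
```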
And yes, to fix an already-made GPTQ, rename the safetensors file to model.safetensors and also set model_file_base_name to "model" in quantize_config.json. This will then be identical to a GPTQ made with the change to BaseQuantizeConfig() described above.
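If it helps, something like this would apply that fix to an existing quant directory (just a sketch; the directory path and the original filename are placeholders for whatever your quant actually produced):

```python
import json
import os

gptq_dir = "/content/llama-1B-GPTQ"            # hypothetical output directory
old_name = "gptq_model-4bit-128g.safetensors"  # whatever the quant was saved as

# Rename the weights file to model.safetensors
os.rename(os.path.join(gptq_dir, old_name), os.path.join(gptq_dir, "model.safetensors"))

# Point quantize_config.json at the new base name
cfg_path = os.path.join(gptq_dir, "quantize_config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["model_file_base_name"] = "model"
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```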
@TheBloke Thank you very much for your patient guidance, I managed to convert it into a format that Transformers recognizes!