Please tell me about gptq quantisation.

Opened by mjw98

Thanks for your great work. I would like to ask about your process of GPTQ quantisation. I used AutoGPTQ for quantisation, but after quantising I can't load the model with AutoModelForCausalLM.from_pretrained(); it can only be loaded with AutoGPTQForCausalLM.from_quantized(). Can you tell me what your modification is?
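A minimal sketch of the two loading paths being compared, assuming a local output directory "./quantised-model" (the path is a placeholder, not from the thread):

```python
from transformers import AutoModelForCausalLM
from auto_gptq import AutoGPTQForCausalLM

# Works: AutoGPTQ's own loader, pointed at the quantised checkpoint directory.
model = AutoGPTQForCausalLM.from_quantized("./quantised-model", device="cuda:0")

# Fails on a raw AutoGPTQ output unless the safetensors filename and
# quantize_config.json follow the layout Transformers expects.
model = AutoModelForCausalLM.from_pretrained("./quantised-model", device_map="auto")
```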

Answered in the other thread. I suggest using my AutoGPTQ wrapper script for easy quantisation; its output can be loaded with Transformers: https://github.com/TheBlokeAI/AIScripts/blob/main/quant_autogptq.py

Thank you very much for your suggestion, but I still don't understand where to change "model_basename". I tried the AutoGPTQ wrapper script you provided, but the output is still gptq-4bit-128g.safetensors. I used the following command to quantise it and got the result shown below:
python quant_autogptq.py  "/content/llama-1B" "777" "c4" .

(screenshot of the quantisation output)

My guess is that I need to rename the safetensors file and change "model_file_base_name": "gptq_model-4bit-128g" to "model_file_base_name": "model" in quantize_config.json. I hope to get your guidance.

Oh sorry, I never updated the public quant_autogptq.py for that.

Add model_file_base_name='model' to the BaseQuantizeConfig() definition starting on line 194 of my quant_autogptq.py script.
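An illustrative BaseQuantizeConfig() with that addition; the other parameter values (bits, group_size, desc_act) are assumptions for the sketch, not taken from quant_autogptq.py:

```python
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    # With this set, save_quantized() writes model.safetensors instead of
    # the default gptq_model-4bit-128g.safetensors.
    model_file_base_name='model',
)
```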

And yes, to fix an already made GPTQ, rename the safetensors file to model.safetensors and also set model_file_base_name to "model" in quantize_config.json. This will then be identical to a GPTQ made with the change to BaseQuantizeConfig() described above.
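A minimal sketch of fixing an already-made GPTQ along those lines; the directory name and old filename are assumptions based on the thread, not verified paths:

```python
import json
import os

model_dir = "/content/llama-1B-GPTQ"            # hypothetical output directory
old_name = "gptq_model-4bit-128g.safetensors"   # default AutoGPTQ filename

# Rename the weights file to model.safetensors.
os.rename(os.path.join(model_dir, old_name),
          os.path.join(model_dir, "model.safetensors"))

# Point quantize_config.json at the new base name.
config_path = os.path.join(model_dir, "quantize_config.json")
with open(config_path) as f:
    config = json.load(f)
config["model_file_base_name"] = "model"
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```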

@TheBloke Thank you very much for your patient guidance. I managed to convert it into a format that Transformers recognises!
