Please tell me about GPTQ quantisation.
Thanks for your great work. I would like to ask about your process for quantising models with GPTQ. I used AutoGPTQ for quantisation, but after quantisation I can't load the model with AutoModelForCausalLM.from_pretrained(); it can only be loaded with AutoGPTQForCausalLM.from_quantized. Can you tell me what modification you made?
Answered in the other thread. I suggest using my AutoGPTQ wrapper script for easy quantisations, which produces quants that can be loaded with Transformers: https://github.com/TheBlokeAI/AIScripts/blob/main/quant_autogptq.py
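For reference, here's a rough sketch of loading such a quant straight through Transformers (the directory path is a placeholder, and it assumes transformers >= 4.32 with optimum and auto-gptq installed, with the weights saved as model.safetensors):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/content/llama-1B-GPTQ"  # hypothetical path to the quantised output

# A GPTQ quant whose weights are named model.safetensors loads like any other model
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)
```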
Thank you very much for your suggestion, but I still don't understand where to change the "model_basename". I tried the AutoGPTQ wrapper script you provided, but the output is still gptq-4bit-128g.safetensors. I quantised with the following command:
python quant_autogptq.py "/content/llama-1B" "777" "c4" .
My guess is to rename the safetensors file and change "model_file_base_name": "gptq_model-4bit-128g" to "model_file_base_name": "model" in quantize_config.json. I hope to get your guidance.
Oh sorry, I never updated the public quant_autogptq.py for that.
Add model_file_base_name='model' into the BaseQuantizeConfig() definition starting on line 194 of my quant_autogptq.py script.
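Roughly, the BaseQuantizeConfig() would then look something like this (a sketch rather than the exact contents of my script; the other values are just typical 4-bit/128g settings):

```python
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,                        # 4-bit quantisation
    group_size=128,                # 128g grouping
    desc_act=False,                # example value; set to whatever you normally use
    damp_percent=0.01,
    model_file_base_name='model',  # the added line: weights get saved as model.safetensors
)
```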
And yes, to fix an already-made GPTQ, rename the safetensors file to model.safetensors and also set model_file_base_name to "model" in quantize_config.json. This will then be identical to a GPTQ made with the change to BaseQuantizeConfig() described above.
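If it helps, something like this would apply that fix to an existing quant directory (just a sketch; the directory path and the original filename are placeholders for whatever your quant actually produced):

```python
import json
import os

gptq_dir = "/content/llama-1B-GPTQ"            # hypothetical output directory
old_name = "gptq_model-4bit-128g.safetensors"  # whatever the quant was saved as

# Rename the weights file to model.safetensors
os.rename(os.path.join(gptq_dir, old_name), os.path.join(gptq_dir, "model.safetensors"))

# Point quantize_config.json at the new base name
cfg_path = os.path.join(gptq_dir, "quantize_config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["model_file_base_name"] = "model"
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```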
@TheBloke Thank you very much for your patient guidance, I managed to convert it into a format that Transformers recognizes!