Getting an error while loading model_basename = "gptq_model-8bit-128g"
I am using the below code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-8bit-128g"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)
But I am getting this error:
FileNotFoundError Traceback (most recent call last)
in <cell line: 11>()
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
10
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,
1 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py in from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
712
713 if resolved_archive_file is None: # Could not find a model file to use
--> 714 raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
715
716 model_save_name = resolved_archive_file
FileNotFoundError: Could not find model in TheBloke/Llama-2-13B-chat-GPTQ
Please update to AutoGPTQ 0.3.2, released yesterday. In AutoGPTQ 0.3.0 and 0.2.2 there was a bug where the revision parameter was not followed. This is now fixed.
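If you want to confirm what is actually installed before retrying, here is a quick check (the PyPI distribution is named auto-gptq; upgrade with pip install --upgrade auto-gptq if it reports something older than 0.3.2):

# Check the installed AutoGPTQ version before retrying the load
from importlib.metadata import version

print(version("auto-gptq"))  # should report 0.3.2 or later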
OK, I will try that.
Is the below code correct if I want to load the model from a particular branch (i.e. gptq-8bit-128g-actorder_True):
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "gptq_model-4bit-128g"
use_triton = False
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="gptq-8bit-128g-actorder_True",
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           quantize_config=None)
Can you please provide me Python code to load the 8-bit 128g model?
Yes, just saw that one - presumably some subtle basename change, perhaps?
The required model_basename changed yesterday (August 20th). It is now model_basename = "model" - or you can just leave that line out completely, as it's now configured automatically by quantize_config.json. You no longer need to specify model_basename in the .from_quantized() call. But if you do specify it, set it to "model".
This change has happened due to adding support for an upcoming change in Transformers, which will allow loading GPTQ models directly from Transformers.
I did automatically update the README to reflect the model_basename change, but haven't mentioned the changes in more detail yet. I will be updating all GPTQ READMEs in the next 48 hours to make this clearer.
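For reference, once that Transformers support is available, loading from a specific branch should look roughly like this (a sketch only, assuming a Transformers release with GPTQ support plus optimum and auto-gptq installed as backends; revision and device_map are standard from_pretrained parameters):

# Sketch: loading a GPTQ branch directly through Transformers (once supported)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
revision = "gptq-8bit-128g-actorder_True"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             revision=revision,
                                             device_map="auto")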
OK, thanks for that - so this is for the main branch model? What is suggested for the others, something similar?
Same for all of them. They're all called model.safetensors now, and each branch's respective quantize_config.json includes that, so you don't need to specify model_basename any more.
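Putting that together, here is a minimal sketch for loading the 8-bit 128g branch with AutoGPTQ (assuming AutoGPTQ 0.3.2 or later, so the revision parameter is honoured; model_basename is left out and picked up from the branch's quantize_config.json):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
# model_basename is omitted: it is configured by the branch's quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           revision="gptq-8bit-128g-actorder_True",
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)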