Could not find model in TheBloke/wizard-vicuna-13B-GPTQ

#7
opened by cbiggerdev

I've been trying to get Auto-GPTQ to work in a Jupyter notebook with a bunch of different quantized models. I end up with this error every time.

Here is my basic code:
```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import torch

quantized_model_dir = "TheBloke/stable-vicuna-13B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

AutoGPTQForCausalLM.from_quantized(quantized_model_dir, use_safetensors=True)
```

And here is my error:

```
FileNotFoundError                         Traceback (most recent call last)
Cell In[9], line 9
      5 quantized_model_dir = "TheBloke/stable-vicuna-13B-GPTQ"
      7 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
----> 9 AutoGPTQForCausalLM.from_quantized(quantized_model_dir, use_safetensors=True)

File /opt/conda/lib/python3.10/site-packages/auto_gptq/modeling/auto.py:82, in AutoGPTQForCausalLM.from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, **kwargs)
     80 quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized
     81 keywords = {key: kwargs[key] for key in signature(quant_func).parameters if key in kwargs}
---> 82 return quant_func(
     83     model_name_or_path=model_name_or_path,
     84     save_dir=save_dir,
     85     device_map=device_map,
     86     max_memory=max_memory,
     87     device=device,
     88     low_cpu_mem_usage=low_cpu_mem_usage,
     89     use_triton=use_triton,
     90     inject_fused_attention=inject_fused_attention,
     91     inject_fused_mlp=inject_fused_mlp,
     92     use_cuda_fp16=use_cuda_fp16,
     93     quantize_config=quantize_config,
     94     model_basename=model_basename,
     95     use_safetensors=use_safetensors,
     96     trust_remote_code=trust_remote_code,
     97     warmup_triton=warmup_triton,
     98     **keywords
     99 )

File /opt/conda/lib/python3.10/site-packages/auto_gptq/modeling/_base.py:698, in BaseGPTQForCausalLM.from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, **kwargs)
    695     break
    697 if resolved_archive_file is None:  # Could not find a model file to use
--> 698     raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
    700 model_save_name = resolved_archive_file
    702 # == step2: convert model to gptq-model (replace Linear with QuantLinear) == #

FileNotFoundError: Could not find model in TheBloke/stable-vicuna-13B-GPTQ
```

Any help would be greatly appreciated!

You need to add model_basename to tell it the name of the model file

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import torch

quantized_model_dir = "TheBloke/wizard-vicuna-13B-GPTQ"
model_basename = "wizard-vicuna-13B-GPTQ-4bit.compat.no-act-order"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

AutoGPTQForCausalLM.from_quantized(quantized_model_dir, use_safetensors=True, model_basename=model_basename)
```
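
If you're not sure what to pass as the basename, it's just the name of the `.safetensors` file in the repo with the extension stripped. A quick way to check (a rough sketch, assuming the `huggingface_hub` package is installed):

```python
from huggingface_hub import list_repo_files

repo_id = "TheBloke/wizard-vicuna-13B-GPTQ"

# List every file in the repo and keep the quantized weight file(s)
safetensors_files = [f for f in list_repo_files(repo_id) if f.endswith(".safetensors")]
print(safetensors_files)

# model_basename is the filename with the ".safetensors" extension removed
model_basename = safetensors_files[0][: -len(".safetensors")]
print(model_basename)
```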

Hey, thank you so much. I will use this tonight.

It didn't work for me. The program is able to download the tokenizer files, but when it tries to download the model I get the following error:

```
raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
FileNotFoundError: Could not find model in TheBloke/stable-vicuna-13B-GPTQ
```

Do you know what is wrong?
If I want to download the model manually, should I put the safetensors file together with the tokenizer.model file? The program created some folders like blobs and refs, so I am not sure which folder the safetensors file should go into.

Solved

I typed the wrong name: it should be "wizard-vicuna", not "stable-vicuna".
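
On the manual-download question above: rather than copying files into the blobs/refs cache folders by hand, it is usually easier to download the whole repo into a plain local folder and point `from_quantized` at that. A minimal sketch, assuming `huggingface_hub` is installed (the local path is just an example):

```python
from huggingface_hub import snapshot_download
from auto_gptq import AutoGPTQForCausalLM

# Pull the safetensors file, tokenizer files and configs into one flat folder
# instead of the hub's blobs/refs cache layout.
local_dir = snapshot_download(
    "TheBloke/wizard-vicuna-13B-GPTQ",
    local_dir="./wizard-vicuna-13B-GPTQ",  # example path; pick any folder
)

model = AutoGPTQForCausalLM.from_quantized(
    local_dir,
    use_safetensors=True,
    model_basename="wizard-vicuna-13B-GPTQ-4bit.compat.no-act-order",
)
```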

> You need to add model_basename to tell it the name of the model file

May I ask you where you got the model_basename from, please?
