FileNotFoundError: Could not find model in TheBloke/guanaco-65B-GPTQ

#28 · opened by muneerhanif7

I am getting this error with every TheBloke model; I just copied and pasted the code from the repo.
This is the code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
model_basename = "Guanaco-65B-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

and this is the error I am getting:

Downloading (…)okenizer_config.json: 100%
700/700 [00:00<00:00, 44.8kB/s]
Downloading tokenizer.model: 100%
500k/500k [00:00<00:00, 23.9MB/s]
Downloading (…)/main/tokenizer.json: 100%
1.84M/1.84M [00:00<00:00, 10.7MB/s]
Downloading (…)cial_tokens_map.json: 100%
411/411 [00:00<00:00, 29.0kB/s]
Downloading (…)lve/main/config.json: 100%
820/820 [00:00<00:00, 64.9kB/s]
Downloading (…)quantize_config.json: 100%
156/156 [00:00<00:00, 13.0kB/s]

FileNotFoundError Traceback (most recent call last)
Cell In[3], line 11
7 use_triton = False
9 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
---> 11 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
12 model_basename=model_basename,
13 use_safetensors=True,
14 trust_remote_code=True,
15 device="cuda:0",
16 use_triton=use_triton,
17 quantize_config=None)
19 """
20 To download from a specific branch, use the revision parameter, as in this example:
21
(...)
28 quantize_config=None)
29 """

File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py:94, in AutoGPTQForCausalLM.from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
88 quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized
89 keywords = {
90 key: kwargs[key]
91 for key in signature(quant_func).parameters
92 if key in kwargs
93 }
---> 94 return quant_func(
95 model_name_or_path=model_name_or_path,
96 save_dir=save_dir,
97 device_map=device_map,
98 max_memory=max_memory,
99 device=device,
100 low_cpu_mem_usage=low_cpu_mem_usage,
101 use_triton=use_triton,
102 inject_fused_attention=inject_fused_attention,
103 inject_fused_mlp=inject_fused_mlp,
104 use_cuda_fp16=use_cuda_fp16,
105 quantize_config=quantize_config,
106 model_basename=model_basename,
107 use_safetensors=use_safetensors,
108 trust_remote_code=trust_remote_code,
109 warmup_triton=warmup_triton,
110 trainable=trainable,
111 **keywords
112 )

File /usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py:714, in BaseGPTQForCausalLM.from_quantized(cls, model_name_or_path, save_dir, device_map, max_memory, device, low_cpu_mem_usage, use_triton, torch_dtype, inject_fused_attention, inject_fused_mlp, use_cuda_fp16, quantize_config, model_basename, use_safetensors, trust_remote_code, warmup_triton, trainable, **kwargs)
711 break
713 if resolved_archive_file is None: # Could not find a model file to use
--> 714 raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
716 model_save_name = resolved_archive_file
718 if not use_triton and trainable:

FileNotFoundError: Could not find model in TheBloke/guanaco-65B-GPTQ

I recently updated all my GPTQ models for direct Transformers compatibility (coming very soon).

Please check the README again and you'll see that the model_basename line is now: model_basename = "model". This is true for all branches in all GPTQ models.
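
For reference, the only change needed is that one line. A minimal sketch of the corrected call, reusing the snippet from the question with the new basename:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
model_basename = "model"  # renamed basename; the old per-model filename no longer exists in the repo

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=False,
                                           quantize_config=None)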

Or in fact you can simply leave out model_basename now:

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)

That works because model_basename is now also set in quantize_config.json.
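
If you want to confirm that yourself, here is a small sketch (not from the original post) that fetches the repo's quantize_config.json with huggingface_hub and prints its contents:

# Download and print quantize_config.json to see the stored settings,
# including the model_basename field.
import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(repo_id="TheBloke/guanaco-65B-GPTQ",
                              filename="quantize_config.json")
with open(config_path) as f:
    print(json.dumps(json.load(f), indent=2))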

In the next 24-48 hours I will be updating all my GPTQ READMEs to explain this in more detail and to provide example code for loading GPTQ models directly from Transformers. I am waiting for the new Transformers release, which should be out today or tomorrow, before I do this.
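
As a rough preview (not the final README code): once the new Transformers release with GPTQ support is out, and assuming optimum and auto-gptq are installed, direct loading is expected to look roughly like this:

# Sketch of loading a GPTQ repo directly through Transformers.
# Assumes a Transformers release with GPTQ support, plus optimum and auto-gptq installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")

inputs = tokenizer("Tell me about AI", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))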
