OSError: TheBloke/wizard-vicuna-13B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

#9 by MorphzZ

I'm trying to load it like this:

from transformers import AutoTokenizer, AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TheBloke/wizard-vicuna-13B-GPTQ")

and it gives this error:

OSError: TheBloke/wizard-vicuna-13B-GPTQ does not appear to have a file named
pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

How can I fix this? This is what I am following:

[screenshot of the instructions being followed]

You can't load GPTQ models with bare transformers. You need to install AutoGPTQ.
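If it isn't installed yet, the package is on PyPI (how you install it can depend on your CUDA/PyTorch setup, so check the AutoGPTQ README if a prebuilt wheel isn't available for your environment):

pip install auto-gptq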

Here is sample code using AutoGPTQ:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/wizard-vicuna-13B-GPTQ"
# You could also download the model locally, and access it there
# model_name_or_path = "/path/to/TheBloke_wizard-vicuna-13B-GPTQ"

model_basename = "wizard-vicuna-13B-GPTQ-4bit.compat.no-act-order"
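# This should match the name of the quantized model file in the repo (the .safetensors file, without its extension)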

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
prompt_template=f'''### Human: {prompt}
### Assistant:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Note that if you use pipeline, you will see a spurious error message saying the model type is not supported
# This can be ignored!  Or you can hide it with the following logging line:
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

Thanks for responding to me. I appreciate it. I did try installing AutoGPTQ, but it gives me this error:

The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.7)

Is this known to you, and do you know how to fix it? I am able to run torch with the GPU on my computer using transformers, so why does AutoGPTQ complain?

For anyone who runs into this: the thing that puzzled me is that I am able to run FastChat with CUDA, so why does AutoGPTQ complain? I think this explains it:

Your locally installed CUDA toolkit won’t be used as the PyTorch binaries ship with their own CUDA dependencies unless you build PyTorch from source or a custom extension.
Check the output of python -m torch.utils.collect_env and make sure a PyTorch version with a CUDA runtime is installed.
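If you want to confirm which CUDA runtime your PyTorch wheel actually bundles (a quick check, independent of the system-wide toolkit):

import torch

print(torch.__version__)          # e.g. 2.0.1+cu117
print(torch.version.cuda)         # CUDA runtime bundled with the wheel, e.g. 11.7
print(torch.cuda.is_available())  # True if this build can see the GPU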
