I'm getting this error while running the Python code: FileNotFoundError: Could not find model in TheBloke/WizardCoder-Guanaco-15B-V1.0-GPTQ

opened by shikhardadhich

I'm getting this error while running the Python code (copied from the example):

File "/home/ubuntu/wizar.py", line 12, in
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
File "/home/ubuntu/Wiz/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
return quant_func(
File "/home/ubuntu/Wiz/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 791, in from_quantized
raise FileNotFoundError(f"Could not find model in {model_name_or_path}")
FileNotFoundError: Could not find model in TheBloke/WizardCoder-Guanaco-15B-V1.0-GPTQ
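
For reference, this exception is raised when from_quantized cannot find a checkpoint file matching model_basename in the repo. A quick way to see which files the repo actually contains (a minimal sketch using huggingface_hub, which the transformers stack already depends on):

from huggingface_hub import list_repo_files

# Print every file in the repo so the basename passed to from_quantized
# can be compared against the actual checkpoint filename.
for filename in list_repo_files("TheBloke/WizardCoder-Guanaco-15B-V1.0-GPTQ"):
    print(filename)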

Code:
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

model_name_or_path = "TheBloke/WizardCoder-Guanaco-15B-V1.0-GPTQ"
model_basename = "wizardcoder-guanaco-15b-v1.0-GPTQ-4bit-128g.no-act.order"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)
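
# Note (added for context): with use_safetensors=True, from_quantized looks for
# f"{model_basename}.safetensors" in the repo, so the FileNotFoundError above
# means no file with that exact name was found there.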

prompt = "Tell me about AI"
prompt_template=f'''
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {prompt}

### Response:

'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline
# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
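
If the checkpoint file in the repo has been renamed since the example was written, one workaround worth trying (a sketch, not verified against this repo, and version-dependent in auto-gptq) is to omit model_basename so from_quantized derives the filename from quantize_config.json instead:

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=False,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=None)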
