not able to load the model

by balu548411 - opened Jun 7, 2023


balu548411

Jun 7, 2023

Can someone suggest how to load this model?

code:

from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-33B-GPTQ")

model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-33B-GPTQ")

error:

OSError Traceback (most recent call last)
in <cell line: 5>()
3 tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-33B-GPTQ")
4
----> 5 model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-33B-GPTQ")

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
2491 )
2492 else:
-> 2493 raise EnvironmentError(
2494 f"{pretrained_model_name_or_path} does not appear to have a file named"
2495 f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"

OSError: TheBloke/guanaco-33B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
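The OSError means `from_pretrained` looked for one of the standard weight filenames and found none. A GPTQ repo like this one stores its quantized weights in a file with a different name. Below is a minimal sketch for checking a repo's file listing before loading. The helper `has_standard_weights` is hypothetical. The filename set comes from the error message, plus `model.safetensors`, which recent transformers versions also accept (treat that entry as an assumption for older versions).

```python
# Weight filenames named in the OSError above, plus model.safetensors,
# which recent transformers versions also look for (assumption for older versions).
STANDARD_WEIGHT_FILES = {
    "pytorch_model.bin",
    "tf_model.h5",
    "model.ckpt",
    "flax_model.msgpack",
    "model.safetensors",
}


def has_standard_weights(filenames):
    """Return True if a repo file listing contains a weight file plain from_pretrained can find.

    Hypothetical helper; sharded checkpoints (*.index.json) are not handled.
    """
    # Compare only the last path component, so files in subfolders still match.
    return any(name.rsplit("/", 1)[-1] in STANDARD_WEIGHT_FILES for name in filenames)
```

You could pass it the output of `huggingface_hub.list_repo_files("TheBloke/guanaco-33B-GPTQ")` (requires network access) to see in advance whether plain `AutoModelForCausalLM.from_pretrained` has anything to load.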

TheBloke

Owner Jun 7, 2023

You can't load this with standard transformers. The repo contains GPTQ-quantized weights rather than a pytorch_model.bin, so you need to use AutoGPTQ:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
import argparse

parser = argparse.ArgumentParser(description='Simple AutoGPTQ example')
parser.add_argument('model_name_or_path', type=str, help='Model folder or repo')
parser.add_argument('--model_basename', type=str, help='Model file basename if model is not named gptq_model-Xb-Ygr')
parser.add_argument('--use_slow', action="store_true", help='Use slow tokenizer')
parser.add_argument('--use_safetensors', action="store_true", help='Load the model from a .safetensors file')
parser.add_argument('--use_triton', action="store_true", help='Use Triton for inference?')
parser.add_argument('--bits', type=int, default=4, help='Specify GPTQ bits. Only needed if no quantize_config.json is provided')
parser.add_argument('--group_size', type=int, default=128, help='Specify GPTQ group_size. Only needed if no quantize_config.json is provided')
parser.add_argument('--desc_act', action="store_true", help='Specify GPTQ desc_act. Only needed if no quantize_config.json is provided')

args = parser.parse_args()

quantized_model_dir = args.model_name_or_path

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=not args.use_slow)

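# Prefer the repo's quantize_config.json; fall back to the command-line values if it can't be loaded
try:
    quantize_config = BaseQuantizeConfig.from_pretrained(quantized_model_dir)
except Exception:
    quantize_config = BaseQuantizeConfig(
        bits=args.bits,
        group_size=args.group_size,
        desc_act=args.desc_act
    )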

model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir,
        use_safetensors=args.use_safetensors,
        model_basename=args.model_basename,
        device="cuda:0",
        use_triton=args.use_triton,
        quantize_config=quantize_config)

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

prompt = "Tell me about AI"
prompt_template=f'''### Human: {prompt}
### Assistant:'''

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,  # temperature and top_p only take effect when sampling is enabled
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
# temperature only takes effect when sampling is enabled (do_sample=True)
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
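The try/except in the script prefers the repo's quantize_config.json and falls back to the --bits/--group_size/--desc_act flags only when that file can't be loaded. Below is a stdlib-only sketch of that precedence for a local model folder. It assumes the JSON stores `bits`, `group_size` and `desc_act` keys, as AutoGPTQ writes them. `resolve_quantize_params` is a hypothetical name, and it returns a plain dict rather than a `BaseQuantizeConfig`.

```python
import json
import os


def resolve_quantize_params(model_dir, bits=4, group_size=128, desc_act=False):
    """Return GPTQ params, preferring model_dir/quantize_config.json over the given defaults.

    Hypothetical helper mirroring the script's try/except fallback; local folders only.
    """
    params = {"bits": bits, "group_size": group_size, "desc_act": desc_act}
    path = os.path.join(model_dir, "quantize_config.json")
    if os.path.isfile(path):
        with open(path, encoding="utf-8") as f:
            saved = json.load(f)
        # Override only the three keys the script uses; extra fields are ignored.
        for key in params:
            if key in saved:
                params[key] = saved[key]
    return params
```

Since these are the same keyword arguments the script passes, `BaseQuantizeConfig(**params)` should accept the resulting dict.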

Example execution:

[pytorch2] ubuntu@h100:/workspace/AIScripts git:(main) $ python simple_autogptq.py TheBloke/guanaco-7B-GPTQ --model_basename Guanaco-7B-GPTQ-4bit-128g.no-act-order --use_safetensors
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 700/700 [00:00<00:00, 1.59MB/s]
Downloading tokenizer.model: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500k/500k [00:00<00:00, 8.68MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.84M/1.84M [00:00<00:00, 7.91MB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 411/411 [00:00<00:00, 1.34MB/s]
Downloading (…)quantize_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 223/223 [00:00<00:00, 735kB/s]
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 593/593 [00:00<00:00, 1.92MB/s]
Downloading (…)ct-order.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.00G/4.00G [00:36<00:00, 111MB/s]
The safetensors archive passed at /home/ubuntu/.cache/huggingface/hub/models--TheBloke--guanaco-7B-GPTQ/snapshots/e9e797cac5e4385a10e3a74927860c6552f860c6/Guanaco-7B-GPTQ-4bit-128g.no-act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.
*** Pipeline:
### Human: Tell me about AI
### Assistant: Artificial intelligence (AI) is a type of technology that simulates human intelligence and enables computers to perform tasks that would require human intelligence if done by people. It includes the study of how to make computer systems do things that would require human intelligence if done by humans.

The term "artificial intelligence" was first used in 1956, when John McCarthy coined it while at MIT's Research Laboratory for Electronics. He defined it as "the science and engineering of making intelligent machines."

Today, AI has applications in many fields including healthcare, finance, manufacturing, transportation, and more. For example, AI can be used to diagnose medical conditions, automate financial transactions, design products, and manage fleets of vehicles.


*** Generate:
<s> ### Human: Tell me about AI
### Assistant: Artificial Intelligence (AI) is a field of computer science that deals with the simulation of human intelligence and the automation of tasks that require it. It is a subfield of computer science that deals with the design, development, and study of intelligent computer systems. AI is a branch of computer science that deals with the simulation of human intelligence by machines.
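Both runs echo the prompt. The generate() output also starts with the `<s>` special token because `tokenizer.decode()` was called without `skip_special_tokens=True`. In transformers, passing `skip_special_tokens=True` to `decode()`, or `return_full_text=False` to the text-generation pipeline, should avoid that. If you would rather post-process the strings, here is a minimal sketch built on the `### Human:` / `### Assistant:` template from the script. Both function names are hypothetical.

```python
def build_prompt(prompt):
    """Wrap a user message in the Guanaco template used by the script above."""
    return f"### Human: {prompt}\n### Assistant:"


def extract_reply(generated_text):
    """Return only the assistant's reply from text that echoes the prompt."""
    marker = "### Assistant:"
    # partition() splits at the first marker; anything before it (prompt, <s>) is dropped.
    _, found, reply = generated_text.partition(marker)
    if not found:
        return generated_text.strip()
    # Cut off any further "### Human:" turn the model may have started on its own.
    return reply.split("### Human:", 1)[0].strip()
```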

Note that you need to pass --model_basename Guanaco-7B-GPTQ-4bit-128g.no-act-order to tell the script the name of the model file. Use everything in the file name before .safetensors.
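Because the basename is simply the model file name without its `.safetensors` extension, it can be derived from a repo's file listing. Below is a minimal sketch. `find_model_basename` is a hypothetical helper, and it assumes the repo holds exactly one `.safetensors` file.

```python
from pathlib import PurePosixPath


def find_model_basename(filenames):
    """Return the --model_basename value for the single .safetensors file in a listing."""
    candidates = [f for f in filenames if f.endswith(".safetensors")]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one .safetensors file, found {candidates}")
    # .stem drops only the final suffix, so ".no-act-order" stays in the basename.
    return PurePosixPath(candidates[0]).stem
```

For example, with the file from the run above, `find_model_basename(["config.json", "Guanaco-7B-GPTQ-4bit-128g.no-act-order.safetensors"])` gives `Guanaco-7B-GPTQ-4bit-128g.no-act-order`. The listing itself could come from `huggingface_hub.list_repo_files("TheBloke/guanaco-7B-GPTQ")`.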

ramzeez88

Jun 7, 2023

How do you pass the directory to the model? For me it says either "error: the following arguments are required: model_name_or_path" or "unrecognized arguments: --model_name_or_path" after I give it the directory.

TheBloke

Owner Jun 7, 2023

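You can see how I'm passing it in the output above. It's the first argument, given positionally without a --model_name_or_path flag, and a local model folder goes in the same place as the repo name: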

python simple_autogptq.py TheBloke/guanaco-7B-GPTQ --model_basename Guanaco-7B-GPTQ-4bit-128g.no-act-order --use_safetensors
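The script defines `model_name_or_path` as a positional argument. Argparse therefore expects the value on its own, not after a `--model_name_or_path` flag. The stdlib sketch below reproduces just that part of the parser (only three of the script's flags) to show which form works. `/path/to/model_dir` is a placeholder.

```python
import argparse

# Subset of the script's arguments: model_name_or_path is positional (no leading "--").
parser = argparse.ArgumentParser(description="Simple AutoGPTQ example")
parser.add_argument("model_name_or_path", type=str, help="Model folder or repo")
parser.add_argument("--model_basename", type=str)
parser.add_argument("--use_safetensors", action="store_true")

# Equivalent to: python simple_autogptq.py TheBloke/guanaco-7B-GPTQ \
#   --model_basename Guanaco-7B-GPTQ-4bit-128g.no-act-order --use_safetensors
args = parser.parse_args([
    "TheBloke/guanaco-7B-GPTQ",
    "--model_basename", "Guanaco-7B-GPTQ-4bit-128g.no-act-order",
    "--use_safetensors",
])
print(args.model_name_or_path)  # TheBloke/guanaco-7B-GPTQ

# A local folder works the same way, e.g. parser.parse_args(["/path/to/model_dir"]).
# Writing "--model_name_or_path <dir>" instead fails with "unrecognized arguments",
# and omitting the positional value fails with "the following arguments are required".
```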
