Not able to load via transformers

#16 opened by balu548411

Hi bro, I'm a newbie to QLoRA. I tried the code below and it raises an OSError. Can you tell me how to load and use this model in Python?
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-GPTQ")

model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-65B-GPTQ")
```


```
OSError                                   Traceback (most recent call last)
in <cell line: 5>()
      3 tokenizer = AutoTokenizer.from_pretrained("TheBloke/guanaco-65B-GPTQ")
      4
----> 5 model = AutoModelForCausalLM.from_pretrained("TheBloke/guanaco-65B-GPTQ")

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   2553                 )
   2554             else:
-> 2555                 raise EnvironmentError(
   2556                     f"{pretrained_model_name_or_path} does not appear to have a file named"
   2557                     f" {_add_variant(WEIGHTS_NAME, variant)}, {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME} or"

OSError: TheBloke/guanaco-65B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
```

You can't load GPTQ models with regular transformers; you need AutoGPTQ:

```sh
pip install auto-gptq
```
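If you want to confirm what the repo actually ships, here's a quick check (a minimal sketch using `huggingface_hub`, which transformers already pulls in) that lists the weight files — you'll see only a GPTQ `.safetensors` file, which is why `from_pretrained` can't find `pytorch_model.bin`:

```python
# Minimal sketch: list the weight files the repo actually ships.
from huggingface_hub import list_repo_files

files = list_repo_files("TheBloke/guanaco-65B-GPTQ")
print([f for f in files if f.endswith((".safetensors", ".bin", ".ckpt", ".msgpack"))])
# Only the GPTQ .safetensors file shows up -- no pytorch_model.bin,
# which is exactly what the OSError above is complaining about.
```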

Here is example code:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/guanaco-65B-GPTQ"
# Stem of the quantized .safetensors file inside the repo
model_basename = "Guanaco-65B-GPTQ-4bit.act-order"

use_triton = False  # the CUDA kernels are used when this is False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)  # quantization settings are read from the repo's quantize_config.json

prompt = "Tell me about AI"
prompt_template = f'''### Instruction: {prompt}
### Response:'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

# Prevent printing spurious transformers error when using pipeline with AutoGPTQ
logging.set_verbosity(logging.CRITICAL)

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```

Thank you bro 😊😊

First of all, thanks a lot for your work!
I encountered an issue directly caused by the following code:

```python
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        device_map="auto",
        trust_remote_code=True,
        device="cuda",
        use_triton=use_triton,
        quantize_config=None)
```

It first warns me:

```
WARNING 2023-07-03 22:36:45,587-1d: CUDA extension not installed.
....
WARNING 2023-07-03 22:36:58,012-1d: The safetensors archive passed at /home/mydir/.cache/huggingface/hub/models--TheBloke--guanaco-65B-GPTQ/snapshots/c1a31c76e7228a13bc542b25243b912f12e39c87/Guanaco-65B-GPTQ-4bit.act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
```

After a huge amount of information about device_map, it raises the following error:

```
C++ Traceback (most recent call last):
No stack trace in paddle, may be caused by external reasons.

Error Message Summary:
FatalError: Access to an undefined portion of a memory object is detected by the operating system.
[TimeInfo: *** Aborted at 1688395081 (unix time) try "date -d @1688395081" if you are using GNU date ***]
[SignalInfo: *** SIGBUS (@0x7fbce9c3dff0) received by PID 424101 (TID 0x7fbea6e7e740) from PID 18446744073336512496 ***]
```


![image.png](https://cdn-uploads.huggingface.co/production/uploads/6033ae93b5883695ce9d0918/LsYSFyg909xJjyhnCxeCj.png)

I'm pretty sure that I have cudatoolkit installed. Do you have any clue about the problem?
Again, thanks for your work, and I hope for your reply.

Firstly, just to check: are you running this on a system with an Nvidia GPU available, with at least 48GB of VRAM?

If so, the first problem is that the CUDA extension is not installed. Please try re-installing auto-gptq with:

```sh
pip3 uninstall -y auto-gptq
GITHUB_ACTIONS=true pip3 install auto-gptq
```

Not sure about the rest, let's see if installing AutoGPTQ with the CUDA module available fixes that first.
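
To verify the GPU side, here's a quick check (a minimal sketch, assuming PyTorch is installed) that prints whether CUDA is visible and how much VRAM each card has:

```python
# Minimal sketch: confirm an Nvidia GPU is visible and report its VRAM.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"cuda:{i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```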


Thank you so much
