Please share working code sample and exact versions of required packages.
#1 opened by Alexander-Minushkin
I'm getting the following error:
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Installed packages:
transformers 4.37.1
bitsandbytes 0.42.0
scipy 1.12.0
accelerate 0.26.1
Code:
from transformers import AutoModelForCausalLM, AutoTokenizer
device = "cuda:3" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained("Mistral-7B-v0.1-int8")
tokenizer = AutoTokenizer.from_pretrained("Mistral-7B-v0.1-int8")
messages = [
{"role": "user", "content": "What is your favourite condiment?"},
{"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
{"role": "user", "content": "Do you have mayonnaise recipes?"}
]
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
model_inputs = encodeds.to(device)
model.to(device)  # <- this call raises the ValueError above, since the checkpoint is loaded as an 8-bit bitsandbytes model
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
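From the error text, I assume the intended pattern is to let from_pretrained place the quantized model on cuda:3 via device_map and to skip the model.to(device) call entirely, moving only the input tensors to model.device. Below is a minimal sketch of what I mean, assuming the Mistral-7B-v0.1-int8 checkpoint already ships a bitsandbytes quantization_config (if it doesn't, I guess a BitsAndBytesConfig(load_in_8bit=True) would need to be passed explicitly). Is this the recommended usage?

# Minimal sketch (not tested against this exact checkpoint): load the quantized
# model directly onto cuda:3 and never call model.to() afterwards.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Mistral-7B-v0.1-int8"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map={"": "cuda:3"},  # place the whole quantized model on GPU 3 at load time
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "Do you have mayonnaise recipes?"}]
# Move only the input tensors; the model itself stays where device_map put it.
model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
print(tokenizer.batch_decode(generated_ids)[0])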