Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

#9
by smshr - opened

I used the following code to load the model:
`import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

device = torch.device('cuda')

model_path = 'openlm-research/open_llama_3b'
#model_path = 'openlm-research/open_llama_7b'
#model_path = 'openlm-research/open_llama_13b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)
`
but when generating output it gives the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Any leads on how to solve it?

device = torch.device('cuda')
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

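This moves the input onto the GPU, so it sits on the same device as the model. The tokenizer returns CPU tensors by default, which is why the embedding lookup (`index_select`) complains about `cuda:0` and `cpu`.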
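For anyone who wants the whole generation step in one place, here is a minimal sketch rather than a definitive fix. The original generation code was not posted, so the helper names `move_inputs_to_device` and `generate_text` and the `max_new_tokens` default are assumptions made for illustration. It uses `model.device` instead of a hard-coded `'cuda'`; with `device_map='auto'` that is usually the device holding the first layers, but that depends on how the weights were placed.

```python
def move_inputs_to_device(inputs, device):
    """Return a new dict with every tensor-like value moved to `device`.

    Hypothetical helper: anything exposing a `.to()` method (such as a
    torch.Tensor) is moved, everything else is passed through unchanged.
    """
    return {
        name: value.to(device) if hasattr(value, "to") else value
        for name, value in inputs.items()
    }


def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Tokenize `prompt`, move it next to the model, generate, and decode.

    Hypothetical helper: `model` and `tokenizer` are the objects loaded
    with from_pretrained() in the question above.
    """
    # The tokenizer returns CPU tensors, regardless of where the model is.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Assumption: model.device is where the input embeddings live.
    inputs = move_inputs_to_device(inputs, model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the model and tokenizer from the question, usage would look like `print(generate_text(model, tokenizer, "Q: What is the largest animal?\nA:"))`. Passing the whole tokenizer output (not just `input_ids`) also moves the attention mask, so it does not end up on a different device.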
