openlm-research/open_llama_13b · Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I used the following code to load the model:
```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

device = torch.device('cuda')

model_path = 'openlm-research/open_llama_3b'
#model_path = 'openlm-research/open_llama_7b'
#model_path = 'openlm-research/open_llama_13b'

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto'
)
```
However, generating output raises the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Any leads on how to solve it?
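
One possible cause, which the error message alone does not confirm: with `device_map='auto'` the model weights are typically placed on the GPU, while the tensors returned by the tokenizer stay on the CPU unless they are moved explicitly. The `index_select` in the message would be consistent with the embedding lookup receiving CPU `input_ids`. Below is a minimal sketch of a generation step that moves the inputs to the model's device first. The generation code is not shown in this post, so `move_inputs_to_device`, `generate_text`, the prompt, and `max_new_tokens=32` are hypothetical names and values chosen for illustration.

```python
def move_inputs_to_device(inputs, device):
    """Return a dict with every tensor-like value moved to `device`.

    Anything exposing a `.to(device)` method (e.g. a torch.Tensor) is moved;
    other values are passed through unchanged.
    """
    return {
        key: (value.to(device) if hasattr(value, "to") else value)
        for key, value in inputs.items()
    }


def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Hypothetical helper: tokenize, move inputs to the model's device, generate."""
    # The tokenizer returns CPU tensors by default.
    inputs = tokenizer(prompt, return_tensors="pt")
    # Assumption: model.device points at the device holding the first
    # parameters (the embedding layer), which is where input_ids are consumed.
    inputs = move_inputs_to_device(inputs, model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the `model` and `tokenizer` loaded above, this would be called as `generate_text(model, tokenizer, "Q: What is the largest animal?\nA:")`. If the error persists after the inputs are on `cuda:0`, the cause is probably elsewhere, for example layers offloaded to the CPU when the 13B weights do not fit in GPU memory.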