Cannot run model with torch.float16

#1
by alfredplpl - opened

I cannot run this model with torch.float16. Also, loading the model is slower than expected.

I ran the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name="llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16) 
text = 'θ‡ͺ焢言θͺžε‡¦η†γ¨γ―何か' 
text = text + "### ε›žη­”οΌš"
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
    )[0]
print(tokenizer.decode(output))

Then, I got the following error:

/home/username/anaconda3/envs/pdf-agent/bin/python /mnt/my_raid/github/pdf-agent/llm_jp_13b.py 
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:08<00:00,  2.79s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:7 for open-end generation.
Traceback (most recent call last):
  File "/mnt/my_raid/github/pdf-agent/llm_jp_13b.py", line 10, in <module>
    output = model.generate(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
    return self.sample(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
    outputs = self(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1076, in forward
    transformer_outputs = self.transformer(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
    outputs = block(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 389, in forward
    hidden_states = self.ln_1(hidden_states)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

However, the model runs fine with torch.float32:

- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

It also runs with load_in_8bit:

- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)
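
For completeness, the full 8-bit load path I tried looks like this (a minimal sketch; it assumes the bitsandbytes and accelerate versions listed in my environment below, and the rest of the script is unchanged):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 8-bit quantized load instead of the float16 load above
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)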

My environment is as follows:

  • OS: Ubuntu 20.04
  • CPU: Core i5-12400F
  • GPU: RTX A6000 48GBx2
  • Memory: 128GB
  • Pip freeze:
    • accelerate==0.23.0
    • bitsandbytes==0.41.1
    • tokenizers==0.14.1
    • transformers==4.34.1
    • torch==2.0.1

What should I do?

OK, I figured it out.
Since I did not specify a device, the model was being loaded on the CPU, and CPU LayerNorm has no float16 kernel in this version of PyTorch, hence the error.
As Mr. Sasaki suggested, I changed the code as follows:

- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

The code now runs on the GPU, and the problem is solved.
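
For reference, the root cause can be reproduced with a few lines of plain PyTorch, independent of the model (a minimal sketch, assuming torch 2.0.1 as in my environment):

import torch
import torch.nn.functional as F

x = torch.randn(2, 8)

# float32 LayerNorm works on the CPU...
F.layer_norm(x, (8,))

# ...but there is no CPU float16 LayerNorm kernel in torch 2.0.1,
# which is exactly the RuntimeError in the traceback above.
try:
    F.layer_norm(x.half(), (8,))
except RuntimeError as e:
    print(e)  # "LayerNormKernelImpl" not implemented for 'Half'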

In your environment, you need to add the argument device_map="auto" to AutoModelForCausalLM.from_pretrained() and set the environment variable CUDA_VISIBLE_DEVICES=0.
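
Putting it together, the loading and generation path would look roughly like this (a minimal sketch; the pad_token_id argument is optional and only silences the warning visible in the log above):

# Run with: CUDA_VISIBLE_DEVICES=0 python llm_jp_13b.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" dispatches the weights to the GPU, so the float16
# LayerNorm runs on CUDA instead of the CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
text = "θ‡ͺ焢言θͺžε‡¦η†γ¨γ―何か" + "### ε›žη­”οΌš"
tokenized_input = tokenizer.encode(
    text, add_special_tokens=False, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,  # silences the pad_token_id warning
    )[0]
print(tokenizer.decode(output))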

By the way, which version of Python are you using in this environment?

Thank you for your reply.

I am using Python 3.10.13 with Anaconda3.

Thank you for reporting the problem, @alfredplpl.
We have revised the sample code on the Hugging Face Hub.

hiroshi-matsuda-rit changed discussion status to closed
