Cannot run model with torch.float16

#1
by alfredplpl - opened

I cannot run this model with torch.float16. Also, loading the model is slower than expected.

I ran the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_name="llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16) 
text = 'θ‡ͺ焢言θͺžε‡¦η†γ¨γ―何か' 
text = text + "### ε›žη­”οΌš"
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
    )[0]
print(tokenizer.decode(output))

Then, I got the following error:

/home/username/anaconda3/envs/pdf-agent/bin/python /mnt/my_raid/github/pdf-agent/llm_jp_13b.py 
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3/3 [00:08<00:00,  2.79s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:7 for open-end generation.
Traceback (most recent call last):
  File "/mnt/my_raid/github/pdf-agent/llm_jp_13b.py", line 10, in <module>
    output = model.generate(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
    return self.sample(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/generation/utils.py", line 2734, in sample
    outputs = self(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1076, in forward
    transformer_outputs = self.transformer(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 900, in forward
    outputs = block(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 389, in forward
    hidden_states = self.ln_1(hidden_states)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/username/anaconda3/envs/pdf-agent/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

However, the model runs fine with torch.float32:

- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)

It also runs with load_in_8bit:

- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)
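
For completeness, the full 8-bit load path I tried looks like this (a minimal sketch; it assumes the bitsandbytes and accelerate versions listed in my environment below, and the rest of the script is unchanged):

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# 8-bit quantized load instead of the float16 load above
model = AutoModelForCausalLM.from_pretrained(model_name, load_in_8bit=True)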

My environment is as follows:

  • OS: Ubuntu 20.04
  • CPU: Core i5-12400F
  • GPU: RTX A6000 48GBx2
  • Memory: 128GB
  • Pip freeze:
    • accelerate==0.23.0
    • bitsandbytes==0.41.1
    • tokenizers==0.14.1
    • transformers==4.34.1
    • torch==2.0.1

What should I do?

OK, I figured it out.
Since I did not specify a device, the model was being loaded on the CPU, and CPU LayerNorm has no float16 kernel in this version of PyTorch, hence the error.
As Mr. Sasaki suggested, I changed the code as follows:

- model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
+ model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

The code now runs on the GPU, and the problem is solved.
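
For reference, the root cause can be reproduced with a few lines of plain PyTorch, independent of the model (a minimal sketch, assuming torch 2.0.1 as in my environment):

import torch
import torch.nn.functional as F

x = torch.randn(2, 8)

# float32 LayerNorm works on the CPU...
F.layer_norm(x, (8,))

# ...but there is no CPU float16 LayerNorm kernel in torch 2.0.1,
# which is exactly the RuntimeError in the traceback above.
try:
    F.layer_norm(x.half(), (8,))
except RuntimeError as e:
    print(e)  # "LayerNormKernelImpl" not implemented for 'Half'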

In your environment, you need to add the argument device_map="auto" to AutoModelForCausalLM.from_pretrained() and set the environment variable CUDA_VISIBLE_DEVICES=0.
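
Putting it together, the loading and generation path would look roughly like this (a minimal sketch; the pad_token_id argument is optional and only silences the warning visible in the log above):

# Run with: CUDA_VISIBLE_DEVICES=0 python llm_jp_13b.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "llm-jp/llm-jp-13b-instruct-full-jaster-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" dispatches the weights to the GPU, so the float16
# LayerNorm runs on CUDA instead of the CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
text = "θ‡ͺ焢言θͺžε‡¦η†γ¨γ―何か" + "### ε›žη­”οΌš"
tokenized_input = tokenizer.encode(
    text, add_special_tokens=False, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=256,
        do_sample=True,
        top_p=0.95,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id,  # silences the pad_token_id warning
    )[0]
print(tokenizer.decode(output))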

By the way, which version of Python are you using in this environment?

Thank you for your reply.

I am using Python 3.10.13 with Anaconda3.

Thank you for reporting the problem, @alfredplpl.
We have revised the sample code on the Hugging Face Hub.

hiroshi-matsuda-rit changed discussion status to closed
