Error "shape '[1, 9, 3072]' is invalid for input of size 36864" while running Gemma 7b using torch.float16

#17
by ahmedsaoudi - opened

Hello,
I'm trying to run the Gemma 7b example from this model's card using torch.float16, but I keep getting shape '[1, 9, 3072]' is invalid for input of size 36864 as an error.

I just copy/pasted the example from the card page to a Google Colab notebook (and installed the necessary dependencies of course).

Am I doing something wrong?

EDIT: I tried using 8-bit precision but got the same error.

I got the same error on Google Colab with a T4.
I found that gemma-2b and gemma-2b-it worked, but gemma-7b and gemma-7b-it failed with RuntimeError: shape '[1, 9, 3072]' is invalid for input of size 36864.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0
import os
from google.colab import userdata
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
# quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "google/gemma-7b" # gemma-2b and gemma-2b-it worked, but gemma-7b and gemma-7b-it got error
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Google org

Hey all! We're looking into it! Things work with torch 2.2.0 but not 2.1.0. We'll update here once we find the issue.
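In the meantime, a quick way to confirm which version your runtime is on:

import torch
print(torch.__version__)  # 2.1.0 is affected; 2.2.0 is known to work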

Google org
edited Feb 21

Hey all! The source of the issue is the difference in the attention implementation. Any torch version before 2.1.1 will use eager attention, as sdpa isn't supported by torch in those versions. We will fix the models in transformers to work with these versions ASAP and release a patch; but in the meantime, we recommend using a torch version that satisfies torch>=2.1.1 in order to leverage the sdpa attention implementation, which works correctly.

Here is the line needed to install the relevant PyTorch version in Colab:

pip install "torch>=2.1.1" -U

Please restart your runtime afterwards for it to leverage the updated pytorch version!
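
As a sketch, transformers (4.36 and later) also accepts an attn_implementation argument at load time, so you can request sdpa explicitly instead of relying on the automatic selection; on an older torch this raises an error rather than silently falling back to eager:

# Sketch: explicitly request the sdpa attention path (requires torch >= 2.1.1).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    attn_implementation="sdpa",
)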


Thank you so much!

https://huggingface.co/google/gemma-7b-it/discussions/13

@osanseviero @lysandre Thank you!

I tested on Google Colab with a T4 and confirmed that it works without error by adding this cell at the top of the notebook.

!pip3 install -q -U torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 torchdata==0.7.1 torchtext==0.16.1 --index-url https://download.pytorch.org/whl/cu121

By the way, it seems that the example prompt Write me a poem about Machine Learning. is not well suited to the non-instruct model gemma-7b, because it generates nonsense output, making it hard to tell whether the model is working or not.

<bos>Write me a poem about Machine Learning.

<bos><bos><bos><bos><bos><bos><bos><bos><bos><bos>

But it actually works well with the prompt Write me a poem about Machine Learning. Because:

<bos>Write me a poem about Machine Learning. Because I’m a poet. And I’m
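
For what it's worth, here is a sketch of prompting the instruction-tuned variant via its chat template instead (assuming google/gemma-7b-it), which handles this kind of request more gracefully than the base model:

# Sketch: use the instruct model with its chat template instead of the base model.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

chat = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))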
Google org

Hi all! We just did a new release in transformers that fixes the issue being discussed in this thread. Make sure to upgrade. Thanks everyone!

@osanseviero Thank you so much!

I tested on Google Colab (torch 2.1.0+cu121) with transformers==4.38.1 and confirmed that the example worked well.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.1  # NOT 4.38.0
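
A quick sanity check after restarting the runtime, to confirm the fixed versions were actually picked up:

import torch, transformers
print(torch.__version__)         # 2.1.0+cu121 in my case
print(transformers.__version__)  # should print 4.38.1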
Google org

Great to hear! I'll close this discussion, but feel free to comment if you still face the issue!

osanseviero changed discussion status to closed
