Error "shape '[1, 9, 3072]' is invalid for input of size 36864" while running Gemma 7b using torch.float16

#17
by ahmedsaoudi - opened

Hello,
I'm trying to run the Gemma 7b example from this model's card using torch.float16, but I keep getting shape '[1, 9, 3072]' is invalid for input of size 36864 as an error.

I just copy/pasted the example from the card page to a Google Colab notebook (and installed the necessary dependencies of course).

Am I doing something wrong?

EDIT: I tried using 8-bit precision but got the same error.

I got the same error on Google Colab with a T4.
I found that gemma-2b and gemma-2b-it worked, but gemma-7b and gemma-7b-it failed with RuntimeError: shape '[1, 9, 3072]' is invalid for input of size 36864.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.0
import os
from google.colab import userdata
os.environ["HF_TOKEN"] = userdata.get('HF_TOKEN')
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True)
# quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "google/gemma-7b" # gemma-2b and gemma-2b-it worked, but gemma-7b and gemma-7b-it got error
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.float16)
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))
Google org

Hey all! We're looking into it! Things work with torch 2.2.0 but not 2.1.0. We'll update here once we find the issue.
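In the meantime, a quick way to confirm which version your runtime is on:

import torch
print(torch.__version__)  # 2.1.0 is affected; 2.2.0 is known to work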

Google org
edited Feb 21

Hey all! The source of the issue is the difference in the attention implementation. Any torch version before 2.1.1 will use eager attention, as sdpa isn't supported by torch in those versions. We will fix the models in transformers to work with these versions ASAP and release a patch; but in the meantime, we recommend using a torch version that satisfies torch>=2.1.1 in order to leverage the sdpa attention implementation, which works correctly.

Here is the line needed to install the relevant PyTorch version in Colab:

pip install "torch>=2.1.1" -U

Please restart your runtime afterwards for it to leverage the updated pytorch version!
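
As a sketch, transformers (4.36 and later) also accepts an attn_implementation argument at load time, so you can request sdpa explicitly instead of relying on the automatic selection; on an older torch this raises an error rather than silently falling back to eager:

# Sketch: explicitly request the sdpa attention path (requires torch >= 2.1.1).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    attn_implementation="sdpa",
)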


Thank you so much!

https://huggingface.co/google/gemma-7b-it/discussions/13

@osanseviero @lysandre Thank you!

I tested on Google Colab with a T4 and confirmed that it works without error by adding this cell at the top of the notebook.

!pip3 install -q -U torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 torchdata==0.7.1 torchtext==0.16.1 --index-url https://download.pytorch.org/whl/cu121

By the way, it seems that the example prompt Write me a poem about Machine Learning. is not well suited to the non-instruct model gemma-7b, because it generates nonsense output, making it hard to tell whether the model is working or not.

<bos>Write me a poem about Machine Learning.

<bos><bos><bos><bos><bos><bos><bos><bos><bos><bos>

But it actually works well with the prompt Write me a poem about Machine Learning. Because:

<bos>Write me a poem about Machine Learning. Because I’m a poet. And I’m
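
For what it's worth, here is a sketch of prompting the instruction-tuned variant via its chat template instead (assuming google/gemma-7b-it), which handles this kind of request more gracefully than the base model:

# Sketch: use the instruct model with its chat template instead of the base model.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "google/gemma-7b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

chat = [{"role": "user", "content": "Write me a poem about Machine Learning."}]
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))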
Google org

Hi all! We just did a new release in transformers that fixes the issue being discussed in this thread. Make sure to upgrade. Thanks everyone!

@osanseviero Thank you so much!

I tested on Google Colab (torch 2.1.0+cu121) with transformers==4.38.1 and confirmed that the example worked well.

!pip3 install -q -U bitsandbytes==0.42.0
!pip3 install -q -U peft==0.8.2
!pip3 install -q -U trl==0.7.10
!pip3 install -q -U accelerate==0.27.1
!pip3 install -q -U datasets==2.17.0
!pip3 install -q -U transformers==4.38.1  # NOT 4.38.0
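
A quick sanity check after restarting the runtime, to confirm the fixed versions were actually picked up:

import torch, transformers
print(torch.__version__)         # 2.1.0+cu121 in my case
print(transformers.__version__)  # should print 4.38.1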
Google org

Great to hear! I'll close this discussion, but feel free to comment if you still face the issue!

osanseviero changed discussion status to closed
