4bit-quantized gemma-2-27b-it generates only pad tokens, like '<pad><pad><pad><pad><pad><pad><pad><pad><pad>'.
#29 opened by kshinoda
Thank you for releasing the great models!
I found that this model (gemma-2-27b-it) seems to generate only PAD tokens in my environment when using 4-bit quantization.
My environment and code are as follows.
How should this issue be fixed?
Thanks for your support in advance.
- torch==2.3.0+cu118
- transformers==4.42.4
- bitsandbytes==0.43.1
- CUDA==11.6
```python
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

kwargs = {'device_map': 'auto'}
kwargs['quantization_config'] = BitsAndBytesConfig(
    load_in_4bit=True
)

model = AutoModelForCausalLM.from_pretrained('google/gemma-2-27b-it', low_cpu_mem_usage=True, **kwargs)
tokenizer = AutoTokenizer.from_pretrained('google/gemma-2-27b-it', use_fast=False, padding_side='right')

chat = [
    {'role': 'user', 'content': 'Hello!'},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([prompt], add_special_tokens=False, padding=True, truncation=True, return_tensors="pt")
inputs = {k: inputs[k].to('cuda') for k in inputs}

outputs = model.generate(**inputs)
tokenizer.decode(outputs[0].cpu().numpy().tolist())
```
This is the output:

```
'<bos><start_of_turn>user\nHello!<end_of_turn>\n<start_of_turn>model\n<pad><pad><pad><pad><pad><pad><pad><pad><pad>'
```
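For reference, one change that is often suggested for Gemma-2 with bitsandbytes 4-bit loading is to pass an explicit bfloat16 compute dtype and load the model weights in bfloat16. Whether this resolves the pad-token-only output here is an assumption, not something verified in this thread; a minimal sketch of that variant:

```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM

# Sketch only: same 4-bit load as above, but with an explicit bfloat16 compute dtype.
# Assumption: this configuration avoids the pad-token-only output; not verified here.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    'google/gemma-2-27b-it',
    low_cpu_mem_usage=True,
    device_map='auto',
    torch_dtype=torch.bfloat16,
    quantization_config=quantization_config,
)
```

Tokenization and generation would then proceed exactly as in the snippet above.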
Just to add that I'm facing the same issue while using 8-bit quantization.
Same here with 4-bit quantization too.