Weird Output "emphat emphat emphat"

#82 · by bill13031

Hi there, I tried gemma-7b-it with bfloat16, float16, and float32, and they all produce the same weird output.
I've tried it both with and without a pipeline.

My code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig, pipeline

# Load the model in bfloat16 (the same behavior occurs with float16 and float32)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it", use_fast=False)
generation_config = GenerationConfig.from_pretrained("google/gemma-7b-it")

text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    generation_config=generation_config,
    device_map="auto",
)

text = "Write me a poem about Machine Learning"

# Gemma's chat template wraps the message in <start_of_turn> markers and adds <bos>
chat = [{"role": "user", "content": text}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
print(prompt)

# add_special_tokens=False prevents a second <bos> from being prepended
outputs = text_pipeline(prompt, add_special_tokens=False)
print(outputs)
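
For completeness, the pipeline-free variant I tried looks roughly like this (a minimal sketch; the max_new_tokens value is an arbitrary choice) and yields the same garbled output:

# Same model and tokenizer as above, calling generate() directly
inputs = tokenizer(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))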

Output

Prompt:
<bos><start_of_turn>user
Write me a poem about Machine Learning<end_of_turn>
<start_of_turn>model

Outputs:
[{'generated_text': '<bos><start_of_turn>user\nWrite me a poem about Machine Learning<end_of_turn>\n<start_of_turn>model\n emphat emphat emphat emphat'}]

I downgraded PyTorch from >=2.2 to <=1.13 (along with the related packages), and the garbled output went away.
With PyTorch >=2.2, the output is also fine as long as the model is deployed on a single GPU.
So I suspect the error comes from Accelerate sharding the model across multiple GPUs.
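
For anyone who wants to stay on PyTorch >=2.2, a possible workaround is to pin the whole model to one GPU so that Accelerate never splits it across devices. A minimal sketch, assuming a single cuda:0 device with enough memory for the model:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# device_map={"": 0} places every module on cuda:0, so no cross-GPU sharding happens
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it", use_fast=False)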
