Bloomz-3b Refuses to Summarize Text

#40

by Kato-22 - opened Apr 9, 2023

Apr 9, 2023

I read that Bloomz is better at summarization tasks than the regular BLOOM model.
However, in my experience, it refuses to summarize the text. I have tried several prompts to no avail; it always returns the input text.
Has anyone had success with abstractive summarization with Bloomz?

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-3b")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-3b", torch_dtype='auto', device_map='auto', offload_folder='offload', offload_state_dict=True)

text = "In this lesson, we will learn about the different types of clouds that exist in the atmosphere. Clouds are classified based on their height, shape, and composition. The three main cloud types are cumulus, stratus, and cirrus. Cumulus clouds are puffy and white, with a flat base and a rounded top. Stratus clouds are low, gray, and flat, and they often cover the entire sky. Cirrus clouds are high and thin, with a wispy appearance. They are often an indicator of an approaching storm. Understanding the different cloud types is important for weather forecasting and aviation safety."

inputs = tokenizer.encode(f'Write a brief summary for the following text that focuses on the main idea:\n\n{text}', return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)

summary = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(summary)

Output:

Write a brief summary for the following text that focuses on the main idea:

In this lesson, we will learn about the different types of clouds that exist in the atmosphere. Clouds are classified based on their height, shape, and composition. The three main cloud types are cumulus, stratus, and cirrus. Cumulus clouds are puffy and white, with a flat base and a rounded top. Stratus clouds are low, gray, and flat, and they often cover the entire sky. Cirrus clouds are high and thin, with a wispy appearance. They are often an indicator of an approaching storm. Understanding the different cloud types is important for weather forecasting and aviation safety.

Muennighoff

BigScience Workshop org Apr 9, 2023

It's not returning the input text; it is ending the generation directly after your prompt. generate returns the prompt followed by the new tokens, and here the only new token is the end-of-text token. You can see that by setting skip_special_tokens=False.

It's best to put the instruction after the context. See the generation below, which I got with bloomz-3b:

inputs = tokenizer.encode(f'{text}\n\nWrite a short summary for the prior text.', return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)

summary = tokenizer.decode(outputs[0], skip_special_tokens=False)

print(summary)

Learn about different types of clouds.

Kato-22

Apr 10, 2023

Thanks, adding the instruction after the context helped. However, it looks like Bloomz is doing extractive summarization rather than abstractive summarization.
I tried several prompts such as:

What are the main points of the prior text?
What was the prior text about?
Summarize the prior text.
Write a brief summary of the prior text that highlights its key points.
Write a summary of the prior text that highlights its key points.

But all of them returned basically the same "Learn about different types of clouds." output (the last two prompts changed "learn" to "identify").
Even using a larger model (bigscience/bloomz-7b1) yielded the same results.

How can I use Bloomz for abstractive summarization?

Muennighoff

BigScience Workshop org Apr 10, 2023

You could give it a few-shot example, e.g. provide one example with an abstractive summary at the beginning of your prompt.
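A minimal sketch of what such a one-shot prompt could look like. The example passage and its summary are invented for illustration, and build_few_shot_prompt is a hypothetical helper, not part of transformers. It keeps the instruction after each context, as suggested earlier in the thread.

```python
INSTRUCTION = "Write a short summary for the prior text."


def build_few_shot_prompt(examples, text, instruction=INSTRUCTION):
    """Put worked (text, summary) examples before the text to summarize."""
    parts = []
    for example_text, example_summary in examples:
        # Each example shows the model the context, the instruction, and the answer.
        parts.append(f"{example_text}\n\n{instruction}\n{example_summary}")
    # The final block ends with the instruction so the model continues with a summary.
    parts.append(f"{text}\n\n{instruction}")
    return "\n\n".join(parts)


# Invented example pair (an assumption, not taken from any dataset).
examples = [
    (
        "Bees collect nectar from flowers and carry pollen between plants. "
        "Many crops depend on this pollination to produce fruit.",
        "Bees matter to agriculture because crops rely on them for pollination.",
    )
]

text = (
    "Clouds are classified based on their height, shape, and composition. "
    "The three main cloud types are cumulus, stratus, and cirrus."
)

prompt = build_few_shot_prompt(examples, text)
print(prompt)
```

The resulting string can be passed to tokenizer.encode in place of the f-string used above. Whether one example is enough to get abstractive output is something to check empirically.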

lixiqi

Jul 2, 2023

@Muennighoff Hi, I have the same problem. I tried skip_special_tokens=False, but the prompt is still printed. Are there any changes to the instruction?

Muennighoff

BigScience Workshop org Jul 2, 2023

If it's just generating the endoftext token (i.e. it does not generate anything) and changing the prompt does not help, you can enforce a minimum number of tokens by setting min_new_tokens to a value larger than 0. Generation will then ignore the endoftext token until that many new tokens have been produced.
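As a rough sketch, the setting can be passed to generate as a keyword argument alongside max_new_tokens. The value 8 below is an arbitrary assumption, not a recommendation.

```python
# Keyword arguments for model.generate(); the values are illustrative assumptions.
generation_kwargs = {
    "max_new_tokens": 64,  # upper bound on new tokens, as in the original snippet
    "min_new_tokens": 8,   # end-of-text is suppressed until 8 new tokens exist
}

# With the model and inputs from the first post:
# outputs = model.generate(inputs, **generation_kwargs)
```

Forcing a minimum length can produce filler when the model would otherwise have stopped, so a small value is probably a safer starting point.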

lixiqi

Jul 3, 2023

Thank you. Sorry, I didn't describe my problem clearly. In my case, the output is the prompt plus the generation.
For example, the output is:
"here is an example.
Generate a summary: This is an example.
Test sentence .....
Generate a summary: This is a test sentence."

What I want is only: "This is a test sentence."

Muennighoff

BigScience Workshop org Jul 3, 2023

Oh, that is just the default behavior of generate: it also returns the input prompt. The model is not actually generating that part.
You can remove it by doing something like gen = gen[len(prompt):]
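A sketch of two ways to drop the prompt. Slicing the decoded string assumes the decoded output starts with the exact prompt text, which may not hold if decoding changes whitespace or adds special tokens; slicing the token ids before decoding avoids that. The helper names are hypothetical.

```python
def strip_prompt_text(decoded, prompt):
    """Drop the prompt from a decoded string if the string starts with it."""
    if decoded.startswith(prompt):
        return decoded[len(prompt):].strip()
    # Leave the text unchanged when the prompt is not an exact prefix.
    return decoded


def strip_prompt_ids(output_ids, prompt_len):
    """Keep only the token ids that come after the first prompt_len ids."""
    return output_ids[prompt_len:]


prompt = "Test sentence\nGenerate a summary:"
decoded = "Test sentence\nGenerate a summary: This is a test sentence."
print(strip_prompt_text(decoded, prompt))

# The same idea on token ids, with inputs and outputs as in the first post:
# new_ids = outputs[0][inputs.shape[1]:]
# summary = tokenizer.decode(new_ids, skip_special_tokens=True)
```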

lixiqi

Jul 3, 2023

Thank you for your answer! It helps. :)
Another question: if the example is in English and the test sentence is in another language, is there any way to stop the generation from always being in English?

Muennighoff

BigScience Workshop org Jul 3, 2023

Explicitly specifying the language may work, e.g. Please reply in Japanese.

lixiqi

Jul 3, 2023

I will try this method, thank you for your help
