configuration / penalty to lower repetition?

#32

by mfab - opened May 24, 2023

mfab

May 24, 2023

•

edited May 24, 2023

I was wondering whether there is a temperature config or penalty setting to lower the probability of repetition when running the model through the Hugging Face API. I was trying to generate a dialogue between two people, and the output looks like this:

Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.
Person1: I don't know what to say.
Person2: I don't know what to say either.

Here's my current code. I can see options to change the temperature and penalty if you run this from the CLI after downloading the entire repo. When using the Hugging Face API, would I change the penalty in the config section?

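import torch
import transformers
from transformers import AutoTokenizer

# Cap this process at 25% of the current CUDA device's memory and release cached blocks
torch.cuda.set_per_process_memory_fraction(0.25)
torch.cuda.empty_cache()

model_name = "mosaicml/mpt-7b-instruct"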

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
config = transformers.AutoConfig.from_pretrained(
  model_name,
  trust_remote_code=True
)
config.attn_config['attn_impl'] = 'torch'


model = transformers.AutoModelForCausalLM.from_pretrained(
  'mosaicml/mpt-7b-instruct',
  config=config,
  trust_remote_code=True,
  torch_dtype=torch.bfloat16,
)
model.to(device='cuda:3')


INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
INTRO_BLURB = "Below is an instruction that describes a task. Write a response that appropriately completes the request."
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)
example = "Write dialogue for two people who are at a party and meet for the first time. They're both shy and hesitant about starting a conversation. Write 5 lines of dialogue between these two people:"

fmt_ex = PROMPT_FOR_GENERATION_FORMAT.format(instruction=example)


model_inputs = tokenizer(text=fmt_ex, return_tensors="pt").to("cuda:3")

output_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
)
output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
print(output_text)

mfab changed discussion title from configuration / penalty to lower repetition to configuration / penalty to lower repetition? May 24, 2023

kdua

May 24, 2023

•

edited May 24, 2023

Update your code to the following:

output_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    repetition_penalty=1.1,
)

datacow

May 24, 2023

HF has a list of generation arguments you can play with: https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig
I don't think all of them are compatible with MPT, but temperature and repetition penalty are 2 relevant ones to look at. Here's how to use them:

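from transformers import GenerationConfig

generation_config = GenerationConfig.from_pretrained("mosaicml/mpt-7b-instruct")
generation_config.do_sample = True  # temperature only takes effect when sampling
generation_config.temperature = 0.7
generation_config.repetition_penalty = 1.1

# generation_config must be passed by keyword; a positional argument after ** is a syntax error
model.generate(**model_inputs, generation_config=generation_config)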
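
For intuition about what the temperature setting does, here is a minimal sketch in plain Python. It is not the transformers implementation; the function name and the logits are made up for illustration. It shows the usual definition, where logits are divided by the temperature before the softmax, so values below 1 concentrate probability on the top-scoring tokens. That suggests lowering the temperature on its own probably won't reduce repetition, which is why the repetition penalty is the more relevant knob here.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (illustrative only)
logits = [2.0, 1.0, 0.0]
p_default = softmax_with_temperature(logits, 1.0)
p_cool = softmax_with_temperature(logits, 0.7)

print(p_default)  # baseline distribution
print(p_cool)     # sharper: more mass on the first token
```
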

mfab

May 24, 2023

Thanks @kdua and @datacow !

Does it look to you like 'transformers.AutoConfig.from_pretrained()' is being replaced by 'GenerationConfig.from_pretrained()' as the way to pass in generation strategy configs?

repetition_penalty (float, optional, defaults to 1.0): The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.

I can't quite tell from the paper whether a higher value means more penalty, given that 1.0 is no penalty.

datacow

May 25, 2023

•

edited May 25, 2023

Does it look to you like 'transformers.AutoConfig.from_pretrained()' is being replaced by 'GenerationConfig.from_pretrained()' as the way to pass in generation strategy configs?

@mfab AutoConfig determines the configuration settings for the model when you load it. It contains things like the model data type, attention implementation, etc. You can edit the default model config and pass it as an argument to transformers.AutoModelForCausalLM.from_pretrained() to load the model with your preferred settings. GenerationConfig only governs the settings at inference time when you call model.generate(), so it doesn't interfere with AutoConfig.
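
As a rough sketch of that split, a GenerationConfig can be built on its own, without touching the model config at all. The values below are just the ones suggested earlier in this thread, and the commented-out call assumes the `model` and `model_inputs` variables from the original post.

```python
from transformers import GenerationConfig

# Inference-time settings only; building this object does not load or modify the model.
gen_cfg = GenerationConfig(
    do_sample=True,          # temperature only has an effect when sampling
    temperature=0.7,
    repetition_penalty=1.1,
    max_new_tokens=1024,
)

# Assumed usage with the `model` and `model_inputs` from the original post:
# output_ids = model.generate(**model_inputs, generation_config=gen_cfg)
print(gen_cfg.temperature, gen_cfg.repetition_penalty)
```

The model config (attn_impl, dtype and so on) is still passed to from_pretrained() exactly as before.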

repetition_penalty (float, optional, defaults to 1.0): The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.

I can't quite tell from the paper whether a higher value means more penalty, given that 1.0 is no penalty.

Here's an extract from a different link (https://huggingface.co/transformers/v2.11.0/main_classes/model.html#transformers.PreTrainedModel.generate) with a slightly clearer explanation: "The parameter for repetition penalty. Between 1.0 and infinity. 1.0 means no penalty. Default to 1.0." In other words, anything bigger than 1 adds a penalty for repetition, and larger values penalize more strongly.
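
To make the "bigger than 1 means more penalty" point concrete, here is a small plain-Python sketch of the rule as I understand transformers applies it: the score of every token already in the sequence is divided by the penalty if it is positive and multiplied by it if it is negative, so either way it goes down. The function name and the numbers are hypothetical, not the library's API.

```python
def apply_repetition_penalty(scores, seen_token_ids, penalty):
    """Penalize tokens that already appeared; returns a new list of scores."""
    out = list(scores)
    for tok in set(seen_token_ids):
        # Dividing a positive score or multiplying a negative one both lower it
        # when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

scores = [2.0, -1.0, 0.5]   # hypothetical logits for token ids 0, 1, 2
seen = [0, 1, 0]            # token ids already present in the sequence

no_penalty = apply_repetition_penalty(scores, seen, 1.0)
with_penalty = apply_repetition_penalty(scores, seen, 2.0)
print(no_penalty)    # unchanged, since 1.0 means no penalty
print(with_penalty)  # seen tokens 0 and 1 are pushed down, token 2 is untouched
```
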

abhi-mosaic

Jun 3, 2023

Closing as stale

abhi-mosaic changed discussion status to closed Jun 3, 2023
