The bloom7b model does not support contrastive search or do_sample with PEFT, and just repeats the output

#217
by Imran1 - opened

Here is the code:

import torch
# `tokenizer` and `model` are assumed to be an already-loaded BLOOM tokenizer and PEFT model.

batch = tokenizer(" څوک د زړه ", return_tensors='pt')

max_length = 200
temperature = 0.5
top_k = 10
top_p = 0.95
do_sample = True

with torch.cuda.amp.autocast():
    # Pass the additional parameters to the model.generate() function
    output_tokens = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        max_length=max_length,
        temperature=temperature,
        top_k=top_k,
        top_p=top_p,
        repetition_penalty=1.03,
        # penalty_alpha=0.6,
        # do_sample=do_sample
    )

print("\n\n", tokenizer.decode(output_tokens[0], skip_special_tokens=True))

[Screenshot: Screenshot_20230323-151456.png, showing the repeated output]

The output is repeating.

BigScience Workshop org

I think top_k is way too small. What happens when you use a bigger value?

I played with temperature = 0.5 to 1.0, top_k = 4 to 50, and top_p = 0.3 to 0.95, but it generates the same text again.

It gives me the following error when I pass the do_sample or penalty_alpha parameters:

RuntimeError: "topk_cpu" not implemented for 'Half'
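For context: this error usually means top-k sampling is being computed on CPU in half precision, where PyTorch's topk kernel is not implemented for float16. A minimal sketch of a workaround, reusing `model` and `batch` from the snippet above and assuming a CUDA device is available (otherwise keep the model in float32 on CPU):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

if device == "cuda":
    # Half-precision top-k kernels exist on GPU, so move model and inputs there.
    model = model.to(device)
    batch = {k: v.to(device) for k, v in batch.items()}
else:
    # On CPU, cast the model back to float32 before sampling.
    model = model.float()

output_tokens = model.generate(
    input_ids=batch["input_ids"],
    attention_mask=batch["attention_mask"],
    max_length=200,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
)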

BigScience Workshop org

Ah, I just noticed the details in your title. top_k, top_p, and temperature are not used when do_sample is False, so you're just generating deterministically even when you set those values. I don't know how to make it work with PEFT. Maybe @ybelkada can help?
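To illustrate (a sketch only, reusing `model` and `batch` from the snippet above): the sampling knobs only take effect with do_sample=True, while contrastive search is selected by passing penalty_alpha together with top_k and leaving do_sample as False:

# Sampling: top_k / top_p / temperature are only honored when do_sample=True.
sampled = model.generate(
    **batch,
    max_length=200,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.7,
)

# Contrastive search: triggered by penalty_alpha > 0 together with top_k > 1.
contrastive = model.generate(
    **batch,
    max_length=200,
    penalty_alpha=0.6,
    top_k=4,
)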

@cakiki You mean use top_p and do_sample, not temperature?
