Stop overgenerating. Need EOS token?

#1
by vicplus - opened

Interesting model. It's probably one of the few finetunes using the updated Phi2 base model.

Request: Could you make it stop over-generating? I think it's just a matter of adding a unique EOS token, like what Dolphin Phi2 did: https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2/blob/main/tokenizer_config.json#L336 which worked very well.
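I imagine something along these lines (a rough sketch; the <|im_end|> token is borrowed from Dolphin's setup, and the repo id and output path are illustrative):

from transformers import AutoTokenizer

# Sketch: register a dedicated EOS token, similar to Dolphin Phi2's <|im_end|>,
# and save the updated tokenizer config. The model's embeddings would also
# need to cover the new token id.
tokenizer = AutoTokenizer.from_pretrained("mobiuslabsgmbh/aanaphi2-v0.1")
tokenizer.add_special_tokens({"eos_token": "<|im_end|>"})
tokenizer.save_pretrained("./aanaphi2-unique-eos")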

Congrats on topping the leaderboard (for Phi2)!

PS: props to these two for creating GGUFs for it:
https://huggingface.co/MarsupialAI/aanaphi2-v0.1_GGUF
https://huggingface.co/brittlewis12/aanaphi2-v0.1-GGUF

Mobius Labs GmbH org

Thank you for your message!

It should automatically stop generating at the EOS token; just make sure you pass eos_token_id to the generate call:

# Pass the EOS id explicitly so generation halts at <|endoftext|>.
outputs = model.generate(**inputs, max_length=max_length, eos_token_id=tokenizer.eos_token_id)

Also, make sure you use the right chat template as specified in the example; this model does not use the same prompt format as the original Phi2 model.
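For example, a minimal end-to-end sketch (assuming the public repo id mobiuslabsgmbh/aanaphi2-v0.1 and the chat template shipped in tokenizer_config.json):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mobiuslabsgmbh/aanaphi2-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

# Build the prompt from the tokenizer's own chat template instead of by hand.
messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))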

Otherwise, do you have an example where it doesn't stop generating that I can take a look at?

Thanks for the GGUFs!

Hey!

I'm using the chat template in your tokenizer_config.json and appending <|endoftext|> at the end of every assistant message in my dataset.

I can confirm that (1) my tokenized dataset has token_id 50256 at the end and (2) my finetuned model prints out <|endoftext|> at the end of every response, just like the examples in the dataset.
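For reference, here's roughly how I checked (1) (a sketch, with an illustrative row instead of my real data):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mobiuslabsgmbh/aanaphi2-v0.1")
# A formatted training example should end with <|endoftext|> (token id 50256).
row = "### Human: hi\n### Assistant: hello" + tokenizer.eos_token
ids = tokenizer(row)["input_ids"]
assert ids[-1] == tokenizer.eos_token_id == 50256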

I will try your suggestion but I also need it to work as a GGUF without providing a stop word in llama.cpp.

I'm not an expert at this, but my theory is that since EOS and BOS share the same special token (<|endoftext|>), the model doesn't really know when to stop?

I didn't have this issue when I finetuned this model: https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2, where I noticed that the EOS and BOS tokens were different (EOS = <|im_end|>). Would you know how they made their model stop generating at the right time?

Mobius Labs GmbH org
edited Feb 27

I am not sure exactly what the problem is. Is the issue with the aanaphi2-v0.1 model or with your finetuned model?

aanaphi2-v0.1 should stop automatically at EOS; there's no need to add anything. It doesn't use a BOS token either, so the format is simply: ### Human: prompt \n ### Assistant: answer
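Written out as a literal string, a prompt would look like this (the question is just an example):

prompt = "### Human: What is the capital of France?\n### Assistant:"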

If you are trying to finetune your own model, then yes, you need to add the EOS token manually after each response and make sure the tokenizer doesn't add it again. If your finetuned model fails to predict the EOS token, that is probably because of the padding token: make sure your padding token is different from the EOS token. For aanaphi2 it's [PAD], which you can set as follows:

if tokenizer.pad_token is None:
    # Use a dedicated pad token so padding is never confused with EOS.
    tokenizer.add_special_tokens({'pad_token': '[PAD]'})
tokenizer.padding_side = "right"

Normally, when you add new tokens, you also need to extend the embeddings so the new embeddings associated with the new tokens are learned. However, in the special case of the padding token, you don't have to do that because it's ignored in the loss.
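For completeness, extending the embeddings for a genuinely new token would look roughly like this (a sketch; <|im_end|> is only an illustrative example of a token whose embedding would need to be learned):

# Needed for tokens whose embeddings must be learned; as noted above, [PAD]
# alone is ignored in the loss, so resizing is optional in that special case.
num_added = tokenizer.add_special_tokens({'additional_special_tokens': ['<|im_end|>']})
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))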

I hope my explanation is helpful!

Hey again. I'm finetuning your model (aanaphi2-v0.1).

I found out I had this line of code: tokenizer.pad_token_id = tokenizer.eos_token_id. Removing that, plus following your advice, fixed it!

outputs = model.generate(**inputs, max_length=max_length, eos_token_id=tokenizer.eos_token_id)

I'll try GGUF-ing this later and see if it stops properly in llama.cpp. I'll update the thread if I encounter any issues there. Thanks for the help!

Mobius Labs GmbH org

Ah yes, that line would create an issue for Phi2 models, but interestingly it doesn't for Llama models.
Make sure you also update the chat template of the tokenizer if you use another prompt format.
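For example, roughly like this (a sketch of a ChatML-style template; the Jinja string and save_dir are assumptions for illustration, not aanaphi2's shipped template):

# Hypothetical ChatML-style template, for illustration only.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)
tokenizer.save_pretrained(save_dir)  # save_dir: wherever your finetune is written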
Glad to hear my comment was helpful!

Hi @mobicham !

The finetuned aanaphi2 + GGUF works well; however, it's not perfect in my benchmarks.

There seem to be two related bugs, and I'm just wondering if you have any clue what's happening:

  1. "Can you plan my vacation ..." outputs "ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo"
  2. "Give me a 10 day itinerary for a Thailand trip" outputs "Sure, I'll createoooooooooooooooooo an itinerary for you. ..."

It happens 10% of the time. Have you ever seen Phi2 spam a single character?

It seems to be a Phi2 thing, since dolphin-2_6-phi-2 does the same thing but with the G character (e.g., "GGGGGGGGGGGGGGGGGGGGGGGGGGGGG").

Mobius Labs GmbH org

I haven't seen this problem before, and it doesn't happen with aanaphi2 as far as I've seen. Did you make sure you call model.eval() before using the model? There's a dropout layer that might create some issues.
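That is, something like this (a minimal sketch, assuming the same repo id as above):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("mobiuslabsgmbh/aanaphi2-v0.1")
model.eval()  # switches dropout to inference behavior
assert not model.training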

What is a dropout layer?

Anyway, it's not relevant here; it's mostly a Phi2 thing. Thanks again for the help. Closing the thread.

vicplus changed discussion status to closed
