I am facing a minor issue with Llama 3: the eos_token is not correct, which makes the model keep generating extra lines instead of stopping.
By changing this eos_token, I was able to stop the model's response from overflowing.
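
For anyone who wants to check what their local copy is doing, here is a minimal sketch (the repo id is just the one I assume for the 8B Instruct checkpoint):

from transformers import AutoTokenizer

# Assumed repo id, for illustration only; use whichever Llama 3 checkpoint you have.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Before this change the config reported <|end_of_text|> as the eos_token,
# while the chat template actually ends assistant turns with <|eot_id|>,
# so generation kept going past the end of the answer.
print(tokenizer.eos_token)
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))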

@pcuenq kindly review

It seems like this fixes the issue of the model's response overflowing.

Works like a charm for me when doing chat completion with the prompt template.

Meta Llama org

Mmm, that's weird, shouldn't it be changed in the config and not here?

@ArthurZ but when using the prompt templates, i.e. tokenizer.apply_chat_template, the assistant's answers end with the <|eot_id|> token, so that's why it works.
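
A quick sketch of what I mean, assuming tokenizer is an already-loaded Llama 3 Instruct tokenizer: the template closes every message with <|eot_id|>, so the model also emits <|eot_id|> to end its own answers.

messages = [
    {"role": "user", "content": "Hi"},
]
# Render the template as text instead of token ids to see the terminator.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
# roughly: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>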

I was also facing the response overflow issue, and this simple change seems to fix it.

I was adding this line in my code:
tokenizer.eos_token = '<|eot_id|>'

But changing the tokenizer_config as suggested here fixed it, so there is no need to add the line above anymore.
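
If it helps anyone, this is the sanity check I ran to confirm the updated config was actually picked up (the repo id is just my assumption for the 8B Instruct checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed repo id, adjust to your checkpoint
    force_download=True,  # refresh the cached tokenizer_config.json
)
assert tokenizer.eos_token == "<|eot_id|>", tokenizer.eos_token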


Why aren't these things being merged? Why would you spend a billion dollars training these models to release and leave them in such a half-baked state? Does config.json need changing? Does tokenizer_config.json need changing? Do both need changing? Do neither need changing? If anything needs changing, why hasn't it? Frustrating

@philschmid @ArthurZ kindly look into this

ArthurZ changed pull request status to merged
Meta Llama org

This should fix it. I'm not sure why it's so frustrating for everyone; it's a parameter that's super easy to change 😅
Sorry all for the trouble it caused!

It's frustrating because anyone downloading the model without reading these comments is gonna have a bad time. Even someone who reads these comments, like myself, has no idea which, if any, of these abandoned pull requests have merit. Your final solution didn't even have a pull request. So it really has nothing to do with the difficulty of changing the parameter. At the very least, we need official guidance as to what to change. Thanks for the fixes

Also, does generation_config.json then need updating on any/all of the models? special_tokens_map.json? I wish you guys would just do a thorough review of all the files for all four models and make the necessary updates. Far too much ambiguity and confusion
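
In the meantime, this is the stopgap I use so I don't have to touch any of the JSON files; it's only my own sketch, not official guidance, and it assumes model, tokenizer, and input_ids are already set up as elsewhere in this thread:

# Resolve the chat-turn terminator from the tokenizer instead of hard-coding its id.
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    # Stop on either the configured eos_token or <|eot_id|>, regardless of
    # what generation_config.json currently says.
    eos_token_id=[tokenizer.eos_token_id, eot_id],
    pad_token_id=tokenizer.eos_token_id,
)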

This should fix it. I'm not sure why it's so frustrating for everyone; it's a parameter that's super easy to change 😅
Sorry all for the trouble it caused!

If this has to be changed every single time the model is run, this is a bug.
Why is this not being treated as a bug?

Also, it creates problems because the strength of HF is that things work out of the box.

Hello world :)!
Thank you for these updates. In my case, I updated the tokenizer config as mentioned, but I still get multiple lines of the same output (the first answer from the assistant, and then it loops over the input system prompt until max_length new tokens have been generated).
There is so much information here, and all of it different! I'm a bit lost; do you have a clear code example so I can check whether I'm using the model incorrectly, please?
Regards

PS: @ArthurZ

Here is my code:

messages = [
    {"role": "system", "content": "You are the best chatbot and your name is ESG-IGL"},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt with the chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the configured eos_token or the chat-turn terminator.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=64,
    eos_token_id=terminators,
    do_sample=False,
)

# Decode only the newly generated tokens, not the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

Note: even when I set skip_special_tokens to False, the output is the same.

My output:

What is your name?ESG-IGL.…

You are the best chatbot and your name is ESG-IGL.…

You are the best chatbot and your name is ESG-IGL.…

You are the best chatbot and your name is ESG-IGL.…

You are the best chat
