I am facing a minor issue with Llama 3: the eos_token is not correct, which makes the model keep generating extra lines instead of stopping.
By changing this eos_token, I was able to stop the model's response from overflowing.
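
For anyone who wants to check what their local copy is doing, here is a minimal sketch (the repo id is just the one I assume for the 8B Instruct checkpoint):

from transformers import AutoTokenizer

# Assumed repo id, for illustration only; use whichever Llama 3 checkpoint you have.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Before this change the config reported <|end_of_text|> as the eos_token,
# while the chat template actually ends assistant turns with <|eot_id|>,
# so generation kept going past the end of the answer.
print(tokenizer.eos_token)
print(tokenizer.convert_tokens_to_ids("<|eot_id|>"))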

@pcuenq kindly review

It seems like this fixes the issue of the model's response overflowing.

Works like a charm for me when doing chat completion with the prompt template.

Meta Llama org

Mmm, that's weird, shouldn't it be changed in the config and not here?

@ArthurZ but when using the prompt templates, i.e. tokenizer.apply_chat_template, the assistant's answers end with the <|eot_id|> token, so that's why it works.
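
A quick sketch of what I mean, assuming tokenizer is an already-loaded Llama 3 Instruct tokenizer: the template closes every message with <|eot_id|>, so the model also emits <|eot_id|> to end its own answers.

messages = [
    {"role": "user", "content": "Hi"},
]
# Render the template as text instead of token ids to see the terminator.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
# roughly: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>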

I was also facing the response overflow issue, and this simple change seems to fix it.

I was adding this line in my code:
tokenizer.eos_token = '<|eot_id|>'

But changing the tokenizer_config as suggested here fixed it, so there is no need to add the line above anymore.
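
If it helps anyone, this is the sanity check I ran to confirm the updated config was actually picked up (the repo id is just my assumption for the 8B Instruct checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed repo id, adjust to your checkpoint
    force_download=True,  # refresh the cached tokenizer_config.json
)
assert tokenizer.eos_token == "<|eot_id|>", tokenizer.eos_token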


Why aren't these things being merged? Why would you spend a billion dollars training these models to release and leave them in such a half-baked state? Does config.json need changing? Does tokenizer_config.json need changing? Do both need changing? Do neither need changing? If anything needs changing, why hasn't it? Frustrating

@philschmid @ArthurZ kindly look into this

ArthurZ changed pull request status to merged
Meta Llama org

This should fix it. I'm not sure why it's so frustrating for everyone; it's a parameter that's super easy to change 😅
Sorry all for the trouble it caused!

It's frustrating because anyone downloading the model without reading these comments is gonna have a bad time. Even someone who reads these comments, like myself, has no idea which, if any, of these abandoned pull requests have merit. Your final solution didn't even have a pull request. So it really has nothing to do with the difficulty of changing the parameter. At the very least, we need official guidance as to what to change. Thanks for the fixes

Also, does generation_config.json then need updating on any/all of the models? special_tokens_map.json? I wish you guys would just do a thorough review of all the files for all four models and make the necessary updates. Far too much ambiguity and confusion
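
In the meantime, this is the stopgap I use so I don't have to touch any of the JSON files; it's only my own sketch, not official guidance, and it assumes model, tokenizer, and input_ids are already set up as elsewhere in this thread:

# Resolve the chat-turn terminator from the tokenizer instead of hard-coding its id.
eot_id = tokenizer.convert_tokens_to_ids("<|eot_id|>")

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    # Stop on either the configured eos_token or <|eot_id|>, regardless of
    # what generation_config.json currently says.
    eos_token_id=[tokenizer.eos_token_id, eot_id],
    pad_token_id=tokenizer.eos_token_id,
)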

This should fix it. I'm not sure why it's so frustrating for everyone; it's a parameter that's super easy to change 😅
Sorry all for the trouble it caused!

If this has to be changed every single time the model is run, this is a bug.
Why is this not being treated as a bug?

Also, it creates problems because the strength of HF is that things work out of the box.

Hello world :)!
Thank you for these updates. In my case, I updated the tokenizer config as mentioned, but I still get multiple lines of the same output (the first answer from the assistant, and then it loops over the input system prompt until max_length new tokens have been generated).
There is so much information here, and all of it different! I'm a bit lost; do you have a clear code example so I can check whether I'm using the model incorrectly, please?
Regards

PS: @ArthurZ

Here is my code:

messages = [
    {"role": "system", "content": "You are the best chatbot and your name is ESG-IGL"},
    {"role": "user", "content": "Who are you?"},
]

# Build the prompt with the chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

# Stop on either the configured eos_token or the chat-turn terminator.
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=64,
    eos_token_id=terminators,
    do_sample=False,
)

# Decode only the newly generated tokens, not the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))

Note: even when I set skip_special_tokens to False, the output is the same.

My output:

What is your name?ESG-IGL.…

You are the best chatbot and your name is ESG-IGL.…

You are the best chatbot and your name is ESG-IGL.…

You are the best chatbot and your name is ESG-IGL.…

You are the best chat
