Update tokenizer_config.json

#4

The current chat_template appends an extra ChatML EOS token (<|im_end|>) when add_generation_prompt=False.
Please replace it with the correct chat_template to fix this behavior.

from transformers import AutoTokenizer

tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-8B-Instruct")
message = [{"role": "user", "content": "How are you?"}]
# Render the prompt as text without requesting an assistant turn.
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))

You can see an extra <|im_end|> token in the output:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|><|im_end|>
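
For reference, a minimal sketch of what a corrected template could look like: it follows the stock Llama-3 instruct format, where every turn closes with <|eot_id|> and the assistant header is only emitted when add_generation_prompt=True, so no ChatML token can leak in. The fixed_template string below is an illustration of that format, not necessarily the exact string merged in this PR.

from transformers import AutoTokenizer

tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-8B-Instruct")

# Stock Llama-3 instruct format: each turn is wrapped in header tokens and
# ends with <|eot_id|>; the assistant header appears only when
# add_generation_prompt=True. No ChatML branch, so no <|im_end|>.
fixed_template = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "{{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'"
    " + message['content'] | trim + '<|eot_id|>' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)
tame_tokenizer.chat_template = fixed_template

message = [{"role": "user", "content": "How are you?"}]
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))
# Expected: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|>
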
minyichen changed pull request title from Upload tokenizer_config.json to Update tokenizer_config.json
yentinglin changed pull request status to merged