Update tokenizer_config.json

#4

The current chat_template appends an extra ChatML EOS token (<|im_end|>) when add_generation_prompt=False.
Please replace it with the correct chat_template to fix this behavior.

from transformers import AutoTokenizer

tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-8B-Instruct")
message = [{"role": "user", "content": "How are you?"}]
# Render the prompt as text without requesting an assistant turn.
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))

You can see an extra <|im_end|> token in the output:

<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|><|im_end|>
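
For reference, a minimal sketch of what a corrected template could look like: it follows the stock Llama-3 instruct format, where every turn closes with <|eot_id|> and the assistant header is only emitted when add_generation_prompt=True, so no ChatML token can leak in. The fixed_template string below is an illustration of that format, not necessarily the exact string merged in this PR.

from transformers import AutoTokenizer

tame_tokenizer = AutoTokenizer.from_pretrained("yentinglin/Llama-3-Taiwan-8B-Instruct")

# Stock Llama-3 instruct format: each turn is wrapped in header tokens and
# ends with <|eot_id|>; the assistant header appears only when
# add_generation_prompt=True. No ChatML branch, so no <|im_end|>.
fixed_template = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "{{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'"
    " + message['content'] | trim + '<|eot_id|>' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}"
    "{% endif %}"
)
tame_tokenizer.chat_template = fixed_template

message = [{"role": "user", "content": "How are you?"}]
print(tame_tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=False))
# Expected: <|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nHow are you?<|eot_id|>
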
minyichen changed pull request title from Upload tokenizer_config.json to Update tokenizer_config.json
yentinglin changed pull request status to merged