tokenizer.apply_chat_template() appends wrong tokens after the update

#81
by alestolfo - opened

I've been using tokenizer.apply_chat_template() to format the input as "<|user|> {my content}<|end|><|assistant|>". Since the model update, the model's output has become nonsensical and unrelated to the input. After some investigation, I realized that apply_chat_template now appends <|endoftext|> to the prompt instead of <|assistant|>. Dropping this function and formatting the input manually solves the problem. I'm curious how the update caused this and wanted to share in case others run into the same issue.
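A minimal repro of what I'm seeing (the checkpoint name below is just a placeholder for the model in question):

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; any model whose chat template uses the
# <|user|>/<|end|>/<|assistant|> markers should show the same behavior.
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
messages = [{"role": "user", "content": "Hello!"}]

prompt = tokenizer.apply_chat_template(messages, tokenize=False)
print(prompt)
# Since the update this ends with <|endoftext|> rather than <|assistant|>,
# so the model is never cued to begin an assistant turn.
```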

Hi, please pass add_generation_prompt=True when calling tokenizer.apply_chat_template(). The prompt will then end with <|assistant|> instead of <|endoftext|>.
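For example, a sketch using the same placeholder checkpoint as above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
messages = [{"role": "user", "content": "Hello!"}]

# add_generation_prompt=True tells the template to open a new assistant
# turn, so the prompt ends with <|assistant|> and generation continues
# in the assistant role instead of stopping at <|endoftext|>.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # now ends with <|assistant|> (exact whitespace depends on the template)
```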

nguyenbh changed discussion status to closed
