Multiple eos tokens in chat template.

#95
by BobbertWobbert - opened

I noticed that a question about multiple EOS tokens in the chat template was not fully answered in this post.

The chat template adds an EOS token after every user-assistant message pair. I just want to confirm that this is intended and not a bug in the chat template. Thanks.

Example:

messages = [
    {'role': 'user', 'content': 'inst1'},
    {'role': 'assistant', 'content': 'aswer1'},
    {'role': 'user', 'content': 'inst2'},
    {'role': 'assistant', 'content': 'answ2'},
]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False))
<s>[INST] inst1 [/INST]aswer1</s> [INST] inst2 [/INST]answ2</s>

My understanding is that </s> marks the end of a response; without it, we would not know when to stop during inference.
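To make that understanding concrete, here is a minimal pure-Python sketch. It is not the model's actual Jinja chat template: `render_chat` and `take_until_eos` are hypothetical helpers written only to reproduce the string printed above and to show how an EOS token could act as a stop signal. The token strings `<s>` and `</s>` are taken from the output above; everything else is an assumption.

```python
BOS, EOS = "<s>", "</s>"  # token strings as they appear in the output above


def render_chat(messages):
    """Hypothetical re-implementation of the format shown in the post.

    Assumes messages strictly alternate user, assistant, user, assistant.
    """
    turns = []
    for user, assistant in zip(messages[0::2], messages[1::2]):
        # Every assistant reply is closed by its own EOS token.
        turns.append(f"[INST] {user['content']} [/INST]{assistant['content']}{EOS}")
    return BOS + " ".join(turns)


def take_until_eos(tokens):
    """Toy stopping rule: keep tokens until the first EOS is seen."""
    kept = []
    for token in tokens:
        if token == EOS:
            break
        kept.append(token)
    return kept


messages = [
    {"role": "user", "content": "inst1"},
    {"role": "assistant", "content": "aswer1"},
    {"role": "user", "content": "inst2"},
    {"role": "assistant", "content": "answ2"},
]
rendered = render_chat(messages)
print(rendered)
print(rendered.count(EOS))  # one EOS per assistant reply
```

If this reading is right, a model trained on text in this format sees an EOS after every assistant reply, which is how it would learn to emit one when a reply is finished, and a loop like `take_until_eos` can then stop on it.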
