Update chat_template on tokenizer_config

#14
by ironrock - opened

This PR corrects the chat_template in the tokenizer config. The system message was being placed incorrectly: it should be prepended to the first user message, not to the last one.

Mistral AI_ org

The chat template is copied from https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/commit/43ee8f4afb6fc9e4304a8ed87aaa3a36a0e06939. @Rocketknight1 can you review this and see if the update is valid?

Mistral AI_ org

Only v1 of the tokenizers attaches the system prompt to the first user message, as you can see here: https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/tokens/tokenizers/sentencepiece.py
v2 and v3 both attach it to the last user message, so the current template seems correct. Here is the output of mistral_common for v3 with tekken:

<s>[INST]User[/INST]Assistant</s>[INST]System

User[/INST]
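
To make the difference between the two placements concrete, here is a minimal pure-Python sketch. It is not the mistral_common implementation or the Jinja chat template; `render` and its `system_on` parameter are hypothetical names used only for illustration, and the spacing simply mirrors the output quoted above.

```python
def render(messages, system_on="last"):
    """Hypothetical helper: render a conversation in the [INST] format quoted above.

    system_on="first" mimics the v1 behaviour described in this thread,
    system_on="last" mimics the v2/v3 behaviour.
    """
    # Collect the system prompt(s) and separate them from the actual turns.
    system = "\n\n".join(m["content"] for m in messages if m["role"] == "system")
    turns = [m for m in messages if m["role"] != "system"]
    user_idx = [i for i, m in enumerate(turns) if m["role"] == "user"]

    # Pick the user turn that will carry the system prompt, if there is one.
    target = None
    if system and user_idx:
        target = user_idx[0] if system_on == "first" else user_idx[-1]

    out = "<s>"
    for i, m in enumerate(turns):
        if m["role"] == "user":
            content = m["content"]
            if i == target:
                content = system + "\n\n" + content
            out += "[INST]" + content + "[/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token.
            out += m["content"] + "</s>"
    return out


messages = [
    {"role": "system", "content": "System"},
    {"role": "user", "content": "User"},
    {"role": "assistant", "content": "Assistant"},
    {"role": "user", "content": "User"},
]

last = render(messages, "last")    # system prompt attached to the last user turn
first = render(messages, "first")  # system prompt attached to the first user turn
print(repr(last))
print(repr(first))
```

With `system_on="last"` the sketch reproduces the string shown above; with `system_on="first"` the system prompt moves into the first `[INST]` block instead.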

But how can we create a simple prompt with a system/user/assistant structure? In this case, the system message would be skipped: it only appears if the last role is user, which doesn't make sense in most use cases.

Ready to merge
This branch is ready to get merged automatically.
