bofenghuang/vigogne-2-7b-chat · Issue with chat templating

Hello,
First of all thank you for this submission.

I would like to share a potential issue with your chat templating method.
For information, I have deployed this model on my AWS Sagemaker instance and I'm calling the instance this way (that's not very important):

inference_params = {
    "inputs": prompt, # prompt is formatted this way TOKENIZER.apply_chat_template([...], tokenize=False, add_generation_prompt=True))
    "parameters":
    {
        "return_full_text": False,
        "do_sample": True,
        "top_k": 3,
        "temperature": .25,
        "repetition_penalty": 1.1,
        "max_new_tokens": 1024,
        "pad_token_id": TOKENIZER.eos_token_id,
        "stop_sequences": STOP_SEQUENCES,
        "stop": STOP_SEQUENCES
    },
    "stream": True
}
 response = SRC.invoke_endpoint_with_response_stream(EndpointName=AWS_ENDPOINT_NAME, Body=json.dumps(inference_params), ContentType="application/json")["Body"]

Stage 1:
Everything is working fine.
I'm able to get answer from the model, answers are coherents with the system's instructions and I have good results overall.

Stage 2:
Now there is a big issue when I try to implement the history, so the prompt will contains previous questions/answers of the conversation.
The model doesn't understand anymore any instructions.
He stills answer to the last question but without any additionnal information I gave it through the prompt. (I believe)
And I know why:
If we start from your example

conversation = [
    {"role": "user", "content": "Bonjour ! Comment ça va aujourd'hui ?"},
    {"role": "assistant", "content": "Bonjour ! Je suis une IA, donc je n'ai pas de sentiments, mais je suis prêt à vous aider. Comment puis-je vous assister aujourd'hui ?"},
    {"role": "user", "content": "Quelle est la hauteur de la Tour Eiffel ?"},
    {"role": "assistant", "content": "La Tour Eiffel mesure environ 330 mètres de hauteur."},
    {"role": "user", "content": "Comment monter en haut ?"},
]
print(tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True))

Will show us :

<s><|system|>: Vous êtes Vigogne, un assistant IA créé par Zaion Lab. Vous suivez extrêmement bien les instructions. Aidez autant que vous le pouvez.
<|user|>: Bonjour ! Comment ça va aujourd'hui ?
<|assistant|>: Bonjour ! Je suis une IA, donc je n'ai pas de sentiments, mais je suis prêt à vous aider. Comment puis-je vous assister aujourd'hui ?</s>
<|user|>: Quelle est la hauteur de la Tour Eiffel ?
<|assistant|>: La Tour Eiffel mesure environ 330 mètres de hauteur.</s>
<|user|>: Comment monter en haut ?
<|assistant|>:

I believe everything inside <s> </s> is not apprehensible by the model. There is my thoughts:
When you don't include conversation history into the prompt, there isn't any </s> generated by apply_chat_template (stage 1).
On stage 2, you do have this </s> which is generated.
I tried to remove this tag and it's new work fine with my data. The model answers are coherents and use instructions etc.
I insist on the fact that by removing this tag, the model follows the instructions (the same way than stage 1) and it's completely the opposite when I leave it which is the default behavior of the tokenizer's method.

A very simple test would be to look at the anwser of your last question "Comment monter en haut ?" (including the history).
If the model doesn't generate output about the eiffel tower, my thoughts can be confirmed. Otherwise I must be wrong.