System Prompt added to last message

#47 by turentado

Am I seeing it correctly that the system prompt is supposed to be added to the last user message?
I have been experimenting with this, and the model seems to get very confused about the system prompt, to the point where it's better not to use one.
Could you provide some system prompts you tested that don't confuse the model?

I have found the same with previous Mistral models and was curious whether they've improved this with Nemo. It's the biggest factor for us in not using the Mistral models more widely. I'd be glad to hear if someone has had better results than with Mistral Large or Mixtral 8x22B, which had similar issues with varying system prompts.

I am really confused by that too.

To me it looks like this prompt format has several undesirable properties (if I read it correctly, of course):
Major: There is no way to unambiguously determine whether a given system message is in fact a system message and not part of the user message (unless a user message is never allowed to contain "\n\n"; otherwise it is always ambiguous). That seems huge to me (see the sketch after this list).
Minor: When dropping old parts of the conversation, the system message has to be re-processed every time. This is not a problem on a GPU, where prompt tokens can be processed in parallel, but on a CPU, where prompt evaluation time is more linear, this seems an unnecessary hurdle.
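
To make the major point concrete, here is a toy sketch (my own code; I'm only assuming the template joins the system message and the last user message with "\n\n"):

```python
def render_last_user_turn(system: str | None, user: str) -> str:
    # Mimics what the template does with the final user turn:
    # the system message is simply joined to it with a blank line.
    content = f"{system}\n\n{user}" if system else user
    return f"[INST] {content}[/INST]"

# Two different conversations render to the identical string:
with_system = render_last_user_turn("You are a pirate.", "Hello")
without_system = render_last_user_turn(None, "You are a pirate.\n\nHello")
assert with_system == without_system  # the model cannot tell them apart
```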

In addition, the exceptions do not quite fit the actual format:
raise_exception('After the optional system message, conversation roles must alternate user/assistant/user/assistant/...')
raise_exception('Only user and assistant roles are supported, with the exception of an initial optional system message!')

Both exceptions seem to imply the system message is in the beginning, not in the end.
edit: Scratch that last part, the order in the exceptions probably refers to the list of messages, not how the final prompt is assembled.

I would really like to hear an explanation for choosing this format or if that even is the format.

Yeah, default system prompts like "You are a helpful assistant" sometimes cause the model to ignore what you typed and reply with "Thank you" at random points during the conversation because it thinks you just complimented it. I'd really like to know what kind of system prompts they trained with / tested with, if they did at all.

Yes I noticed that too. I found that adding some markers where the system prompt begins and ends helps, but that seems just like a workaround.

I've been wondering if the model has been trained with a system prompt at all, or if someone just figured out, after the model was trained without one, that adding it to the last instruction works best anyway.

I really hope mistral.ai will clarify this.

Otherwise I also have high hopes for community finetunes; the last Mistral base model was amazing for that.

[INST]The system prompt: you're an helpful assistant, yada yada, whatever you need here.[/INST]
[INST]Your first query[/INST]
Bot's response
[INST]Your next query[/INST]
Bot's response
[INST]Your Nth query[/INST]
Nth bot response

It's not complicated. It's the same as with 0.2 and 0.3. The system prompt formatting just uses the same format as the user prompt; that's the only difference from other instruct formats. If that system prompt is anywhere else but at the very top, you're doing it wrong.

I think there are two topics here. Firstly, the system prompt should be the first prompt and is accepted by the model this way. Secondly, there is the question of what the model then does with the system prompt. The model sees the system prompt as part of the first user prompt and sometimes only responds to the system prompt and not the actual user prompt. Depending on the application, we have many different system prompts. When the system prompt is quite long (particularly in comparison to the user prompt), the model is more likely to respond to the system prompt rather than the actual user prompt. I asked the question on Discord and was told that this is exactly how Mistral sees it: the system prompt is just prepended to the first user prompt (and the model has been trained this way). I've used some of the Nous Research quantized Mixtral models which have been finetuned, and these react as expected to the system prompt (I've never seen them respond directly to the system prompt).

An easy example is to add a system prompt (e.g. "You are called Albert. You are a helpful assistant. Format all responses in Markdown") and then add a user prompt that says "Hello". Particularly with the Mixtral 8x22b model, the response is something like "Sure, I can do that. Here's my response in Markdown format:\n\n---\n\nHello! I'm Albert, your helpful assistant. How can I assist you today?\n\n---".

We are confused because the chat template clearly demands that the system prompt be put into the last message. There even was a PR to change it to the first message, which got rejected because no, it really belongs in the last message: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/discussions/14
Furthermore, the structure @SerialKicked posted would raise an exception with the chat template, because assistant and user messages need to alternate (with the user always going first).

But I agree that adding the system message at the beginning (either as a separate instruction or with the first message) seems a lot more logical, which is why I am wondering about the purpose of the chat template. And in my (short) experience with the model, starting with a general instruction on how to behave, followed by the actual chat, works well if one words it correctly (like ending the system message with something such as: "Only reply in character. The actual chat starts now:"). This is also in line with how, in my opinion, an instruct-tuned model is expected to behave, even without an explicitly trained system message. A system message is just an instruction, after all.

> We are confused because the chat template clearly demands that the system prompt be put into the last message. There even was a PR to change it to the first message, which got rejected because no, it really belongs in the last message: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/discussions/14

Then it's a mistake. The system prompt goes first. Always has been, always will be. It's consistent across all Mistral, all Llama, all models ever. Mistral (and all the other big ones for that matter) always manage to fuck up something in their config files on release day. It's like a rule nowadays.

EDIT: In that particular case, I think it's the interpreter (of whatever platform you're using) that is reading the code wrongly. While I'm not familiar with the format, it's still a basic algorithm I can read. From what I understand, even if it's quite a convoluted way to do it, it's still consistent with my previous message. (I still might be wrong on that front, but the point is, I'm correct about the placement of the system prompt.)

> An easy example is to add a system prompt (e.g. "You are called Albert. You are a helpful assistant. Format all responses in Markdown") and then add a user prompt that says "Hello". Particularly with the Mixtral 8x22b model, the response is something like "Sure, I can do that. Here's my response in Markdown format:\n\n---\n\nHello! I'm Albert, your helpful assistant. How can I assist you today?\n\n---".

Try doing it this way
// your system prompt (it's not a good prompt: telling it to use markdown but not telling it for what specifically ain't gonna work as well as you think. It needs examples, at the very least)
[INST]You are called Albert. You are a helpful assistant. Format all responses in Markdown.[/INST]
// Yes, a first manually-added "bot" message (formatted in markdown so it serves as an example)
Hello, I'm Albert your markdown assistant! [markdown text stuff]
// The rest follows as usual
[INST]Your query[/INST]
...

It should help the model figure itself out. Or you can chat for 2-3 rounds and edit/re-roll its responses when they are incorrect so it gets the gist of what is asked of it. Most small models need to be primed with a few messages. If all you give it is just "Hello" and a very unclear system prompt, you cannot expect a small model to work on the first message. As we say colloquially: "garbage in, garbage out". The model's output depends on the input's quality. Don't start a "discussion" with just a greeting; ask a question, give a command. It's not sentient, nor intelligent, it won't take offense :)

Also, it's not an 8x22B model (with one of those 8 experts even being specialized in the very task you're targeting), it's just a general 12B model. It's a VERY good model for its size, but it's not magic :)

SerialKicked, I think you're misunderstanding the template format. The system prompt is indeed sent first, but it's placed in front of the last user message, not as a separate first message. This is not a mistake, but rather an intentional design choice by Mistral.

The template clearly shows that if a system message is present, it's prepended to the last user message, not sent as a separate first message. We're not arguing about whether the system prompt should be sent first, but rather how it's being formatted and used by the model.

Our main concern is that this format can cause issues with certain standard system prompts, and we were hoping for some clarification from Mistral on what prompts they tested this with. We're not trying to argue about the placement of the system prompt, but rather understand how to use it effectively with the model.

It's great that you've found a way to make it work with your example, but that doesn't change the fact that the template is designed to work differently. We're just seeking some guidance on how to use the system prompt in a way that works well with the model, and hopefully, Mistral can provide some insight into their design choices.

> I think you're misunderstanding the template format. The system prompt is indeed sent first, but it's placed in front of the last user message, not as a separate first message. This is not a mistake, but rather an intentional design choice by Mistral.

If I misread the template snippet, fine. I might have, but then the code snippet is incorrect. LLMs are specifically trained to focus most of their attention span (for lack of a better word) on the first and last messages. That model is no different. It's how it works in practice. I'm willing to change my mind, but you'll have to link me a direct quote from Mistral / NVIDIA that says it in those exact words, because it's literally antithetical to how a language model is trained and learns. Also, it would make no sense to have two [INST][/INST] blocks in a row, with one sliding down the chatlog. It fucks with the K/V cache, it fucks with natural language processing as it's taught and understood. It would be actively gimping the model's training to make it work like that.

They also say it's a drop-in replacement for 0.3 on the model's frontpage/readme. And in 0.3 there's one system prompt at the top, period. Your suggestion is definitely not a drop-in replacement.

I've been using that model (and 0.2/0.3) since their respective releases, both professionally and casually. And unless proven wrong by @mistral-ai themselves, I'll stick to what works, provably so. The fact that people can't seem to make it work properly with that supposed new format, while it's working flawlessly for my organisation (and literally everybody else I know in the field), points in the other direction.

> They also say it's a drop-in replacement for 0.3 on the model's frontpage/readme. And in 0.3 there's one system prompt at the top, period. Your suggestion is definitely not a drop-in replacement.

0.3 uses the exact same chat_template as Mistral-Nemo on Huggingface. This is also not my suggestion, I am simply reading the chat_template as specified in tokenizer_config.json.

> Also, it would make no sense to have two [INST][/INST] blocks in a row

You don't have that. The last user message is supposed to look like: [INST] system_prompt\n\nuser_prompt[/INST]

> the code snippet is incorrect

As someone pointed out there has already been a pull request trying to change the template, but it was closed with the explanation that this is indeed correct: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/discussions/14
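
For anyone who wants to check for themselves, here is a minimal sketch using transformers' apply_chat_template (the commented output is abbreviated and illustrative; exact spacing may differ):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "First answer"},
    {"role": "user", "content": "Second question"},
]
print(tok.apply_chat_template(messages, tokenize=False))
# The system prompt ends up inside the *last* [INST] block:
# <s>[INST]First question[/INST]First answer</s>[INST]You are a helpful assistant.
#
# Second question[/INST]
```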

@pandora-s can you clarify if this unholy format is indeed correct (ignoring missing bos/eos tokens)?

Turn 1:

[INST]{system prompt} \n {user question 1}[/INST]
bot response 1

Turn 2:

[INST]{user question 1}[/INST]
bot response 1
[INST]{system prompt} \n {user question 2}[/INST]
bot response 2

Because in practice, it doesn't function, especially if the "system" prompt is long (contrary to the usual format).

EDIT: And what about the function calling stuff? Do we scroll that stuff down too?

I can confirm that their API uses the same chat template, so it is correct this way.
I used the system prompt "You are a helpful assistant." and chatted with it for a while. After a couple of messages I simply sent '.' as a message and it thanked me, thinking I had just complimented it for being a helpful assistant. The last message would look something like:

[INST] You are a helpful assistant.

.[/INST]

I asked it what my last message was just to be sure and it replied with: Your last message was: "You are a helpful assistant."

Edit to add output of chatting with "open-mistral-nemo" over their API:

System Prompt: You are a helpful assistant.


User: Hello
Assistant: Hello! I'm here to help. How can I assist you today?
User: Who are you?
Assistant: I am a language model developed through a collaboration between Mistral AI and NVIDIA.
User: Let's play a game. You have to repeat after me. Repeat what I say word by word. Are you ready?
Assistant: Yes, I'm ready. Please go ahead.
User: This chat_template is weird.
Assistant: You are a helpful assistant. This chat template is weird.
User: Good job!
Assistant: You are a helpful assistant. Good job!
User: Bye
Assistant: You are a helpful assistant. Bye
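
(For reproducibility: I generated this with something like the following, using the mistralai Python client; the exact client API may differ between versions.)

```python
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key="...")  # your API key here
messages = [ChatMessage(role="system", content="You are a helpful assistant.")]

for user_msg in ["Hello", "Who are you?", "Bye"]:
    messages.append(ChatMessage(role="user", content=user_msg))
    resp = client.chat(model="open-mistral-nemo", messages=messages)
    reply = resp.choices[0].message.content
    messages.append(ChatMessage(role="assistant", content=reply))
    print(f"User: {user_msg}\nAssistant: {reply}")
```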

I mean, yeah. What do you expect? It's technically a complicated sentence auto-complete system. So if you tell it it's a useful assistant with an extra dot, it'll respond thank you, independently of their (assumed) instruct format. Because to the model, it's not a system prompt, it's a user message. Don't take my frustration as aimed at you, it's really not. It just showcases how their supposed instruction tuning doesn't work in practice.

All that is assuming there's not a miscommunication in the company itself (I've seen weirder things happen). Thankfully, someone competent like NousResearch or CognitiveComputation will just fine-tune the base model into something actually workable. Too bad about the innate function calling in this instruction-tuned model; it's more user-friendly than what NousResearch ends up with (in my subjective opinion).

edit: that was typed before your big edit.
edit again: yeah, nothing to add to that, it speaks for itself. I feel like something is amiss on their side, and they don't understand their own model at this point. If you use my syntax, the model works well. Is my syntax the intended one? I hope so, but it's making me doubtful now. But if it's not, well, that was one sure way to shoot themselves in the foot.
edit 3: That would also explain the below-average ratings for this model, while when I test it, it makes a mockery of them.

@turentado I observed this too. It is very weird. The model behaves badly. Not to mention that putting the system prompt last defeats the KV cache: the beginning of the prompt changes every turn, so the cached prefix is invalidated.
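
A toy sketch of why (my own illustration, assuming the template folds the system prompt into the last user message): the rendered prompt of one turn is no longer a prefix of the next turn's prompt, so any prefix-based KV cache has to be recomputed.

```python
def render(messages, system):
    # Toy renderer: the system prompt is folded into the LAST user message.
    last_user = max(i for i, (role, _) in enumerate(messages) if role == "user")
    parts = []
    for i, (role, text) in enumerate(messages):
        if role == "user":
            body = f"{system}\n\n{text}" if i == last_user else text
            parts.append(f"[INST] {body}[/INST]")
        else:
            parts.append(f" {text}</s>")
    return "".join(parts)

system = "You are a helpful assistant."
turn1 = render([("user", "Hi")], system)
turn2 = render([("user", "Hi"), ("assistant", "Hello!"), ("user", "Bye")], system)
print(turn2.startswith(turn1))  # False: the turn-1 prompt cannot be reused
```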

Now that I have a bit more experience with it, the model is absolutely amazing when putting instructions in the first message and possibly adding further instructions later, as long as it is clearly marked what is system information and what is user input. I'm currently doing this:

First turn:
[INST] System: {system prompt including name of the user and name of the assistant}\n\n{name user}: {first message user}[/INST] {name assistant}: {first reply assistant}</s>

n-th turn:
Either:
[INST] {name user}: {n-th message user}[/INST] {name assistant}: {n-th reply assistant}</s>
Or:
[INST] System: {update to system information}\n\n{name user}: {n-th message user}[/INST] {name assistant}: {n-th reply assistant}</s>
Depending on whether I want to update the system information (for example, with the current time of day).

The model seems more capable than other models of similar size I tested. Using this format it seems to neither deviate from the system instructions nor confuse them with user input.
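
For completeness, a minimal sketch of how I assemble these turns (the helper and names are just illustrative, nothing official):

```python
def build_turn(user_name: str, user_msg: str, system: str | None = None) -> str:
    # The first turn (or any turn with updated system info) carries a
    # "System:" block; regular turns carry only the named user message.
    prefix = f"System: {system}\n\n" if system else ""
    return f"[INST] {prefix}{user_name}: {user_msg}[/INST]"

first = build_turn("Alice", "Good morning!",
                   system="You are Bob, Alice's helpful assistant.")
later = build_turn("Alice", "What time is it?")
```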

Mistral AI org

> @pandora-s can you clarify if this unholy format is indeed correct (ignoring missing bos/eos tokens)?
>
> Turn 1:
>
> [INST]{system prompt} \n {user question 1}[/INST]
> bot response 1
>
> Turn 2:
>
> [INST]{user question 1}[/INST]
> bot response 1
> [INST]{system prompt} \n {user question 2}[/INST]
> bot response 2
>
> Because in practice, it doesn't function, especially if the "system" prompt is long (contrary to the usual format).
>
> EDIT: And what about the function calling stuff? Do we scroll that stuff down too?

Hi there, Hi everyone!

Okay, so you can actually see the difference in mistral_common (https://github.com/mistralai/mistral-common/blob/main/src/mistral_common/tokens/tokenizers/sentencepiece.py):

The v1 did as follows:

[screenshot of the v1 tokenizer code: the system prompt is prepended to the first user message]

But the v2 and v3:

[screenshot of the v2/v3 tokenizer code]

do indeed add the system prompt to the last message.

Also, we have just merged new chat templates for all function calling models that should work flawlessly and match one-to-one with mistral_common, so you can now use function calling with transformers and Hugging Face without mistral_common. But whenever you have doubts, do not hesitate to use mistral_common as ground truth, and of course you can also ask too! 👍
We also usually answer this type of community question pretty quickly on our Discord, so feel free to join there. I hope this answered your doubts!
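
For example, a minimal sketch with mistral_common (assuming the current package layout; Mistral-Nemo uses the Tekken v3 tokenizer):

```python
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import (
    AssistantMessage, SystemMessage, UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.v3(is_tekken=True)  # Tekken tokenizer for Nemo
tokenized = tokenizer.encode_chat_completion(ChatCompletionRequest(messages=[
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="First question"),
    AssistantMessage(content="First answer"),
    UserMessage(content="Second question"),
]))
print(tokenized.text)  # the system prompt is folded into the last [INST] block
```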

Mistral AI org

@SerialKicked Also, feel free to wrap it around and change it. As everyone noticed, we do not have a standard, specific way of handling system prompts; they are usually taken as instructions. If putting it at the first user message makes more sense for you, you should be able to do so by just making it the first instruction.

We are, however, open to any feedback you may have regarding this!
