Prompt format

opened by supportend

Interesting, I used the prompt format that was in the model card before and it worked very well (system prompt: Comment the source.), but I guess it can be placed in the main prompt too with the same effect. Thank you for providing the files.

Yeah, they don't specify a template, but it's clearly meant to be chatted with. I'll update if I find the proper one, but I'm glad to hear the default instruct format worked; maybe I'll put it back for now.

Confirmed working template:

simple:

<s>[INST] {user_prompt} [/INST] {assistant_response} </s><s>[INST] {new_user_prompt} [/INST] 

with system prompt:

<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>

{user_prompt} [/INST] {assistant_response} </s><s>[INST] {new_user_prompt} [/INST] 
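
If it helps, here's a rough Python sketch (my own illustration, not an official helper) of how you could assemble a multi-turn prompt in that format:

def build_prompt(turns, system_prompt=None):
    # turns: list of (user_prompt, assistant_response) pairs; pass None as the
    # assistant_response for the final turn you want the model to complete.
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system_prompt:
            user = f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user}"
        prompt += f"<s>[INST] {user} [/INST]"
        if assistant is not None:
            prompt += f" {assistant} </s>"
    return prompt

# e.g. build_prompt([("Comment the source.", None)]) ->
# '<s>[INST] Comment the source. [/INST]'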

Hope it helps :)

-p '' works too in my tests with llama.cpp, but sure, it's possible the instruct syntax could be better, thanks.

Here's how the official Mistral v3 tokenizer (Codestral and Mixtral 8x22B) handles a fill-in-the-middle request:

>>> tokenizer.encode_fim(FIMRequest(prompt='hello', suffix='world'))
Tokenized(tokens=[1, 13, 10239, 11, 7080, 29477], text='<s>[SUFFIX]world[PREFIX]▁hello', prefix_ids=None)
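
For reference, the snippets here use the mistral-common package; the tokenizer was presumably constructed roughly like this (the exact import paths and the v3() constructor are my assumption, check the package docs):

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.fim.request import FIMRequest
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.messages import (
    SystemMessage, UserMessage, AssistantMessage,
)

# Assumption: using the v3 tokenizer bundled with mistral-common; it can also
# be loaded from the model's tokenizer file with MistralTokenizer.from_file(...).
tokenizer = MistralTokenizer.v3()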

Chat is tokenized as expected:

>>> tokenizer.encode_chat_completion(ChatCompletionRequest(messages=[AssistantMessage(content='one'), UserMessage(content='two')]))
Tokenized(tokens=[1, 3, 4, 1392, 2, 3, 1757, 4], text='<s>[INST][/INST]▁one</s>[INST]▁two[/INST]', prefix_ids=None)

However, system messages appear to be attached to the last user message, separated by only two line feeds:

>>> tokenizer.encode_chat_completion(ChatCompletionRequest(messages=[SystemMessage(content='one'), UserMessage(content='two'), AssistantMessage(content='three'), UserMessage(content='four')]))
Tokenized(tokens=[1, 3, 1757, 4, 2480, 2, 3, 1392, 781, 781, 14939, 4], text='<s>[INST]▁two[/INST]▁three</s>[INST]▁one<0x0A><0x0A>four[/INST]', prefix_ids=None)

I.e. the prompt would be something like:

<s> [INST] {user_prompt} [/INST] {assistant_prompt} </s> [INST] {system_prompt}

{user_prompt} [/INST]

But if you're sending a prompt in text format to llama.cpp, I think it will add the <s> (BOS) token automatically. The spaces around each special token shouldn't actually be there, but I think at least some tokenizers need them to detect that they are in fact special tokens. It might be a good idea to verify by hand that your prompt template tokenizes correctly.

Multiple system messages are all added to the last user message, each followed by two newlines.
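
To make the merging behaviour concrete, here's a rough Python reconstruction (mine, not the official code; spacing around the special tokens is approximate, as noted above):

def build_v3_prompt(messages):
    # messages: list of (role, content) tuples, role in {"system", "user", "assistant"}.
    # Observed behaviour: every system message gets prepended to the LAST user
    # message, each followed by two newlines.
    system_parts = [c for role, c in messages if role == "system"]
    chat = [(role, c) for role, c in messages if role != "system"]
    last_user = max(i for i, (role, _) in enumerate(chat) if role == "user")
    prompt = "<s>"
    for i, (role, content) in enumerate(chat):
        if role == "user":
            if i == last_user and system_parts:
                content = "\n\n".join(system_parts) + "\n\n" + content
            prompt += f"[INST] {content} [/INST]"
        else:  # assistant turn
            prompt += f" {content}</s>"
    return prompt

# build_v3_prompt([("system", "one"), ("user", "two"),
#                  ("assistant", "three"), ("user", "four")])
# -> '<s>[INST] two [/INST] three</s>[INST] one\n\nfour [/INST]'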

@JohanAR wow, that's how they handle it?? What a strange setup...

I would never expect the system message to get moved during a chat, highly unusual.

It's worth asking them, maybe. Might be a bug.

The v1 tokenizer (Mistral 7B, Mixtral 8x7B) adds all system messages to the first user message:

>>> tokenizer1.encode_chat_completion(ChatCompletionRequest(messages=[SystemMessage(content='system1'), UserMessage(content='user1'), AssistantMessage(content='ass1'), UserMessage(content='user2'), SystemMessage(content='system2')]))
Tokenized(tokens=[1, 733, 16289, 28793, 1587, 28740, 13, 13, 6574, 28750, 13, 13, 1838, 28740, 733, 28748, 16289, 28793, 1155, 28740, 2, 733, 16289, 28793, 2188, 28750, 733, 28748, 16289, 28793],
          text='<s>▁[INST]▁system1<0x0A><0x0A>system2<0x0A><0x0A>user1▁[/INST]▁ass1</s>▁[INST]▁user2▁[/INST]', prefix_ids=None)

Also note that [INST] and [/INST] weren't special tokens in their v1 tokenizer, but they are in v3.

I'm thinking it might be an advantage to have them near the end, as system messages at the top usually seem to have less and less effect the longer the conversation goes on. KV cache shifting algorithms might need to get a little more sophisticated to avoid having to re-evaluate everything constantly.

Noob question, I assume, but how does that prompt template translate into an LM Studio config?
Is it different from the Mistral Instruct template? https://github.com/lmstudio-ai/configs/blob/main/mistral-instruct.preset.json

{
  "name": "Mistral Instruct",
  "inference_params": {
    "input_prefix": "[INST]",
    "input_suffix": "[/INST]",
    "antiprompt": [
      "[INST]"
    ],
    "pre_prompt_prefix": "",
    "pre_prompt_suffix": ""
  },
  "load_params": {
    "rope_freq_scale": 0,
    "rope_freq_base": 0
  }
}

Yeah, just use the Mistral Instruct prompt format in LM Studio. It can be used in other ways, but [INST] will work nicely for instruction following.
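
If you also want the <<SYS>> system prompt wrapper from the template above, a variant of that preset might look something like this (untested; I'm not sure exactly where LM Studio places the pre-prompt relative to the first [INST], so treat it as a starting point):

{
  "name": "Mistral Instruct (with system prompt)",
  "inference_params": {
    "input_prefix": "[INST]",
    "input_suffix": "[/INST]",
    "antiprompt": [
      "[INST]"
    ],
    "pre_prompt_prefix": "<<SYS>>\n",
    "pre_prompt_suffix": "\n<</SYS>>\n\n"
  },
  "load_params": {
    "rope_freq_scale": 0,
    "rope_freq_base": 0
  }
}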
