Prompt Format?

#3 opened by gardner

Thank you for publishing these weights. The base model is impressive. I am keen to try out the instruct-tuned model.

What is the prompt format?

CodeLlama 7B Instruct, for example, uses a prompt similar to:

[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
{prompt}
[/INST]
ARC Lab, Tencent PCG org

Thanks for your comment! I followed the tulu format for instruction tuning. Here is the codebase I used: https://github.com/allenai/open-instruct. We will release our instruction-tuning code on GitHub later. The specific code for instruction data processing is here: https://github.com/allenai/open-instruct/blob/9ebcb582cfc243a6dab75b4302fa432784db26c2/open_instruct/finetune.py#L273. Here is the format tulu uses, for your reference.

<|user|>
Your message here!
<|assistant|>

For best results, format all inputs in this manner. Make sure to include a newline after <|assistant|>; this can affect generation quality quite a bit.
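A single-turn prompt in this format can be assembled by hand, for example (a minimal sketch; build_prompt is just an illustrative helper, not part of our training code):

# Illustrative helper: assemble a single-turn prompt in the tulu format.
def build_prompt(user_message: str) -> str:
    # Note the trailing newline after <|assistant|>; the model's reply is generated after it.
    return f"<|user|>\n{user_message}\n<|assistant|>\n"

prompt = build_prompt("Write a Python function that reverses a string.")
print(prompt)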

Having so many prompt formats makes it difficult to build high-quality tooling that works across models, both for further training and for inference. Unlike other model parameters and specifications, such as num_hidden_layers = 40, which can usually be found in the supplied config.json, we are left reading the model card, README, papers, or code to figure out the appropriate format.

> Make sure to include a newline after <|assistant|>; this can affect generation quality quite a bit.

It's frustrating to roll out support for this great-looking new model when it comes with the extra effort of adopting yet another uncommon format and maintaining that mapping (LLaMA-Pro -> Allen AI tulu format) in each of our tools so that our users have the best experience with it. If the prompt format is off even slightly, or the tooling makes heavy use of "system prompts", we will see degraded performance until we figure out a workaround for each format there's demand for (and there was no demand for this one until today).

How do you choose instruct formats?

Thank you @WuChengyue 🙏

@sean-public
The formats will always be diverse, but there is work underway to include them in configuration files. See Templates for Chat Models in the transformers documentation.

An example is the chat_template included in the tokenizer_config.json of Mistral-7B-Instruct-v0.1.
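You can inspect that template directly (assuming you have access to the repo on the Hub):

from transformers import AutoTokenizer

# Loads the tokenizer and prints the Jinja template stored in its tokenizer_config.json.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
print(tokenizer.chat_template)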

Some libraries are beginning to support it as well: https://github.com/abetlen/llama-cpp-python/pull/790

Allen AI's tulu-2-dpo-70b, for example, ships a tokenizer_config.json that includes a chat_template.

You can see that in action by running the following python code:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("allenai/tulu-2-dpo-70b", legacy=False)

chat = [
   {"role": "user", "content": "Hello, how are you?"},
   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
   {"role": "user", "content": "I'd like to show off how chat templating works!"},
   {"role": "assistant", "content": "Great, please let me know if I can help."},
]

print(tokenizer.apply_chat_template(chat, tokenize=False))

Which outputs:

$ python3 main.py 
<|user|>
Hello, how are you?
<|assistant|>
I'm doing great. How can I help you today?</s>
<|user|>
I'd like to show off how chat templating works!
<|assistant|>
Great, please let me know if I can help.</s>

If we run the same code for LLaMA-Pro-8B-Instruct, we can see:

$ python3 main.py 
No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Hello, how are you? [/INST] I'm doing great. How can I help you today? </s><s>[INST] I'd like to show off how chat templating works! [/INST] Great, please let me know if I can help. </s>

If we modify the tokenizer to use a chat_template, we can see the difference:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("TencentARC/LLaMA-Pro-8B-Instruct", legacy=False)

# Set a chat_template matching the tulu-style format used for instruction tuning.
tokenizer.chat_template = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"

chat = [
   {"role": "user", "content": "Hello, how are you?"},
   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
   {"role": "user", "content": "I'd like to show off how chat templating works!"},
   {"role": "assistant", "content": "Great, please let me know if I can help."},
]

print(tokenizer.apply_chat_template(chat, tokenize=False))

Which outputs:

$ python3 main.py 
<|user|>
Hello, how are you?
<|assistant|>
I'm doing great. How can I help you today?</s>
<|user|>
I'd like to show off how chat templating works!
<|assistant|>
Great, please let me know if I can help.</s>
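
For inference, it likely also makes sense to pass add_generation_prompt=True so the rendered prompt ends with <|assistant|> plus a newline, matching the note above about the trailing newline. A minimal sketch, assuming the same template as above is set on the tokenizer (or already merged into tokenizer_config.json):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TencentARC/LLaMA-Pro-8B-Instruct", legacy=False)
# Assumption: the tulu-style template shown above has been set on this tokenizer
# (or merged into its tokenizer_config.json).
tokenizer.chat_template = "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"

chat = [
   {"role": "user", "content": "Hello, how are you?"},
]

# add_generation_prompt=True appends the trailing '<|assistant|>' and newline,
# leaving the prompt ready for the model to continue.
print(repr(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)))

The rendered string should end with '<|assistant|>' followed by a newline, ready for generation.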

A change adding chat_template to the tokenizer_config.json has been merged. Thank you!

Closing this discussion.

gardner changed discussion status to closed
