Writing a chat template
A chat template is a Jinja template stored in the tokenizer’s chat_template attribute. Jinja is a templating language that allows you to write Python-like code and syntax.
{%- for message in messages %}
{{- '<|' + message['role'] + '|>\n' }}
{{- message['content'] + eos_token }}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|assistant|>\n' }}
{%- endif %}
If you stare at this for a while, you should realize that it is very similar to Python, albeit with some strange {%- syntax. The template iterates over a list of messages, and for each message, it prints the role and content of the message, followed by an end-of-sequence token. If add_generation_prompt=True, it adds the starting header for an assistant message to the end of the conversation.
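To make the comparison with Python concrete, here is a rough plain-Python equivalent of the template above. The function name render_chat and the "</s>" end-of-sequence token are assumptions chosen for illustration; real chat templates are rendered by Jinja, not by code like this.

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Plain-Python sketch of what the Jinja template above produces."""
    parts = []
    for message in messages:
        # Role header, for example "<|user|>\n"
        parts.append("<|" + message["role"] + "|>\n")
        # Message text followed by the end-of-sequence token
        parts.append(message["content"] + eos_token)
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Hello!"},
]
rendered = render_chat(messages, add_generation_prompt=True)
print(rendered)
```

Each Jinja block maps onto one Python statement: the for loop, the two output expressions, and the final if.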
Load the written template as a string and assign it to the tokenizer’s chat_template attribute. Once set, the template is used whenever you call apply_chat_template(). It is also saved with the tokenizer whenever save_pretrained() or push_to_hub() is called. The template is saved in the chat_template.jinja file in the tokenizer directory. You can edit this file directly to change the template, which is often easier than manipulating a template string.
Template writing tips
The easiest way to start writing Jinja templates is to refer to existing templates. Use print(tokenizer.chat_template) on any chat model to see the template it’s using. Try starting with simple models that don’t call any tools or support RAG because tool-use models can have very complex templates. Finally, take a look at the Jinja documentation for more details about formatting and syntax.
There are also some tips and pitfalls that are specific to chat templates, and this section covers them in more detail.
Writing multimodal chat templates
For multimodal templates, the chat_template attribute is set on the processor, not the tokenizer. The content key of a message is often a list of content dicts, rather than just a single string. You may wish to check the type of each content item in the list, and handle it accordingly.
Generally, the template should not directly access image or video data. This is normally handled by the processor after template rendering has finished. Instead, your template should emit a single special token like <|image|> or <|video|> when it encounters image or video content. The processor will expand the single special token out into a sequence of image or video tokens later. The exact tokens to emit depend on the model you’re working with. We strongly recommend loading an existing multimodal processor to see how it handles data.
The example template below handles mixed image and text content.
{%- for message in messages %}
{%- if loop.index0 == 0 %}
{{- bos_token }}
{%- endif %}
{{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }}
{%- if message['content'] is string %}
{{- message['content'] }}
{%- else %}
{%- for content in message['content'] %}
{%- if content['type'] == 'image' %}
{{- '<|image|>' }}
{%- elif content['type'] == 'text' %}
{{- content['text'] }}
{%- endif %}
{%- endfor %}
{%- endif %}
{{- '<|eot_id|>' }}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}
This multimodal template is very similar to the simpler template above, but it checks for content lists, and iterates over them to render <|image|> tokens where necessary. This allows images to be inserted “into the flow” of user text.
Not all models work this way - some may move all images to the end of the user message, for example. The chat template should always match the format the model was trained with.
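As a rough illustration of the mixed content format, the sketch below renders the multimodal template above on a message whose content is a list of dicts. It uses the jinja2 package directly as a stand-in for the processor's apply_chat_template() method, and the "<|begin_of_text|>" value for bos_token is an assumption made for this example.

```python
from jinja2 import Environment

# The multimodal template from above, indented for readability.
# A raw string keeps '\n' as a Jinja escape rather than a Python one.
MULTIMODAL_TEMPLATE = r"""{%- for message in messages %}
    {%- if loop.index0 == 0 %}
        {{- bos_token }}
    {%- endif %}
    {{- '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' }}
    {%- if message['content'] is string %}
        {{- message['content'] }}
    {%- else %}
        {%- for content in message['content'] %}
            {%- if content['type'] == 'image' %}
                {{- '<|image|>' }}
            {%- elif content['type'] == 'text' %}
                {{- content['text'] }}
            {%- endif %}
        {%- endfor %}
    {%- endif %}
    {{- '<|eot_id|>' }}
{%- endfor %}
{%- if add_generation_prompt %}
    {{- '<|start_header_id|>assistant<|end_header_id|>\n\n' }}
{%- endif %}"""

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},  # rendered as a single placeholder token
            {"type": "text", "text": "What is in this picture?"},
        ],
    }
]

rendered = Environment().from_string(MULTIMODAL_TEMPLATE).render(
    messages=messages,
    bos_token="<|begin_of_text|>",  # assumed value for illustration
    add_generation_prompt=True,
)
print(rendered)
```

The image entry carries no pixel data as far as the template is concerned; it only decides where the placeholder token appears relative to the text.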
Trimming whitespace
Jinja prints any whitespace before or after a block of text. This can be an issue for chat templates because adding extra whitespace that was not present during model training can harm performance. To remove the whitespace, add - to the Jinja line syntax. This allows you to write your template with Pythonic indentation and linebreaks, without accidentally printing an indentation in the rendered output.
The example template below doesn’t use -, resulting in extra whitespace being printed in the output.
{% for message in messages %}
{{ message['role'] + message['content'] }}
{% endfor %}
We strongly recommend using - to ensure only the intended content is printed.
{%- for message in messages %}
{{- message['role'] + message['content'] }}
{%- endfor %}
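To see the difference in practice, the sketch below renders both variants with the jinja2 package directly (an assumption made for brevity, instead of going through a tokenizer) and prints the results with repr() so the whitespace is visible. The templates are indented to show that the - variant tolerates Pythonic indentation.

```python
from jinja2 import Environment

env = Environment()
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"},
]

# Without "-": newlines and indentation around each tag leak into the output.
loose = env.from_string(
    "{% for message in messages %}\n"
    "    {{ message['role'] + message['content'] }}\n"
    "{% endfor %}"
).render(messages=messages)

# With "-": whitespace before each tag is stripped.
tight = env.from_string(
    "{%- for message in messages %}\n"
    "    {{- message['role'] + message['content'] }}\n"
    "{%- endfor %}"
).render(messages=messages)

print(repr(loose))
print(repr(tight))
```

Only the second output contains the message text and nothing else.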
Special variables and callables
The only constants in a template are the messages variable and the add_generation_prompt boolean. However, you have access to any other keyword arguments that are passed to the apply_chat_template() method.
This provides flexibility and enables support for use-cases we may not have thought of while designing the spec. The most common additional variable is tools, which contains a list of tools in JSON schema format. Although you can use any variable name you like, we highly recommend sticking to convention and using tools for this purpose. This makes templates more compatible with the standard API.
You also have access to any tokens contained in tokenizer.special_tokens_map, which often includes special tokens like bos_token and eos_token. Access these directly by name, like {{- bos_token }}.
There are two callable functions available to you. To call them, use {{- function_name(argument) }}.
- raise_exception(msg) raises a TemplateException. This is useful for debugging or warning users about incorrect template usage.
- strftime_now(format_str) retrieves the current date and time in a specific format, which is often required in system messages. It is equivalent to datetime.now().strftime(format_str) in Python.
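The sketch below shows how a template might use a special token together with both callables. It renders with the jinja2 package directly, so raise_exception and strftime_now are registered by hand here as stand-ins; raising jinja2's TemplateError and the "<s>" value for bos_token are assumptions made for this example, not a statement of how the library wires these up.

```python
from datetime import datetime

from jinja2 import Environment
from jinja2.exceptions import TemplateError


def raise_exception(message):
    # Stand-in for the callable exposed to chat templates (assumed behavior).
    raise TemplateError(message)


def strftime_now(format_str):
    # Mirrors the documented equivalence with datetime.now().strftime(...).
    return datetime.now().strftime(format_str)


env = Environment()
env.globals["raise_exception"] = raise_exception
env.globals["strftime_now"] = strftime_now

template = env.from_string(
    "{%- if messages[0]['role'] != 'system' %}"
    "{{- raise_exception('The first message must be a system message.') }}"
    "{%- endif %}"
    "{{- bos_token + 'Today is ' + strftime_now('%Y-%m-%d') + '. ' + messages[0]['content'] }}"
)

rendered = template.render(
    messages=[{"role": "system", "content": "You are a helpful assistant."}],
    bos_token="<s>",  # assumed value for illustration
)
print(rendered)
```

Rendering the same template with a conversation that starts with a user message raises the exception instead of silently producing a malformed prompt.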
Compatibility with non-Python Jinja
Jinja is implemented in multiple languages and they generally have the same syntax. Writing a template in Python allows you to use Python methods such as lower on strings or items on dicts. But this won’t work if the template is used in a non-Python implementation, for example, when deploying with JavaScript or Rust.
Make the changes below to ensure compatibility across all Jinja implementations.
- Replace Python methods with Jinja filters. For example, replace string.lower() with string|lower or dict.items() with dict|items. Most of the changes follow the same pattern except string.strip(), which is replaced with string|trim. Refer to the list of built-in filters for a complete list of filters.
- Replace True, False, and None (these are Python specific) with true, false, and none respectively.
- Directly rendering a dict or list may return different results in other implementations. For example, string entries may change from single-quote to double-quote. To avoid this, add the tojson filter to maintain consistency.
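The points in the list above can be checked in Python with a short sketch like the one below. It uses the jinja2 package directly, and the variable names and sample values are made up for illustration.

```python
import json

from jinja2 import Environment

env = Environment()
context = {"name": "  User ", "payload": {"name": "multiply"}}

# Python-only method calls: fine in Python's Jinja, may fail elsewhere.
python_style = env.from_string("{{ name.strip().lower() }}").render(**context)
# Portable filters produce the same result.
filter_style = env.from_string("{{ name|trim|lower }}").render(**context)

# Rendering a dict directly gives Python's repr, with single quotes.
direct = env.from_string("{{ payload }}").render(**context)
# The tojson filter gives valid JSON instead.
as_json = env.from_string("{{ payload|tojson }}").render(**context)

print(python_style, filter_style)
print(direct)
print(as_json)
```

Both spellings agree in Python, which is why the incompatibility is easy to miss until the template runs somewhere else.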
Big templates
Newer models or models with features like tool-calling and RAG require larger templates that can be longer than 100 lines. It may be easier to write larger templates in a separate file. The line numbers in the separate file correspond exactly to the line numbers in template parsing or execution errors, making it easier to debug any potential issues.
Extract the chat template to a separate file so you can edit it there.
with open("template.jinja", "w") as f:
    f.write(tokenizer.chat_template)
After editing, load the template back into the tokenizer.
with open("template.jinja") as f:
    tokenizer.chat_template = f.read()
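To illustrate why matching line numbers help, the sketch below compiles a small multi-line template that has a deliberate typo on its third line and reads the line number from the resulting error. It uses the jinja2 package directly, and the template_source string is a made-up stand-in for the contents of template.jinja.

```python
from jinja2 import Environment
from jinja2.exceptions import TemplateSyntaxError

# Stand-in for the contents of template.jinja; line 3 misspells "endfor".
template_source = (
    "{%- for message in messages %}\n"
    "{{- message['content'] }}\n"
    "{%- endfr %}\n"
    "{{- eos_token }}\n"
)

try:
    Environment().from_string(template_source)
    error_line = None
except TemplateSyntaxError as error:
    # lineno refers to the line in the template source, not in this script.
    error_line = error.lineno
    print(f"line {error.lineno}: {error.message}")
```

With the template in its own file, the reported line can be opened directly in an editor.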
Templates for tools
There isn’t a specific format for writing templates for tools but it is best to follow the standard API. This ensures the template is widely accessible across models without requiring users to write custom code to use tools with your model.
Formatting, such as whitespace and special tokens, is model-specific. Make sure everything exactly matches the format a model was trained with.
The following section lists elements of the standard API for writing templates for tools.
Tool definitions
Tools are passed as Python functions or a JSON schema. When functions are passed, a JSON schema is automatically generated and passed to the template. When a template accesses the tools variable, it is always a list of JSON schemas.
Even though a template always receives tools as a JSON schema, you may need to radically change this format when rendering them to match the format a model was trained with. For example, Command-R was trained with tools defined with Python function headers. The template internally converts JSON schema types and renders the input tools as Python headers.
The example below shows how a tool is defined in JSON schema format.
{
  "type": "function",
  "function": {
    "name": "multiply",
    "description": "A function that multiplies two numbers",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "type": "number",
          "description": "The first number to multiply"
        },
        "b": {
          "type": "number",
          "description": "The second number to multiply"
        }
      },
      "required": ["a", "b"]
    }
  }
}
An example of handling tool definitions in a chat template is shown below. The specific tokens and layouts should be changed to match the ones the model was trained with.
{%- if tools %}
{%- for tool in tools %}
{{- '<tool>' + tool['function']['name'] + '\n' }}
{%- for argument in tool['function']['parameters']['properties'] %}
{{- argument + ': ' + tool['function']['parameters']['properties'][argument]['description'] + '\n' }}
{%- endfor %}
{{- '\n</tool>' }}
{%- endfor %}
{%- endif %}
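As a quick check of what this kind of template produces, the sketch below renders a tool-definition template on the multiply schema shown earlier. It uses the jinja2 package directly rather than apply_chat_template(), and the <tool> layout is only the illustrative format from this section, not one any particular model was trained with.

```python
from jinja2 import Environment

# Tool-definition template, with the inner loop closed by endfor.
TOOL_TEMPLATE = r"""{%- if tools %}
    {%- for tool in tools %}
        {{- '<tool>' + tool['function']['name'] + '\n' }}
        {%- for argument in tool['function']['parameters']['properties'] %}
            {{- argument + ': ' + tool['function']['parameters']['properties'][argument]['description'] + '\n' }}
        {%- endfor %}
        {{- '\n</tool>' }}
    {%- endfor %}
{%- endif %}"""

tools = [
    {
        "type": "function",
        "function": {
            "name": "multiply",
            "description": "A function that multiplies two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "The first number to multiply"},
                    "b": {"type": "number", "description": "The second number to multiply"},
                },
                "required": ["a", "b"],
            },
        },
    }
]

rendered = Environment().from_string(TOOL_TEMPLATE).render(tools=tools)
print(rendered)
```

Iterating over the properties dict yields the argument names, which is why the template looks up each description by key.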
Tool calls
In addition to rendering the tool definitions, you also need to render tool calls and tool responses in the template.
Tool calls are generally passed in the tool_calls key of an "assistant" message. This is always a list even though most tool-calling models only support single tool calls, which means the list usually only contains a single element.
{
  "role": "assistant",
  "tool_calls": [
    {
      "type": "function",
      "function": {
        "name": "multiply",
        "arguments": {
          "a": 5,
          "b": 6
        }
      }
    }
  ]
}
A common pattern for handling tool calls is shown below. You can use this as a starting point, but make sure your template actually matches the format the model was trained with!
{%- if message['role'] == 'assistant' and 'tool_calls' in message %}
{%- for tool_call in message['tool_calls'] %}
{{- '<tool_call>' + tool_call['function']['name'] + '\n' + tool_call['function']['arguments']|tojson + '\n</tool_call>' }}
{%- endfor %}
{%- endif %}
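The sketch below renders a tool-call pattern of this kind on the assistant message shown earlier. It uses the jinja2 package directly and wraps the snippet in a loop over messages. One assumption is worth flagging: jinja2's built-in tojson filter returns HTML-safe markup, which would escape the surrounding tags when joined with +, so this example registers a plain json.dumps wrapper as tojson instead.

```python
import json

from jinja2 import Environment

env = Environment()
# Assumption: a plain-string tojson filter, replacing jinja2's HTML-safe one.
env.filters["tojson"] = lambda value: json.dumps(value, ensure_ascii=False)

TOOL_CALL_TEMPLATE = r"""{%- for message in messages %}
    {%- if message['role'] == 'assistant' and 'tool_calls' in message %}
        {%- for tool_call in message['tool_calls'] %}
            {{- '<tool_call>' + tool_call['function']['name'] + '\n' + tool_call['function']['arguments']|tojson + '\n</tool_call>' }}
        {%- endfor %}
    {%- endif %}
{%- endfor %}"""

messages = [
    {
        "role": "assistant",
        "tool_calls": [
            {
                "type": "function",
                "function": {"name": "multiply", "arguments": {"a": 5, "b": 6}},
            }
        ],
    }
]

rendered = env.from_string(TOOL_CALL_TEMPLATE).render(messages=messages)
print(rendered)
```

The filter binds more tightly than +, so only the arguments dict is serialized to JSON.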
Tool responses
Tool responses are message dicts with the tool role. They are much simpler than tool calls, and usually only contain the role, name and content keys.
{
  "role": "tool",
  "name": "multiply",
  "content": "30"
}
Some templates may not even need the name key, in which case, you can write your template to only read the content key.
{%- if message['role'] == 'tool' %}
{{- "<tool_result>" + message['content'] + "</tool_result>" }}
{%- endif %}
Contribute
Once a template is ready, assign it to the tokenizer’s chat_template attribute and test it with apply_chat_template(). If it works as expected, then upload it to the Hub with push_to_hub().
Even if you’re not the model owner, it is still helpful to add a template for a model with an empty or incorrect chat template. Open a pull request on the model repository to add the template!
tokenizer.chat_template = template
tokenizer.push_to_hub("amazing_company/cool_model", commit_message="Add chat template", create_pr=True)