Chat template with system prompt

#1
by tattrongvu - opened

Hi, thanks for your great models.
For anyone who wants to use the model through an OpenAI-compatible API (served with vLLM): you may need to modify the chat template and pass it to the vllm serve command; otherwise you have to supply it in the chat_template argument of each request.
The default chat template doesn't support a system prompt.
Here is a modified one that concatenates the default German system prompt with the user-provided system prompt.

{%- set default_system_prompt = 'Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.' %}
{%- for message in messages %}
{%- if loop.first and message['role']|lower != 'system' %}
{{- 'System: ' + default_system_prompt + '\\n' }}
{%- endif %}
{%- if message['role']|lower == 'user' %}
{{- message['role']|capitalize + ': ' + message['content'] + '\\n' }}
{%- elif message['role']|lower == 'assistant' %}
{{- message['role']|capitalize + ': ' + message['content'] + eos_token + '\\n' }}
{%- elif message['role']|lower == 'system' %}
{{- 'System: ' + default_system_prompt + ' ' + message['content'] + '\\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: '}}
{%- endif %}

Note that modifying the chat template may also change the model's behavior. Use with caution!
Verify with this code:

messages = [{"role": "system", "content": "You are a helpful assistant named Teuken."},{"role": "User", "content": "Wer bist du?"},{"role": "Assistant", "content": "Ich bin Teuken."},{"role": "User", "content": "Wann bist du geboren?"}]
tokenized_prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(tokenized_prompt)

It will output:

System: Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen. You are a helpful assistant named Teuken.
User: Wer bist du?
Assistant: Ich bin Teuken.</s>
User: Wann bist du geboren?
Assistant: 
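
To make the template's branching easier to follow, here is a minimal plain-Python sketch of the same logic. It is an illustration only, not the code the tokenizer runs: the function name render_prompt is hypothetical, eos_token is assumed to be </s> as in the output above, and the '\\n' in the template is assumed to end up as a real newline.

```python
DEFAULT_SYSTEM_PROMPT = (
    "Ein Gespräch zwischen einem Menschen und einem Assistenten mit "
    "künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche "
    "Antworten auf die Fragen des Menschen."
)


def render_prompt(messages, add_generation_prompt=True, eos_token="</s>"):
    """Hypothetical helper mirroring the branches of the Jinja template above."""
    parts = []
    for index, message in enumerate(messages):
        role = message["role"].lower()
        # No leading system message: emit the default German system prompt.
        if index == 0 and role != "system":
            parts.append("System: " + DEFAULT_SYSTEM_PROMPT + "\n")
        if role == "user":
            parts.append("User: " + message["content"] + "\n")
        elif role == "assistant":
            # Completed assistant turns are closed with the end-of-sequence token.
            parts.append("Assistant: " + message["content"] + eos_token + "\n")
        elif role == "system":
            # A user-supplied system prompt is appended to the default one.
            parts.append(
                "System: " + DEFAULT_SYSTEM_PROMPT + " " + message["content"] + "\n"
            )
        # Any other role is silently skipped, as in the template.
    if add_generation_prompt:
        parts.append("Assistant: ")
    return "".join(parts)
```

Calling render_prompt(messages) with the messages list from the verification code should give the same text as the output shown above, provided the assumptions hold.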

Hi @tattrongvu, thanks for your investigation. We have now added a default system prompt, which is used if no chat_template language is provided: https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4/commit/2fcfcda2db6e3529f3c7676fd5a41675deaecd37

We also added an example of usage with vLLM to the README: https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4#usage-with-vllm-server

Please revisit this. As of now, you cannot simply use a custom system prompt with the tokenizer's apply_chat_template function.

messages = [{"role": "System", "content": "Answer with 'Pong'"}, {"role": "User", "content": "Ping"}]
chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Error: TemplateError: Roles must alternate User/Assistant/User/Assistant/...

I would suggest this as the default chat template:

{%- for message in messages %}
    {%- set role = message['role']|lower %}
    {%- if role == 'system' %}
        {{- 'System: ' + message['content'] + '\n' }}
    {%- elif role in ['user', 'assistant'] %}
        {{- message['role']|capitalize + ': ' + message['content'] + ('\n' if role == 'user' else eos_token + '\n') }}
    {%- else %}
        {{- raise_exception('Only user, assistant and system roles are supported!') }}
    {%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: ' }}
{%- endif %}

There is also a related issue when hosting the model with Huggingface TGI.

docker run \
--rm \
--gpus all \
--shm-size 1g \
-p 8080:80 \
ghcr.io/huggingface/text-generation-inference:2.4.1 \
--model-id openGPT-X/Teuken-7B-instruct-commercial-v0.4 \
--trust-remote-code \
--num-shard 2 \
--max-batch-total-tokens 9502 \
--max-batch-prefill-tokens 9502 \
--max-input-tokens 9500 \
--max-total-tokens 9501 \
--max-client-batch-size 1 \
--max-batch-size 1

from huggingface_hub import InferenceClient
client = InferenceClient(
    base_url="http://HOST:PORT",
)
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "What is deep learning?"}]
)
# Error: Template error: template not found

However, plain text generation with a manually formatted prompt works perfectly.

client.text_generation('System: You are an AI\nUser: What is deep learning?\nAssistant: ', max_new_tokens=1000)
# Output: Deep learning is a subfield of machine learning that uses artificial...
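
Until the chat endpoint finds a template, the prompt can be assembled on the client side. The sketch below is one way to do that, under assumptions: build_prompt and generate are hypothetical helper names, the System/User/Assistant layout and the </s> end token are taken from the examples in this thread, and HOST:PORT is the same placeholder as above.

```python
def build_prompt(messages, eos_token="</s>"):
    """Hypothetical helper: flatten chat messages into the plain-text prompt format."""
    lines = []
    for message in messages:
        role = message["role"].lower()
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"Unsupported role: {message['role']}")
        # Completed assistant turns end with the EOS token (assumed to be </s>).
        suffix = eos_token if role == "assistant" else ""
        lines.append(f"{role.capitalize()}: {message['content']}{suffix}\n")
    # Trailing cue so the model continues as the assistant.
    return "".join(lines) + "Assistant: "


def generate(messages, base_url="http://HOST:PORT", max_new_tokens=1000):
    """Send the flattened prompt to the TGI text generation route."""
    # Imported lazily so build_prompt works without huggingface_hub installed.
    from huggingface_hub import InferenceClient

    client = InferenceClient(base_url=base_url)
    return client.text_generation(
        build_prompt(messages), max_new_tokens=max_new_tokens
    )
```

For the two-message conversation used above (a system message followed by one user message), build_prompt produces the same string that was passed to client.text_generation by hand.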

Hi all, has this been solved in some way by now? May I also ask: if I would like to use one of the chat templates above, how would I actually do it? (I am deploying on runpod.io, where I can only modify environment variables and how it is called.)
My ultimate goal is to run it through the OpenAI library. Which template would you suggest for this?

Thanks for your investigations. For this instruction-tuned model, we trained with a selection of system messages, which are listed here: https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4/blob/main/gptx_tokenizer.py#L432

We suggest using these system messages for inference as well. If you want to test custom system messages, you can specify a custom chat template with vLLM as described here https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4#usage-with-vllm-server and set the --chat-template parameter to the Jinja template above or to the following Jinja template:

{%- if messages[0]["role"] == "system" %}
{{- messages[0]['role']|capitalize + ': ' + messages[0]['content'] + '\\n' }}
{%- set loop_messages = messages[1:] %}
{%- else %}
System: Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.{{- '\\n'}}
{%- set loop_messages = messages %}
{%- endif %}
{%- for message in loop_messages %}
{%- if message['role']|lower == 'user' %}
{{- message['role']|capitalize + ': ' + message['content'] + '\\n' }}
{%- elif message['role']|lower == 'assistant' %}
{{- message['role']|capitalize + ': ' + message['content'] + '</s>' + '\\n' }}
{%- else %}
{{- raise_exception('Only user and assistant roles are supported!') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: '}}
{%- endif %}
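
As an alternative to --chat-template at server start, the first post mentions passing the template in the request itself. The following is a hedged sketch with the OpenAI Python client. It assumes a vLLM server reachable at http://localhost:8000/v1 (a placeholder), that the vLLM version in use accepts a chat_template field in the request body, and it uses a shortened template for illustration. build_request and ask are hypothetical names.

```python
# Shortened illustrative template; the raw string keeps '\n' as a Jinja escape.
CHAT_TEMPLATE = r"""{%- for message in messages %}
{%- set role = message['role']|lower %}
{%- if role == 'system' %}
{{- 'System: ' + message['content'] + '\n' }}
{%- elif role == 'user' %}
{{- 'User: ' + message['content'] + '\n' }}
{%- elif role == 'assistant' %}
{{- 'Assistant: ' + message['content'] + '</s>' + '\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: ' }}
{%- endif %}"""


def build_request(messages, model="openGPT-X/Teuken-7B-instruct-commercial-v0.4"):
    """Hypothetical helper: keyword arguments for chat.completions.create."""
    return {
        "model": model,
        "messages": messages,
        # extra_body is merged into the JSON payload sent to the server.
        "extra_body": {"chat_template": CHAT_TEMPLATE},
    }


def ask(messages, base_url="http://localhost:8000/v1"):
    """Send one chat request; base_url is an assumed placeholder."""
    # Imported lazily so build_request works without the openai package.
    from openai import OpenAI

    client = OpenAI(base_url=base_url, api_key="EMPTY")
    response = client.chat.completions.create(**build_request(messages))
    return response.choices[0].message.content
```

Whether a per-request template is honored depends on the server version and configuration, so treat this as a starting point rather than a confirmed recipe.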

To use custom system messages with the Hugging Face library, a custom chat template can be set with tokenizer.chat_template="{...Jinja...}"
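
A minimal sketch of that assignment, under assumptions: the loading arguments (trust_remote_code=True) are a guess and the model card should be followed if it differs, the compact template is only illustrative, and render_with_custom_template is a hypothetical helper name. Whether the repository's custom tokenizer code honors the assigned template has not been verified here.

```python
MODEL_ID = "openGPT-X/Teuken-7B-instruct-commercial-v0.4"

# Compact illustrative template: "Role: content" per line, </s> after assistant turns.
CUSTOM_TEMPLATE = (
    "{%- for message in messages %}"
    "{{- message['role']|capitalize + ': ' + message['content'] }}"
    "{%- if message['role']|lower == 'assistant' %}{{- '</s>' }}{%- endif %}"
    "{{- '\\n' }}"
    "{%- endfor %}"
    "{%- if add_generation_prompt %}{{- 'Assistant: ' }}{%- endif %}"
)


def render_with_custom_template(messages, model_id=MODEL_ID):
    """Hypothetical helper: load the tokenizer, swap the template, render a prompt."""
    # Imported lazily; requires the transformers package and access to the model repo.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    tokenizer.chat_template = CUSTOM_TEMPLATE
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
```

Printing the returned string before sending it to the model is a cheap way to confirm that the system message appears where expected.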
