Future feature: system prompt and chat support

#459
by clefourrier HF staff - opened

Hi!
Just wanted to keep the community posted, since this has been a heavily requested feature: we will add support for system prompts and chat prompts (using the default prompts stored in the models' tokenizers) in the first quarter of next year!

Thanks! I think with this feature the models' scores will be more deterministic, and these scores will be more truthful.

clefourrier pinned discussion

Awesome! This will be so great for evaluating models as chat agents!

Awesome @clefourrier

Btw how will the default system prompt for a model be configured? I can’t find any mention of default system prompts in the HF Chat Templates docs.

Ideally I’d like to be able to include a default system prompt within the chat template. This seems especially important for models that are trained with a specific system prompt, which then also needs to be used at inference time.

Hugging Face H4 org
edited Jan 12

Hi @dctanner ,
Thanks for your interest!
We'll use the chat template stored in the tokenizer configuration (see here).
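If it helps, you can already inspect what a model has stored there; a quick sketch (the model id is just an example of a repository whose tokenizer ships a chat template):

from transformers import AutoTokenizer

# Example model id; any repository whose tokenizer_config.json contains a chat_template works.
tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# The Jinja template string stored in the tokenizer configuration.
print(tok.chat_template)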

Thanks @clefourrier
Is there any way for us to set a default system message ‘hardcoded’ in the chat template, to be used if there’s no system-role message included in the messages array?

Note that FastChat handles model-specific default system prompts by hardcoding them for each model in its codebase! It’d be much better if we could define them in the chat template instead.

Hugging Face H4 org

There should be, as the LLaMA tokenizer has the added option to disable the default system prompt here.
Tagging @Rocketknight1 who will likely be able to give a more detailed answer on how to add it to a Tokenizer for default use :)

Thanks for pointing that out.
It looks like use_default_system_prompt is a LLaMA-specific config. But what we're trying to work out is how to define a default system prompt message in the chat template itself.
For example, if you take a look at https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py, they have hardcoded the correct default system_message strings. But it would be great if this could be done in the chat templates.
(ps @Rocketknight1 I've sent you an X DM if you wanna chat more there)

Hugging Face H4 org

Side note: it would be better to have the conversation about the chat template here, so that everybody can benefit from it :)

(of course! I was just worried about littering the chat, but happy to have it here :) )

Hey! I should explain the default_system_prompt thing - that's not really how you're "supposed" to do things with chat templates, but when I created chat templates, we had to preserve backward compatibility for existing models, and LLaMA used config settings like that to control its system prompt. As a result, the chat template for LLaMA had to include logic to read that attribute.

If you're writing a chat template for a model in future, though, and you want to include a default system prompt, you can just write the prompt text into the template logic.

Also, regarding @clefourrier's question, we have a full guide for adding chat templates here: https://huggingface.co/docs/transformers/main/chat_templating

If you want the tl;dr version, though, you just write a Jinja template string, or copy/modify an existing one, and set it as the tokenizer.chat_template attribute. Once you do that, just tokenizer.push_to_hub() and it's the official model chat template!
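Roughly, something like this (the repository id is a placeholder, and this assumes a recent transformers version with chat template support):

from transformers import AutoTokenizer

# Placeholder repository id - replace with your own model.
tok = AutoTokenizer.from_pretrained("your-org/your-model")

# A deliberately tiny template, just to show the mechanics.
tok.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] + '\n' + message['content'] + '\n' }}"
    "{% endfor %}"
)

# Sanity-check the rendered prompt before publishing.
print(tok.apply_chat_template(
    [{"role": "user", "content": "Hi!"}],
    tokenize=False,
))

# Uploads the tokenizer files (including the chat_template) to the Hub.
tok.push_to_hub("your-org/your-model")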

Thanks @Rocketknight1 , do you think you could give an example? @mhenrichsen and I can't figure it out with jinja2. Seems like it might require the "jinja2.ext.do" extension.

{% set system_message_exists = false %}
{% for message in messages %}
    {% if message['role'] == 'system' %}
        {% set system_message_exists = true %}
    {% endif %}
{% endfor %}

{% if not system_message_exists %}
    {% set default_system_message = {'role': 'system', 'content': 'HELLO FROM SYSTEM'} %}
    {% do messages.append(default_system_message) %}
{% endif %}

{% for message in messages %}
    {{ message['role'] + '\n' + message['content'] + '\n' }}
{% endfor %}

Hi @dctanner , take a look at the LLaMA 2 template - it's got the same idea as your code, but we avoid do by creating a new variable and slicing the input messages instead: https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/c1b0db933684edbfe29a06fa47eb19cc48025e93/tokenizer_config.json#L12

@Rocketknight1 sorry can you paste it here, I don't have access.

@dctanner sorry, I forgot that was a gated model, and I also forgot that LLaMA dropped their old default message! I've made an edit of the template with a default system prompt added, plus newlines/indentation to make it easier to follow, but remember to remove them at the end before you set the attribute.

{% if messages[0]['role'] == 'system' %}
    {% set loop_messages = messages[1:] %}
    {% set system_message = messages[0]['content'] %}
{% else %}
    {% set loop_messages = messages %}
    {% set system_message = "INSERT DEFAULT SYSTEM MESSAGE HERE" %}
{% endif %}
{% for message in loop_messages %}
    {% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
        {{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
    {% endif %}
    {% if loop.index0 == 0 and system_message != false %}
        {% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}
    {% else %}
        {% set content = message['content'] %}
    {% endif %}
    {% if message['role'] == 'user' %}
        {{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
    {% elif message['role'] == 'assistant' %}
        {{ ' '  + content.strip() + ' ' + eos_token }}
    {% endif %}
{% endfor %}

This is a very complex template because LLaMA inserts the system message into the text of the first user message. It can probably be simplified a lot if you have a more 'normal' template like ChatML where the system message is handled like other messages.
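For instance, a ChatML-style template with a default system prompt could look roughly like this - just a sketch, not an official template, with the default message text as a placeholder (written as a Python string so it can be assigned directly to tokenizer.chat_template):

chatml_with_default = (
    # If the conversation already starts with a system message, keep it as-is...
    "{% if messages[0]['role'] == 'system' %}"
    "{% set loop_messages = messages %}"
    "{% else %}"
    # ...otherwise prepend a hardcoded default system message (placeholder text).
    "{% set loop_messages = [{'role': 'system', 'content': 'INSERT DEFAULT SYSTEM MESSAGE HERE'}] + messages %}"
    "{% endif %}"
    # Standard ChatML rendering: every message, system included, gets the same wrapper.
    "{% for message in loop_messages %}"
    "{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|im_start|>assistant\n' }}"
    "{% endif %}"
)

Because ChatML treats the system turn like any other message, the default can simply be prepended to the message list instead of being spliced into the first user turn.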

Idea: would it work well if it could be set up so that several prompt formats may be used:

  • the original (what is currently used)
  • the format specified by the model
  • some common prompt formats, such as alpaca, vicuna, chatml, bagel-chatml, stable-beluga, mistral

Why might we want this?

  • test the model for prompt sensitivity / brittleness

(on a lesser note: similar to how we might want models that generalize well, not just models with inflated test scores due to training on test data)

  • also to help understand how well models work with certain prompts, and how their abilities interact with merges

UI: perhaps similar to how you can select different quantization methods to evaluate the models with.

Hugging Face H4 org

Hi!
Once it's added to the Open LLM Leaderboard, we'll just have a toggle at submission to select "use chat template" or not, and we'll use the default chat template from the tokenizer in the model's repository, for two reasons:

  1. Evaluations should be intuitively reproducible - storing the model commit should be enough to have access to all information related to the model's run.
  2. It prevents submission abuse, such as one model being submitted 50 times with minuscule variations in its prompt.

Hi guys, building on @Rocketknight1's cool template, I added this part - now it will alternate conversation roles between user and assistant as expected:
{% if messages[0]['role'] == 'system' %}
{% set loop_messages = messages[1:] %}
{% set system_message = messages[0]['content'] %}
{% else %}
{% set loop_messages = messages %}
{% set system_message = "INSERT DEFAULT SYSTEM MESSAGE HERE" %}
{% endif %}

{% for message in loop_messages %}
{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}
{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}
{% endif %}

{% if loop.index0 == 0 and system_message != false %}
    {% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}
{% else %}
    {% set content = message['content'] %}
{% endif %}

{% if message['role'] == 'user' %}
    {{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}
{% elif message['role'] == 'assistant' %}
    {{ ' '  + content.strip() + ' ' + eos_token }}
{% endif %}

{% endfor %}

Yeah, I think we don't want spam submissions (too many prompt variants); my concern is about potentially non-optimally configured models, especially merges where you don't know which prompt format is best.

What do you think about a third, auto-detect prompt option, where it tries several prompt formats and sees how well the model responds to each before committing to one for the test?

(see https://old.reddit.com/r/LocalLLaMA/comments/18ljvxb/llm_prompt_format_comparisontest_mixtral_8x7b/ for an example where the prompt template used impacts the type and quality of answers)

Hugging Face H4 org

Hi!
As mentioned, we'll only use the variant stored in the model's configuration - running extra evaluations for all kinds of prompt variations would get expensive fast.
