Fix add_generation_prompt in tokenizer_config.json

#24

We should only add generation prompt after the last message.

Code

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b-chat', revision='refs/pr/23')

chat = [{
    'content':
        'Please summarize the goals in this text:\n\nGoing outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.',
    'role':
        'user'
}, {
    'content': 'You should go outside and touch grass.',
    'role': 'assistant'
}, {
    'content': 'What else can I do?',
    'role': 'user'
}
]

print('\nBEFORE!')
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))


tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-30b-chat', revision='refs/pr/24')

print('\nAFTER!')
print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))

Output

BEFORE!
<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.
<|im_start|>user
Please summarize the goals in this text:

Going outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.<|im_end|>
<|im_start|>assistant

<|im_start|>assistant
You should go outside and touch grass.<|im_end|>
<|im_start|>assistant

<|im_start|>user
What else can I do?<|im_end|>
<|im_start|>assistant

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

AFTER!
<|im_start|>system
A conversation between a user and an LLM-based AI assistant. The assistant gives helpful and honest answers.
<|im_start|>user
Please summarize the goals in this text:

Going outside has benefits include reducing stress and triggering the relaxation response, which can help us not only feel better mentally, but even heal faster from physical ailments.<|im_end|>
<|im_start|>assistant
You should go outside and touch grass.<|im_end|>
<|im_start|>user
What else can I do?<|im_end|>
<|im_start|>assistant
sam-mosaic changed pull request status to merged
irenedea changed pull request title from Update tokenizer_config.json to Fix add_generation_prompt in tokenizer_config.json

Sign up or log in to comment