Add chat_template to tokenizer_config.json

#39
Mosaic ML, Inc. org
edited Jan 10

Manually tested with:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('mpt-7b-chat-tokenizer')

chat = [
    {"role": "system", "content": "This is a prompt!"},
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(tokenizer.apply_chat_template(chat, tokenize=False))

where mpt-7b-chat-tokenizer is a local folder that includes the modified tokenizer_config.json.
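For reference, a minimal sketch of the kind of template this adds, assuming a ChatML-style format (the exact Jinja string in the PR may differ). Assigning tokenizer.chat_template at runtime has the same effect as a "chat_template" field in tokenizer_config.json:

# Minimal sketch, assuming a ChatML-style template; the exact template in
# the PR may differ. Setting tokenizer.chat_template at runtime is
# equivalent to a "chat_template" entry in tokenizer_config.json.
tokenizer.chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant\\n' }}{% endif %}"
)

With a template like that, the print call above renders each message as <|im_start|> plus the role, the content on the next line, then <|im_end|>.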

irenedea changed pull request status to closed
Mosaic ML, Inc. org

@irenedea I tried this out locally and got an error:

In [3]: from transformers import AutoTokenizer
   ...: tokenizer = AutoTokenizer.from_pretrained('mosaicml/mpt-7b-chat', revision='refs/pr/39')
Mosaic ML, Inc. org

Ah yeah, this PR is closed; the JSON is missing a comma. Can you try the manual test described in https://huggingface.co/mosaicml/mpt-7b-chat/discussions/40?
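For anyone else hitting this: a single missing comma makes the whole tokenizer_config.json unparseable, which is why from_pretrained fails on this revision. A quick illustration (the broken string below is hypothetical, not the actual file contents):

import json

# Hypothetical example: a comma is missing between the two keys, so parsing
# fails the same way it does when transformers reads a broken tokenizer_config.json.
broken = '{"eos_token": "<|endoftext|>" "chat_template": "..."}'
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    print(err)  # e.g. Expecting ',' delimiter: line 1 column 31 (char 30)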
