Wauplin HF staff commited on
Commit
58feb13
1 Parent(s): 3391fef

Add `eos_token` to the tokenizer config.

Browse files

If we merge the ChatWidget right now, it would not work since the widget would not be able to format the chat_template correctly from the information available via API (https://huggingface.co/api/models/microsoft/DialoGPT-large?config=True). This is because to get `eos_token` one need to get `eos_token_id` from [config.json](https://huggingface.co/microsoft/DialoGPT-large/blob/main/config.json) and then reading [`vocab.json`](https://huggingface.co/microsoft/DialoGPT-large/blob/main/vocab.json) to check which token is associated with this id.

This PR fixes this by adding `eos_token` directly to `tokenizer_config.json` cc

@julien-c



@osanseviero



@sbrandeis

Files changed (1) hide show
  1. tokenizer_config.json +2 -1
tokenizer_config.json CHANGED
@@ -1,4 +1,5 @@
1
  {
2
  "model_max_length": 1024,
3
- "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}"
 
4
  }
 
1
  {
2
  "model_max_length": 1024,
3
+ "chat_template": "{% for message in messages %}{{ message.content }}{{ eos_token }}{% endfor %}",
4
+ "eos_token": "<|endoftext|>"
5
  }