Mistral doesn't have a `pad_token_id`? πŸ€”

#66
by ingo-m - opened

According to the documentation, the `pad_token_id` is optional?

As confirmed by:

```python
from transformers import AutoTokenizer

base_model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

print(tokenizer.pad_token_id)
# None
```

I don't understand why; surely a padding token must have been used during training?

I encountered the same issue when I tried to fine-tune it, and I'm wondering how it is supposed to be set. Thanks!
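
For what it's worth, here is a minimal sketch of the two workarounds I've seen people use before fine-tuning; I'm not sure which one the Mistral authors intended, so treat it as an assumption rather than the official answer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Option 1: reuse an existing special token (eos or unk) as the pad token.
# No embedding resize is needed because the token already exists.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Option 2: add a dedicated pad token and resize the embedding matrix.
# tokenizer.add_special_tokens({"pad_token": "[PAD]"})
# model.resize_token_embeddings(len(tokenizer))
# model.config.pad_token_id = tokenizer.pad_token_id

print(tokenizer.pad_token_id)  # no longer None
```

If you reuse the eos token, keep in mind that padded positions and the real end-of-sequence token then share an id, which is why people usually also mask the padding out of the loss.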

There's also a question about it on reddit:
https://www.reddit.com/r/LocalLLaMA/comments/184g120/mistral_fine_tuning_eos_and_padding/

I'm wondering whether it even matters which token is used for padding, as long as it is masked out by the attention mask?
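
In case it helps, here is a small sketch of what I mean (the choice of eos as pad here is just an assumption): the attention mask is what keeps the padded positions from being attended to, and for the loss the usual trick is to set their labels to -100 so they are ignored regardless of which token id fills them:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
tokenizer.pad_token = tokenizer.eos_token  # assumption: reuse eos as pad

batch = tokenizer(
    ["short", "a somewhat longer example sentence"],
    padding=True,
    return_tensors="pt",
)
print(batch["attention_mask"])  # 0s over the padded positions of "short"

# For causal-LM fine-tuning, exclude padded positions from the loss as well,
# typically by setting their labels to -100 (the default ignore_index).
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100
```

So as far as I can tell, the exact padding token shouldn't matter for attention, as long as the attention mask and the label masking are both in place.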
