Do not add EOS token when tokenizing by default

#4
by p1atdev - opened

This PR reduces confusion about tokenizer loading.

With the current setting, the tokenizer must be loaded with add_eos_token=False; otherwise an EOS token is automatically appended to every encoded prompt, leading to weird completion results.

  • Before:
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b", add_eos_token=False)
  • After:
tokenizer = AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b")

This PR sets "add_eos_token": false in tokenizer_config.json, the same value sbintuitions/sarashina2-70b uses:
https://huggingface.co/sbintuitions/sarashina2-70b/blob/main/tokenizer_config.json#L134
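
To check whether a loaded tokenizer still appends EOS, something like the following might help. It is a minimal sketch, not part of this PR: `appends_eos` and `StubTokenizer` are hypothetical names, and the stub only stands in for a real tokenizer so the snippet runs without downloading anything. The helper assumes the tokenizer is callable, returns a mapping with "input_ids", and exposes eos_token_id, as Hugging Face tokenizers do.

```python
from typing import Any, Optional


def appends_eos(tokenizer: Any, text: str = "Hello") -> bool:
    """Return True if encoding `text` ends with the tokenizer's EOS id."""
    eos_id: Optional[int] = tokenizer.eos_token_id
    if eos_id is None:
        # No EOS token is defined, so none can have been appended.
        return False
    input_ids = tokenizer(text)["input_ids"]
    return len(input_ids) > 0 and input_ids[-1] == eos_id


class StubTokenizer:
    """Hypothetical stand-in for a real tokenizer (illustration only)."""

    eos_token_id = 2

    def __init__(self, add_eos_token: bool) -> None:
        self.add_eos_token = add_eos_token

    def __call__(self, text: str) -> dict:
        # Fake ids: one id per whitespace-separated word, starting at 10.
        ids = [10 + i for i, _ in enumerate(text.split())]
        if self.add_eos_token:
            ids.append(self.eos_token_id)
        return {"input_ids": ids}


print(appends_eos(StubTokenizer(add_eos_token=True)))
print(appends_eos(StubTokenizer(add_eos_token=False)))
```

With the real tokenizer, the same helper could be called as appends_eos(AutoTokenizer.from_pretrained("sbintuitions/sarashina2-8x70b")); with "add_eos_token": false in tokenizer_config.json it would be expected to return False without passing any extra argument.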

SB Intuitions org

Thank you. LGTM!

kajyuuen changed pull request status to merged
