tokenizer.add_eos_token and maxlength

#15
by Gregorioz - opened

In the old version of the demo code, tokenization used tokenizer.add_eos_token = True together with tokenizer(..., maxlength=maxlength-1, ...). A recent update removed tokenizer.add_eos_token = True and changed the value of the maxlength argument from maxlength-1 to maxlength.

What is the difference between these two tokenization methods? Is there any risk in switching, given that I observe divergent embeddings between them?
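To make the comparison concrete, here is a minimal sketch of the two flows as described above, in plain Python rather than the real tokenizer. Everything in it is an assumption for illustration: the token ids are made up, EOS_ID = 2 is a hypothetical EOS id, truncation is assumed to keep tokens from the start, and the new flow is assumed to add no EOS by default. The last_token helper is also hypothetical; it only matters if the model pools the hidden state of the final token, which is not confirmed here.

```python
from typing import List

# Hypothetical EOS id for illustration only; the real value depends on the tokenizer.
EOS_ID = 2


def old_pipeline(token_ids: List[int], max_length: int) -> List[int]:
    # Old flow as described: truncate the content to max_length - 1 tokens,
    # leaving one slot free, then append EOS so every sequence ends with it.
    return token_ids[: max_length - 1] + [EOS_ID]


def new_pipeline(token_ids: List[int], max_length: int) -> List[int]:
    # New flow as described: truncate the content to max_length tokens and
    # append nothing (assumes the tokenizer adds no EOS on its own).
    return token_ids[:max_length]


def last_token(seq: List[int]) -> int:
    # Hypothetical helper: under last-token pooling (an assumption), the
    # embedding would be read at this final position.
    return seq[-1]


if __name__ == "__main__":
    max_length = 8
    short_text = [101, 102, 103]     # hypothetical ids, shorter than max_length
    long_text = list(range(10, 30))  # hypothetical ids, longer than max_length

    for name, ids in [("short", short_text), ("long", long_text)]:
        old = old_pipeline(ids, max_length)
        new = new_pipeline(ids, max_length)
        print(name, "old:", old, "last token:", last_token(old))
        print(name, "new:", new, "last token:", last_token(new))
```

In this sketch both flows cap the length at max_length, but only the old one guarantees that the final position holds EOS; in the new one the final position holds an ordinary content token, so anything read from that position would differ between the two.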
