Updates the tokenizer configuration file

#2
Opened by lysandre (HF staff)
Facebook AI community org

The tokenizer configuration file is missing or incorrect, which leads to unexpected errors now that the canonical models have been migrated (for example, from xlm-mlm-enro-1024 to FacebookAI/xlm-mlm-enro-1024).

Refer to the following issue for more information: transformers#29050
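To see exactly what is missing or different in the migrated repository's tokenizer_config.json, the two parsed configs can be compared key by key. The following is a minimal sketch under stated assumptions: diff_tokenizer_configs and the "<missing>" marker are hypothetical helpers, and the sample JSON strings are made up for illustration. They are not the real files from either repository. In practice the files could be fetched with huggingface_hub.hf_hub_download(repo_id=..., filename="tokenizer_config.json"). That step is left out so the sketch runs offline.

```python
import json
from typing import Any

# Hypothetical marker for a key that is absent from one of the two configs.
MISSING = "<missing>"


def diff_tokenizer_configs(old: dict[str, Any], new: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (old_value, new_value)} for every key that is missing or differs."""
    diffs: dict[str, tuple[Any, Any]] = {}
    # Walk the union of keys so keys present in only one config are reported too.
    for key in sorted(old.keys() | new.keys()):
        old_value = old.get(key, MISSING)
        new_value = new.get(key, MISSING)
        if old_value != new_value:
            diffs[key] = (old_value, new_value)
    return diffs


# Made-up file contents for illustration only.
legacy_config = json.loads('{"model_max_length": 512, "tokenizer_class": "XLMTokenizer"}')
migrated_config = json.loads('{"tokenizer_class": "XLMTokenizer"}')

print(diff_tokenizer_configs(legacy_config, migrated_config))
# {'model_max_length': (512, '<missing>')}
```

Keys whose values match in both configs, such as tokenizer_class here, are left out of the result. Only the entries a fix would need to restore or correct remain.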

The following code currently fails:

>>> from transformers import AutoTokenizer
>>> # Load the same tokenizer via the legacy ID and via the migrated FacebookAI/ ID
>>> previous_tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-enro-1024")
>>> current_tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-enro-1024")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
512 512

This is the result after the fix:

>>> from transformers import AutoTokenizer
>>> # Load the same tokenizer via the legacy ID and via the migrated FacebookAI/ ID
>>> previous_tokenizer = AutoTokenizer.from_pretrained("xlm-mlm-enro-1024")
>>> current_tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-mlm-enro-1024")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
512 512
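To catch a regression like this automatically, the printed before/after comparison can be turned into an assertion. This is a minimal sketch: mismatched_attributes is a hypothetical helper, not part of transformers. It compares any listed attributes, here model_max_length. The SimpleNamespace objects are stand-ins carrying the post-fix values shown above, so the sketch runs without network access. With transformers installed, you would pass the objects returned by AutoTokenizer.from_pretrained instead.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def mismatched_attributes(a: Any, b: Any, names: Iterable[str]) -> dict[str, tuple[Any, Any]]:
    """Return {name: (a_value, b_value)} for each listed attribute whose values differ."""
    result: dict[str, tuple[Any, Any]] = {}
    for name in names:
        a_value, b_value = getattr(a, name), getattr(b, name)
        if a_value != b_value:
            result[name] = (a_value, b_value)
    return result


# Stand-ins for the legacy and migrated tokenizers (values taken from the output above).
previous_tokenizer = SimpleNamespace(model_max_length=512)
current_tokenizer = SimpleNamespace(model_max_length=512)

# An empty dict means both tokenizers agree on every checked attribute.
assert mismatched_attributes(previous_tokenizer, current_tokenizer, ["model_max_length"]) == {}
```

Returning the differing values, rather than a plain boolean, makes a failing check point directly at the attribute the configuration file got wrong.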