Adds the tokenizer configuration file

#80
opened by lysandre (HF staff), OpenAI community org · edited Feb 16
The tokenizer configuration file is missing, which leads to unforeseen errors after the migration of the canonical models.

Refer to the following issue for more information: transformers#29050

The code that currently fails is the following:

>>> from transformers import AutoTokenizer
>>> previous_tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> current_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
1000000000000000019884624838656 1024
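For context, the huge value is the fallback transformers uses when no model_max_length is found in the tokenizer configuration. Assuming that fallback is VERY_LARGE_INTEGER, defined as int(1e30) in transformers, a quick check reproduces the number:

>>> # fallback used when model_max_length is absent from tokenizer_config.json
>>> int(1e30)
1000000000000000019884624838656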

This is the result after the fix:

>>> from transformers import AutoTokenizer
>>> previous_tokenizer = AutoTokenizer.from_pretrained("gpt2")
>>> current_tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
>>> print(previous_tokenizer.model_max_length, current_tokenizer.model_max_length)
1024 1024
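For reference, here is a minimal sketch of the kind of tokenizer_config.json this PR adds. The actual file in the PR may contain additional fields; model_max_length is the one relevant to this fix:

>>> import json
>>> # hypothetical minimal config: only sets the maximum input length for GPT-2
>>> with open("tokenizer_config.json", "w") as f:
...     json.dump({"model_max_length": 1024}, f, indent=2)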
lysandre changed pull request status to open
lysandre changed pull request status to merged
