
Presently, loading the tokenizer via:

from transformers import T5Tokenizer

tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)

yields the following error:

ValueError: Non-consecutive added token '<extra_id_99>' found. Should have index 128 but has index 28 in saved vocabulary.

This PR should resolve the issue.
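
For context, T5-style tokenisers append 100 sentinel tokens (<extra_id_0> through <extra_id_99>) to the vocabulary, and the slow tokeniser in older transformers releases insists that their saved indices be consecutive, which is what trips the ValueError above. A quick way to inspect the sentinels, shown here against the public t5-small checkpoint rather than IgT5:

from transformers import T5Tokenizer

# Illustration against the public t5-small checkpoint, not IgT5
tok = T5Tokenizer.from_pretrained("t5-small")

print(tok.additional_special_tokens[0], tok.additional_special_tokens[-1])
# <extra_id_0> <extra_id_99>
print(tok.convert_tokens_to_ids("<extra_id_99>"))  # sentinel IDs occupy the tail of the vocabulary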

justinbarton changed pull request status to open
Exscientia org

Hi @justinbarton, thank you for the interest in our work! What version of transformers are you using? I tried this line in a Colab notebook with both the transformers version we developed against (4.35.2) and the latest version (4.39.3), and both loaded the tokeniser without any errors.

How odd. I was using 4.30.2.
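
For anyone else hitting this, a minimal sketch of a version guard before loading. Assuming 4.35.2 (the release the maintainers developed against) as the earliest known-good version, since the thread only confirms that 4.30.2 fails while 4.35.2 and 4.39.3 work; packaging ships as a transformers dependency:

import transformers
from packaging import version

# 4.30.2 raises the non-consecutive-added-token ValueError;
# 4.35.2 and 4.39.3 are confirmed working in this thread.
assert version.parse(transformers.__version__) >= version.parse("4.35.2"), \
    "Please upgrade transformers (e.g. pip install -U transformers)"

from transformers import T5Tokenizer

tokeniser = T5Tokenizer.from_pretrained("Exscientia/IgT5", do_lower_case=False)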

exs-fdreyer changed pull request status to closed
