tokenizer configuration

#2
by slvnwhrl - opened

Hi,

thank you for providing this great open-source resource! :) Could you add a tokenizer_config.json file? I tried to run the model with transformers but ran into problems: the tokenizer does not handle longer sequences (i.e., more than 512 tokens), which causes errors.
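For illustration, here is a minimal sketch of what such a file might contain. It assumes the limit is 512 tokens (the length above which the errors appeared) and that `model_max_length` is the only setting needed; the helper name `write_tokenizer_config` and the local directory are hypothetical, and the model may well need further entries that are not known from this thread.

```python
import json
from pathlib import Path


def write_tokenizer_config(model_dir, max_length=512):
    """Write a minimal tokenizer_config.json into a local model directory.

    Assumption: 512 is the model's maximum sequence length.
    """
    # model_max_length is the key transformers reads to learn the length limit.
    config = {"model_max_length": max_length}
    path = Path(model_dir) / "tokenizer_config.json"
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return path


# Possible usage with a local copy of the model files (not run here):
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(model_dir)
#   encoded = tokenizer(text, truncation=True)  # cut inputs at model_max_length
```

With the limit stored in the file, passing `truncation=True` should cut over-long inputs to that length instead of handing them to the model unchanged.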

Technical University of Munich org
edited Apr 26, 2023

Thank you for this hint. I don't know when I will find time to do that, as I need to dig into it first. We computed that model back in 2020, and I had never heard of that file. Could you help me out with some resources or information about it?
