tokenizer configuration
#2, opened by slvnwhrl
Hi,
thank you for providing this great open-source resource! :) Could you add a tokenizer_config.json file? I tried to run the model with transformers but ran into problems: the tokenizer does not handle longer sequences (i.e. more than 512 tokens), which causes errors.
Thank you for this hint. I don't know when I will find time to do that, as I need to dig into it first. We trained that model back in 2020 and I had never heard of that file. Could you maybe help me out with some resources or information about it?
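For reference, here is a minimal sketch of what such a file could look like and how it might be generated. It assumes the only setting needed is `model_max_length`, and that 512 (the limit mentioned above) is the right value for this model; both are assumptions that should be checked against the model itself. The helper name `write_tokenizer_config` is hypothetical, and other model-specific keys (casing, special tokens, and so on) are deliberately left out.

```python
import json
import tempfile
from pathlib import Path


def write_tokenizer_config(model_dir, model_max_length=512):
    """Write a minimal tokenizer_config.json into model_dir and return its path.

    Assumption: 512 is the model's maximum sequence length. Only the
    `model_max_length` key is written; anything else is model-specific.
    """
    config = {"model_max_length": model_max_length}
    path = Path(model_dir) / "tokenizer_config.json"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Write into a temporary directory and show the resulting file.
    with tempfile.TemporaryDirectory() as tmp:
        config_path = write_tokenizer_config(tmp)
        print(config_path.read_text(encoding="utf-8"))
```

With a file like this next to the vocabulary files, the tokenizer loaded by transformers should pick up the limit, so calling it with `truncation=True` cuts inputs to that length. Until the file is added to the repository, passing `truncation=True, max_length=512` at call time is a possible workaround.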