Update tokenizer_config.json

#12
by freaksamael - opened
No description provided.

For English, the Chinese characters are not needed; also, if possible, it would be great if the max size were increased. In general, we need to tokenize relevant text in a 1024-token window. Thank you!

Beijing Academy of Artificial Intelligence org

Hi, thanks for your interest.
The max length was 512 during training, so the model cannot process sequences longer than 512 tokens. In practice, it uses only the first 512 tokens and ignores the rest.
Therefore, increasing the max length has no effect; the model still uses only the first 512 tokens.
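For reference, a minimal sketch of that truncation behavior with the `transformers` tokenizer (the checkpoint name `BAAI/bge-large-en` is an assumption for illustration):

```python
from transformers import AutoTokenizer

# Checkpoint name assumed for illustration; substitute the actual model ID.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en")

long_text = "word " * 2000  # far longer than the 512-token training limit

# Even if model_max_length in tokenizer_config.json were raised, the model was
# trained with 512 positions, so inputs should be truncated to 512 tokens.
encoded = tokenizer(long_text, truncation=True, max_length=512, return_tensors="pt")
print(encoded["input_ids"].shape)  # torch.Size([1, 512])
```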

Ready to merge
This branch is ready to get merged automatically.
