60 languages?

#8
by conan1024hao - opened

In the model description, you say FLAN-T5 was trained on 60 languages (including Japanese, etc.). However, the vocab_size is only 32138, so how could it handle 60 languages?

I think this is impossible with a SentencePiece vocabulary that small.
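To see why a small, English-centric vocabulary fails on other scripts, here is a toy sketch. It is NOT the real FLAN-T5 tokenizer (the real check would be loading `google/flan-t5-base` with `transformers.AutoTokenizer` and inspecting `tokenizer.vocab_size` and the tokens it produces); the toy vocab and the whole-word lookup are simplifying assumptions, but the failure mode is the same: pieces absent from the vocabulary collapse to the unknown token.

```python
# Toy illustration (hypothetical vocab, not the real SentencePiece model):
# words missing from an English-only vocabulary all map to <unk>.
TOY_VOCAB = {"hello": 0, "world": 1}  # assumed English-only vocabulary
UNK_ID = 2  # id of the <unk> token in this toy setup

def toy_tokenize(text: str) -> list:
    """Map each whitespace-separated word to its id, or UNK_ID if unknown."""
    return [TOY_VOCAB.get(word, UNK_ID) for word in text.split()]

print(toy_tokenize("hello world"))    # [0, 1]
print(toy_tokenize("こんにちは 世界"))  # [2, 2] -- every Japanese word is <unk>
```

Real SentencePiece does subword matching rather than whole-word lookup, but with a vocabulary trained mostly on English text, CJK, Arabic, and other scripts still end up as unknown or byte-level fallback pieces rather than meaningful tokens.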

Same issue: the tokenizer doesn't understand Arabic.

Same issue: the tokenizer doesn't understand Chinese.

Same for Vietnamese!

Hello everyone, thanks for raising the issue, and sorry for the confusion.
I think Google has open-sourced only the English versions at the moment. We posted a ticket on their repository to track this: https://github.com/google-research/t5x/issues/1131

Same problem with Korean: the tokenizer can't recognize Korean tokens.
