Tokenizer Japanese

#4
by GuenKainto - opened

Hello, how can I create a tokenizer file for Japanese? Japanese text mixes Kanji with hiragana and katakana within sentences and words. I found a simple tokenizer file that contains hiragana and katakana tokens. I think I can use it, but it lacks compound words and Kanji.
File link: https://git.ecker.tech/mrq/ai-voice-cloning/src/branch/master/models/tokenizers/japanese.json
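
The gap described above can be illustrated with a minimal sketch. It checks which characters of a sentence fall outside a kana-only vocabulary. Note that `KANA_VOCAB`, `classify`, and `uncovered` are hypothetical names for illustration, the vocabulary is a stand-in built from Unicode ranges rather than the contents of the linked file, and the Kanji check only covers the basic CJK Unified Ideographs block.

```python
def classify(ch):
    """Classify a character by Unicode block (basic ranges only)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:  # basic CJK Unified Ideographs block
        return "kanji"
    return "other"


def uncovered(text, vocab):
    """Return non-space characters of text missing from vocab, in first-seen order."""
    missing = []
    for ch in text:
        if ch.isspace() or ch in vocab or ch in missing:
            continue
        missing.append(ch)
    return missing


# Hypothetical kana-only vocabulary (assumption: NOT the linked japanese.json).
KANA_VOCAB = (
    {chr(c) for c in range(0x3041, 0x3097)}    # hiragana letters
    | {chr(c) for c in range(0x30A1, 0x3100)}  # katakana letters and marks
)

sentence = "私はコーヒーを飲みます"
missing = uncovered(sentence, KANA_VOCAB)

for ch in missing:
    print(ch, classify(ch))
```

Running a check like this over a sample of the target text would show how many Kanji a kana-only vocabulary cannot represent without some extra step.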
Thank you.