Why sentencepiece tokenizer
#3 by mnwato · opened
I need a tokenizer for mixed-language text (Persian, some English words, numbers, and punctuation characters). On Medium you said that you used the SentencePiece tokenizer for this model. Is there any reason for this decision? Why didn't you choose BPE?
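(For context on the alternative the question names: plain BPE builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair. The toy corpus and code below are purely illustrative, not the training code used for this model.)

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict.
    Toy sketch for illustration only."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair is merged next
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite every word with the newly merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab
```

One relevant difference for mixed-script input: the SentencePiece library trains directly on raw text (treating whitespace as an ordinary symbol), with no language-specific pre-tokenization, and supports both a unigram model and a BPE mode; the sketch above shows only the greedy merge loop that characterizes BPE.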
Please see more details in the blog post: https://khashei.medium.com/a-not-so-dangerous-ai-in-the-persian-language-39172a641c84
Also feel free to contact me on Telegram if you have more questions.
khashei changed discussion status to closed