Why sentencepiece tokenizer
#3 by mnwato · opened
I need a tokenizer for mixed-language text (Persian, some English words, numbers, and punctuation characters). On Medium you said that you used the SentencePiece tokenizer for this model. Is there any reason for this decision? Why didn't you choose BPE?
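(For context on the alternative the question names: plain BPE builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair. The toy corpus and code below are purely illustrative, not the training code used for this model.)

```python
from collections import Counter

def bpe_merges(word_freqs, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict.
    Toy sketch for illustration only."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): f for w, f in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair is merged next
        merges.append(best)
        merged = best[0] + best[1]
        # Rewrite every word with the newly merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] = freq
        vocab = new_vocab
    return merges, vocab
```

One relevant difference for mixed-script input: the SentencePiece library trains directly on raw text (treating whitespace as an ordinary symbol), with no language-specific pre-tokenization, and supports both a unigram model and a BPE mode; the sketch above shows only the greedy merge loop that characterizes BPE.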
Please see more details in the blog post: https://khashei.medium.com/a-not-so-dangerous-ai-in-the-persian-language-39172a641c84
Also feel free to contact me on Telegram if you have more questions.
khashei changed discussion status to closed