Question about tokenizer.model file

#2
by TungLam - opened

Thank you so much for releasing the wonderful model.

I have a couple of questions about tokenizer.model. I see that this file is larger than the one in Mixtral-8x7B-v0.1. Did you customize the tokenizer?
By the way, is it possible to read exactly what is in the tokenizer.model file? I would like to inspect its contents (code or something else).

Thank you so much

OpenBuddy org

Yes, we have added CJK tokens to the original file.

You may want to examine it with the sentencepiece library.
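
For anyone who wants to try this, here is a minimal sketch of one way to inspect the file, assuming the `sentencepiece` Python package is installed and that `tokenizer.model` has been downloaded locally. The helper names (`is_cjk`, `summarize_vocab`, `load_pieces`) are hypothetical, and the CJK check is only a rough heuristic based on Unicode character names, not the method used to build the tokenizer.

```python
import sys
import unicodedata

# Unicode name prefixes treated as "CJK" here; this is an illustrative
# heuristic, not an official definition.
CJK_NAME_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL")


def is_cjk(piece: str) -> bool:
    """Return True if any character in the piece looks like a CJK character."""
    return any(
        unicodedata.name(ch, "").startswith(CJK_NAME_PREFIXES) for ch in piece
    )


def summarize_vocab(pieces, sample_size=10):
    """Count all pieces and CJK pieces, and keep a small sample of the latter."""
    cjk_pieces = [p for p in pieces if is_cjk(p)]
    return {
        "total": len(pieces),
        "cjk": len(cjk_pieces),
        "sample": cjk_pieces[:sample_size],
    }


def load_pieces(model_path: str):
    """Read every vocabulary piece from a SentencePiece model file."""
    # Imported lazily so the helpers above work without the package installed.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file=model_path)
    return [sp.id_to_piece(i) for i in range(sp.get_piece_size())]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python inspect_tokenizer.py path/to/tokenizer.model
    print(summarize_vocab(load_pieces(sys.argv[1])))
```

Running the same script on the Mixtral-8x7B-v0.1 tokenizer.model and comparing the `total` and `cjk` counts should give a rough idea of how much the vocabulary was extended.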

Thank you so much for your answer.

TungLam changed discussion status to closed
