What tokenizer should I use?
#1
by
Tianming
- opened
Thanks for your great work.
I would like to know which tokenizer is suitable for this model?
Thanks for your reply.
We used Jieba to preprocess the data. For the vocabulary, this model uses Roberta_zh's vocabulary.
Would a 'bert-base-chinese' tokenizer also work? https://huggingface.co/bert-base-chinese/tree/main