What tokenizer should I use?

#1
by Tianming - opened

Thanks for your great work.
Could you tell me which tokenizer is suitable for this model?

Thanks for your question.
We used Jieba to tokenize the data during preprocessing. For the vocabulary, this model uses RoBERTa_zh's vocabulary.
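To make the two-step pipeline concrete (word segmentation first, then mapping each segment to an ID in a fixed vocabulary), here is a toy sketch in plain Python. In practice you would call `jieba.lcut` for segmentation and load RoBERTa_zh's `vocab.txt` with a BERT-style tokenizer; the greedy segmenter and tiny vocabulary below are stand-ins for illustration only, not the real components.

```python
# Toy sketch of the preprocessing described above:
# 1) segment raw Chinese text into words (Jieba does this in practice),
# 2) map each segment to an ID from a fixed vocabulary (RoBERTa_zh's
#    vocab.txt in practice). Both the segmenter and the vocabulary here
#    are tiny stand-ins for illustration.

TOY_VOCAB = {"[UNK]": 0, "我": 1, "喜欢": 2, "自然语言": 3, "处理": 4}

def toy_segment(text):
    """Greedy longest-match segmentation against the vocabulary
    (a crude stand-in for jieba.lcut)."""
    words, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                words.append(text[i:j])
                i = j
                break
        else:
            words.append(text[i])  # unknown single character
            i += 1
    return words

def encode(text):
    """Segment, then look up each word's ID (unknown words -> [UNK])."""
    return [TOY_VOCAB.get(w, TOY_VOCAB["[UNK]"]) for w in toy_segment(text)]

print(toy_segment("我喜欢自然语言处理"))  # ['我', '喜欢', '自然语言', '处理']
print(encode("我喜欢自然语言处理"))       # [1, 2, 3, 4]
```

The key point the sketch shows is that the tokenizer's job is split: Jieba decides the word boundaries, while the RoBERTa_zh vocabulary only supplies the word-to-ID mapping.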

Would a `bert-base-chinese` tokenizer also work? https://huggingface.co/bert-base-chinese/tree/main
