What tokenizer should I use?
#1
by
Tianming
- opened
Thanks for your great work.
I would like to know which tokenizer is suitable for this model?
Thanks for your reply.
We used Jieba to preprocess the data. For the vocabulary, this model uses Roberta_zh's vocabulary.
Would a 'bert-base-chinese' tokenizer also work? https://huggingface.co/bert-base-chinese/tree/main