tokenizer.json contains no Chinese characters

by rootsule - opened Feb 5

Feb 5

Hi, every Chinese token comes out as [UNK] when I tokenize Chinese text.

neavo

Owner Feb 5


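The model uses a byte-level vocabulary, so tokenizer.json does not contain Chinese characters in plain text, but Chinese characters are in fact covered.
Could you share your test code?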
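
To illustrate why a byte-level vocabulary shows no literal Chinese, here is a minimal sketch. It assumes the widely used GPT-2-style byte-to-unicode mapping; the thread does not confirm which mapping this model uses, and the helper names `to_byte_level` and `from_byte_level` are hypothetical, chosen only for this example.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style table mapping each of the 256 byte values to a printable character.

    Assumption: this model's tokenizer uses the same convention.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text: str) -> str:
    """Encode text as UTF-8, then render each byte as its mapped character."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_level(s: str) -> str:
    """Invert to_byte_level: mapped characters back to bytes, then decode UTF-8."""
    return bytes(CHAR_TO_BYTE[c] for c in s).decode("utf-8")


if __name__ == "__main__":
    for ch in "中文分词":
        # Each Chinese character becomes three non-Chinese characters.
        print(ch, "->", to_byte_level(ch))
```

Under this assumption, a vocabulary entry for a Chinese character would appear in tokenizer.json as a short run of Latin-looking characters rather than the character itself, so searching the file for literal Chinese text would find nothing even though the text is covered.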
