Tokenizer error: the EOS token is not recognized

#72

by zheng-nlper - opened Jul 30, 2023


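from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
# The input ends with a literal "</s>", as the output shows; it appears to have been stripped from the post as an HTML tag.
print(tokenizer._tokenize("你好</s>"))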

Output: ['▁你', '好', '</', 's', '>']
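One possible reading of this output, not confirmed in this thread, is that the tokenizer treats `</s>` as a control token that can only be produced by id, so a literal `</s>` typed into the input text is segmented like any other characters. The sketch below is a toy illustration of that behavior and is not the ChatGLM2 tokenizer: the vocabulary, the id values, and the names `toy_tokenize` and `toy_encode` are all hypothetical.

```python
# Toy illustration only: NOT the ChatGLM2 tokenizer. All names and ids are hypothetical.
PIECES = ["▁你", "好", "</", "s", ">"]   # ordinary text pieces
SPECIAL_IDS = {"</s>": 2}                # control token, reachable by id only
ID_OFFSET = 10                           # arbitrary offset for ordinary piece ids


def toy_tokenize(text: str) -> list[str]:
    """Greedy longest-match over PIECES; control tokens are never matched from text."""
    text = "▁" + text  # mimic a leading word-boundary marker
    out, i = [], 0
    while i < len(text):
        candidates = (p for p in PIECES if text.startswith(p, i))
        match = max(candidates, key=len, default=None)
        if match is None:
            match = text[i]  # unknown character is kept as its own piece
        out.append(match)
        i += len(match)
    return out


def toy_encode(text: str, add_eos: bool = False) -> list[int]:
    """Map pieces to ids; the EOS id is appended explicitly, never parsed from text."""
    ids = [PIECES.index(p) + ID_OFFSET if p in PIECES else 0 for p in toy_tokenize(text)]
    if add_eos:
        ids.append(SPECIAL_IDS["</s>"])
    return ids


print(toy_tokenize("你好</s>"))           # literal "</s>" is split into ordinary pieces
print(toy_encode("你好", add_eos=True))   # EOS is added by id instead
```

If this reading applies to the real tokenizer, the EOS token would have to be appended by id (for example through whatever special-token handling the tokenizer exposes) rather than written into the prompt string.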
