Sources:
- https://github.com/THUDM/GLM/tree/main/chinese_sentencepiece
- https://huggingface.co/THUDM/glm-10b-chinese/
## HF
```
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b-chinese", trust_remote_code=True)
```
## Tokenizer
`tokenizer_config.json` maps `AutoTokenizer` to a custom class:
```
"AutoTokenizer": [
    "tokenization_glm.GLMChineseTokenizer",
    null
]
```
`GLMChineseTokenizer` is defined in:
```
https://huggingface.co/THUDM/glm-10b-chinese/blob/main/tokenization_glm.py
```
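The `AutoTokenizer` entry shown above is a `module.ClassName` reference, which is why `trust_remote_code=True` is needed: the class lives in the repository's own `tokenization_glm.py`, not in the `transformers` package. The sketch below only illustrates how such an entry can be read. It assumes the entry sits under an `auto_map` key and that the pair is ordered `[slow, fast]`; the helper name `split_auto_map_entry` is hypothetical and is not part of `transformers`.

```python
import json

# Fragment modeled on the entry above; the enclosing "auto_map" key is an assumption.
CONFIG_TEXT = """
{
  "auto_map": {
    "AutoTokenizer": ["tokenization_glm.GLMChineseTokenizer", null]
  }
}
"""


def split_auto_map_entry(config_text, auto_class="AutoTokenizer"):
    """Hypothetical helper: split the slow-tokenizer reference into module and class."""
    config = json.loads(config_text)
    # Assumed layout: [slow tokenizer reference, fast tokenizer reference or null].
    slow_ref, fast_ref = config["auto_map"][auto_class]
    module_name, class_name = slow_ref.rsplit(".", 1)
    return module_name, class_name, fast_ref


module_name, class_name, fast_ref = split_auto_map_entry(CONFIG_TEXT)
print(module_name)  # tokenization_glm
print(class_name)   # GLMChineseTokenizer
print(fast_ref)     # None, i.e. no fast tokenizer is registered
```

The `null` in the second slot suggests that only a slow (Python) tokenizer is provided for this checkpoint.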
## Vocabulary
From