
Source:

HF

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)

Tokenizer

tokenizer_config.json

    "AutoTokenizer": [
      "tokenization_glm.GLMChineseTokenizer",
      null
    ]
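
As a minimal sketch of how this entry can be read: the `AutoTokenizer` value is a two-element list, where the first item names the slow tokenizer as `module.ClassName` and the second names the fast tokenizer (`null` here, so no fast variant is declared). The enclosing `auto_map` key and the helper `resolve_auto_tokenizer` are assumptions added for illustration. This is not the loading code used by `transformers`.

```python
import json

# The fragment quoted above, wrapped so that it parses as standalone JSON.
# The enclosing "auto_map" key is an assumption for this sketch.
config_text = """
{
  "auto_map": {
    "AutoTokenizer": ["tokenization_glm.GLMChineseTokenizer", null]
  }
}
"""


def resolve_auto_tokenizer(config):
    """Hypothetical helper: split the entry into module file and class name."""
    slow_ref, fast_ref = config["auto_map"]["AutoTokenizer"]
    # "tokenization_glm.GLMChineseTokenizer" -> module name and class name
    module_name, class_name = slow_ref.rsplit(".", 1)
    return {
        "module_file": module_name + ".py",
        "class_name": class_name,
        "has_fast": fast_ref is not None,
    }


resolved = resolve_auto_tokenizer(json.loads(config_text))
print(resolved)
```

Since the class lives in a Python file inside the model repository rather than in the library itself, loading it requires `trust_remote_code=True`.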

where GLMChineseTokenizer is defined in:

https://huggingface.co/THUDM/glm-10b-chinese/blob/main/tokenization_glm.py

Vocabulary

From