
	Sources:
	- https://github.com/THUDM/GLM/tree/main/chinese_sentencepiece
	- https://huggingface.co/THUDM/glm-10b-chinese/


	## HF

	```
	from transformers import AutoTokenizer
	tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-10b", trust_remote_code=True)
	```

	## Tokenizer

	tokenizer_config.json
	```
	"AutoTokenizer": [
	"tokenization_glm.GLMChineseTokenizer",
	null
	]
	```

	where GLMChineseTokenizer is defined in:
	```
	https://huggingface.co/THUDM/glm-10b-chinese/blob/main/tokenization_glm.py
	```

	## Vocabulary

	From