## token

space
```yml
# multi-space
{"id": 881, "token": "\r\n\r\n", "token_decode": "\r\n\r\n", "token_len": 4, "zh_count": 0, "space_count": 4, "digit_count": 0, "zh_symbol_count": 0}
# space + en
{"id": 862, "token": "\treturn", "token_decode": "\treturn", "token_len": 7, "zh_count": 0, "space_count": 1, "digit_count": 0, "zh_symbol_count": 0}
# space + zh
{"id": 40195, "token": " 下", "token_decode": " 下", "token_len": 2, "zh_count": 1, "space_count": 1, "digit_count": 0, "zh_symbol_count": 0}
```

special_token
```
{"id": 100257, "token": "<|endoftext|>", "token_decode": "<|endoftext|>", "token_len": 13, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100258, "token": "<|fim_prefix|>", "token_decode": "<|fim_prefix|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100259, "token": "<|fim_middle|>", "token_decode": "<|fim_middle|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100260, "token": "<|fim_suffix|>", "token_decode": "<|fim_suffix|>", "token_len": 14, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
{"id": 100276, "token": "<|endofprompt|>", "token_decode": "<|endofprompt|>", "token_len": 15, "zh_count": 0, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
```

Chinese character + symbol
```
{"id": 39045, "token": ",请", "token_decode": ",请", "token_len": 2, "zh_count": 1, "space_count": 0, "digit_count": 0, "zh_symbol_count": 0}
```

## Vocabulary file

```
IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
```

What on earth is this?
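
One plausible reading of the lines above: each line is a base64-encoded token (raw bytes) followed by its integer rank, which is also the token id. This looks like the layout used by tiktoken's `.tiktoken` vocabulary files, and the special-token ids listed earlier match `cl100k_base`, but that is an inference rather than something stated here. The sketch below decodes the eight sample lines under that assumption; `SAMPLE` and `parse_vocab` are hypothetical names introduced only for illustration.

```python
import base64

# The eight lines shown in the vocabulary file excerpt.
SAMPLE = """\
IQ== 0
Ig== 1
Iw== 2
JA== 3
JQ== 4
Jg== 5
Jw== 6
KA== 7
"""


def parse_vocab(text):
    """Parse '<base64 token> <rank>' lines into a {token_bytes: rank} dict.

    Assumes every non-empty line has exactly two whitespace-separated fields.
    """
    ranks = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks


vocab = parse_vocab(SAMPLE)
for token_bytes, rank in vocab.items():
    # Tokens are kept as bytes because a single token need not be valid UTF-8.
    print(rank, token_bytes)
```

Under this reading, ranks 0 through 7 map to the single ASCII bytes `!`, `"`, `#`, `$`, `%`, `&`, `'`, and `(`. That fits a byte-level BPE vocabulary, where the lowest ranks are individual bytes.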