The meaning of some special token fields

#115
by tlain - opened

{
"id": 50883,
"content": "<|10.36|>",
"single_word": false,
"lstrip": false,
"rstrip": false,
"normalized": true,
"special": false
},
"50258": {
"content": "<|startoftranscript|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"50259": {
"content": "<|en|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}

"lstrip", "normalized", "rstrip", "single_word", "special"
What do these fields mean? And why is "special" true for some tokens (like <|startoftranscript|> and <|en|>) but false for others (like the timestamp token <|10.36|>)? Thanks
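
For context, these fields appear to correspond to the attributes of `AddedToken` in the Hugging Face `tokenizers` library. As far as I understand the documentation: `lstrip` / `rstrip` control whether whitespace to the left / right of the token is absorbed when matching it, `single_word` controls whether the token may match inside a larger word, `normalized` controls whether it is matched against the normalized input text, and `special` marks tokens that are dropped when decoding with `skip_special_tokens=True`. The sketch below is only a toy model of that last flag, using the three entries from this post; `ADDED_TOKENS` and `toy_decode` are hypothetical names for illustration, not the library's implementation.

```python
import json

# The three entries from the post, rewritten as one id -> fields mapping.
ADDED_TOKENS = json.loads("""
{
  "50883": {"content": "<|10.36|>", "single_word": false, "lstrip": false,
            "rstrip": false, "normalized": true, "special": false},
  "50258": {"content": "<|startoftranscript|>", "single_word": false, "lstrip": false,
            "rstrip": false, "normalized": false, "special": true},
  "50259": {"content": "<|en|>", "single_word": false, "lstrip": false,
            "rstrip": false, "normalized": false, "special": true}
}
""")


def toy_decode(token_ids, skip_special_tokens=False):
    """Toy stand-in for a tokenizer's decode step (hypothetical helper).

    Only models the assumed effect of the "special" flag: entries with
    special == true are skipped when skip_special_tokens is set.
    """
    pieces = []
    for token_id in token_ids:
        entry = ADDED_TOKENS[str(token_id)]
        if skip_special_tokens and entry["special"]:
            continue  # special tokens are filtered out of the output text
        pieces.append(entry["content"])
    return "".join(pieces)


ids = [50258, 50259, 50883]
print(toy_decode(ids))
print(toy_decode(ids, skip_special_tokens=True))
```

If that reading is right, it would explain the difference asked about: a timestamp token such as `<|10.36|>` carries `special: false` so that it is not removed together with control tokens like `<|startoftranscript|>` and `<|en|>`. This is an assumption about the intent, not something confirmed in this thread.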
