Changing reserved token names

#3
by vburb - opened

Question on the reserved tokens. Is there any impact from changing the name (the "content" value) of these tokens? For instance, what if we wanted to use certain reserved tokens for classification purposes and didn't want vague token references like <|reserved_special_token_5|> in our prompts?

For example, changing

    "128010": {
      "content": "<|reserved_special_token_5|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },

to

    "128010": {
      "content": "<|new_label_5_reference|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },

Astronomer org

I think in theory this is OK: renaming only changes the string that maps to the token ID, not the ID itself or its embedding. Just be aware that the embedding for that token ID was not trained on whatever you are working on during pretraining, so it probably carries either no semantic meaning or an incorrect one left over from pretraining.
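As a rough illustration of the rename, here is a minimal sketch that works on an in-memory dict. It assumes the entries above live under an `added_tokens_decoder` key, as they typically do in a `tokenizer_config.json`; the helper name `rename_reserved_token` is hypothetical and not part of any library.

```python
import copy
import json


def rename_reserved_token(config, token_id, new_content):
    """Return a copy of `config` where the token's content string is replaced.

    Hypothetical helper: only the display string changes. The token ID, and
    therefore the embedding row the model looks up, stays the same.
    """
    updated = copy.deepcopy(config)
    decoder = updated["added_tokens_decoder"]  # assumed key name
    key = str(token_id)
    if key not in decoder:
        raise KeyError(f"token id {token_id} is not in added_tokens_decoder")
    # Refuse a name that another token ID already uses.
    taken = {entry["content"] for k, entry in decoder.items() if k != key}
    if new_content in taken:
        raise ValueError(f"{new_content!r} is already used by another token")
    decoder[key]["content"] = new_content
    return updated


# Example entry copied from the question above.
config = {
    "added_tokens_decoder": {
        "128010": {
            "content": "<|reserved_special_token_5|>",
            "lstrip": False,
            "normalized": False,
            "rstrip": False,
            "single_word": False,
            "special": True,
        }
    }
}

renamed = rename_reserved_token(config, 128010, "<|new_label_5_reference|>")
print(json.dumps(renamed["added_tokens_decoder"]["128010"], indent=2))
```

In a real checkpoint the same token string may also be stored in other tokenizer files, so every copy would need the same change. The rename does nothing to the embedding, so the model would still need fine-tuning on data that uses the new token before it means anything.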
