model destroyed by deleting tokens?

#52
by NickyNicky - opened

url: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/config.json
"vocab_size": 32064

url: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/blob/main/tokenizer_config.json
last token:
"32010": {
  "content": "<|user|>",
  "lstrip": false,
  "normalized": false,
  "rstrip": false,
  "single_word": false,
  "special": true
}


Have tokens that were important for the model been deleted? The config declares a vocabulary of 32064, but the last token in the tokenizer has id 32010.

If someone adds a new token, would the resulting state be so different that the model is ruined?

I would like to know your answer.

Microsoft org

You can use the placeholder tokens or add any new tokens up to id=32063, since the supported vocabulary size is 32064. However, you will need to fine-tune the model so it learns how to use the new tokens.
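To make the arithmetic concrete, here is a minimal sketch that only uses the two numbers quoted in this thread (vocab_size 32064 from config.json and 32010 as the last id shown from tokenizer_config.json). The helper names `free_token_ids` and `fits_without_resize` are made up for illustration, and the sketch assumes every id below vocab_size already has a row in the embedding matrix.

```python
# Values quoted in this thread; nothing here is read from the actual model files.
VOCAB_SIZE = 32064          # "vocab_size" in config.json
LAST_USED_TOKEN_ID = 32010  # id of "<|user|>", the last tokenizer entry shown above


def free_token_ids(vocab_size: int, last_used_id: int) -> range:
    """Hypothetical helper: ids below vocab_size that have no tokenizer entry yet."""
    return range(last_used_id + 1, vocab_size)


def fits_without_resize(
    num_new_tokens: int,
    vocab_size: int = VOCAB_SIZE,
    last_used_id: int = LAST_USED_TOKEN_ID,
) -> bool:
    """Hypothetical helper: True if the new tokens fit inside the declared vocabulary."""
    return num_new_tokens <= len(free_token_ids(vocab_size, last_used_id))


free = free_token_ids(VOCAB_SIZE, LAST_USED_TOKEN_ID)
print(f"free ids: {free.start}..{free[-1]} ({len(free)} slots)")
# prints: free ids: 32011..32063 (53 slots)
```

Under these assumptions, up to 53 tokens can be added (for example with `tokenizer.add_tokens` from transformers) before the vocabulary would exceed 32064. The new embedding rows are untrained either way, which is why fine-tuning is still needed.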

nguyenbh changed discussion status to closed
