Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

#72
by Kenkentron

Thanks for the model!

I encounter the following warning when loading the tokenizer:

from transformers import AutoTokenizer

checkpoint_path = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(checkpoint_path)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

Does this mean I would have to unfreeze the embedding layers when fine-tuning with LoRA?
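For concreteness, this is the kind of setup I have in mind. A minimal sketch using PEFT, where modules_to_save would keep the embeddings and LM head trainable alongside the LoRA adapters (the target_modules names are my guess for Phi-3):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention projection names
    modules_to_save=["embed_tokens", "lm_head"],  # fully train embeddings and output head
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms the embedding weights are trainable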

Thanks!

Have you found a solution? If yes, please share.

@ilyassacha
Sorry for the slow reply. I believe this was fixed by this commit (https://github.com/huggingface/transformers/commit/38da0faa9ff6b800debf59386840d41f199bfd74), and upgrading to transformers 4.44.0 gets rid of the warning.

I could be wrong, but I think this happened simply because Phi-3 added extra tokens on top of Llama 2's tokenizer during training. If you add new tokens yourself and save the tokenizer, you get the same warning (before that commit), as in the sketch below.
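Roughly, something like this reproduces it on transformers < 4.44.0 (the token name is made up):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
tok.add_special_tokens({"additional_special_tokens": ["<|my_new_token|>"]})  # add a brand-new token
tok.save_pretrained("./phi3-tokenizer-extended")

# Reloading the saved tokenizer emits the same warning on transformers < 4.44.0,
# since the added token's embedding has never been trained.
tok2 = AutoTokenizer.from_pretrained("./phi3-tokenizer-extended")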
