Has the tokenizer of the base model (Mistral-7B-v0.1) been retrained?

#37
by LH0521

Hi,
I noticed that Mistral-7B-v0.1 was used as the base model. However, the original Mistral-7B-v0.1 uses BPE (subword) tokenization, whereas I found that NV-Embed-v1 seems to map text word by word.
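
For reference, here is a minimal sketch of how the two tokenizers could be compared side by side (assuming both load via transformers' AutoTokenizer from the Hub; NV-Embed-v1 may be gated and require accepting its license, and the exact outputs are not shown here):

```python
from transformers import AutoTokenizer

# Model IDs assumed from the Hugging Face Hub; NV-Embed-v1 is gated,
# so accepting its license and logging in may be required first.
base_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
nv_tok = AutoTokenizer.from_pretrained("nvidia/NV-Embed-v1", trust_remote_code=True)

text = "Tokenizers can split words into subword pieces."

# Compare the pieces each tokenizer produces for the same input.
print("Mistral-7B-v0.1:", base_tok.tokenize(text))
print("NV-Embed-v1    :", nv_tok.tokenize(text))

# A quick check on whether the vocabulary itself was changed.
print("Vocab sizes:", base_tok.vocab_size, nv_tok.vocab_size)
```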

Did you retrain the tokenizer? If so, was it so that the latent layer can integrate the words better?

Thanks!
