About MoE: combining a vocab-extended model with non-vocab-extended models

#3 · opened by ancv

Hi @mlabonne,
Thanks for your great model! By the way, I have a specific question regarding the MoE model. I have a vocab-extended Mistral 7B adapted to handle Vietnamese better, and I want to combine it in a MoE with chat, code, and math experts based on Mistral 7B to enhance the model's capabilities. Is this possible? If there are some differences in token IDs between the extended and non-extended models, and after merging I fine-tune on a moderate amount of data (about 1B tokens), will the model turn out better?

Hi @ancv, thanks! Yes, this should be possible. Fine-tuning should definitely help too, although it might be more cost-efficient to fine-tune your experts instead.
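
For anyone with the same question: before merging, it can help to check exactly how the two tokenizers differ. Below is a minimal sketch that compares the base tokenizer with the extended one; the extended model's Hub ID is a placeholder, and the assumption is that the vocab extension only appends new tokens after the original vocabulary rather than remapping existing IDs.

```python
# Sketch: compare the base Mistral tokenizer with a Vietnamese vocab-extended one
# to see where token IDs diverge before building the MoE.
# "your-org/vi-mistral-7b-extended" is a placeholder for the extended model.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
ext_tok = AutoTokenizer.from_pretrained("your-org/vi-mistral-7b-extended")

print(f"Base vocab size:     {len(base_tok)}")
print(f"Extended vocab size: {len(ext_tok)}")

base_vocab = base_tok.get_vocab()   # token string -> token ID
ext_vocab = ext_tok.get_vocab()

# Shared tokens that map to *different* IDs are the real problem;
# tokens merely appended at the end of the vocabulary are much easier to handle.
shared = set(base_vocab) & set(ext_vocab)
remapped = [t for t in shared if base_vocab[t] != ext_vocab[t]]
added = set(ext_vocab) - set(base_vocab)

print(f"Shared tokens with changed IDs: {len(remapped)}")
print(f"New tokens added by the extension: {len(added)}")
```

If the shared tokens keep their original IDs, the non-extended experts can typically be aligned by resizing their embeddings to the extended vocabulary before or during the merge; if existing IDs were remapped, the post-merge fine-tuning has much more to repair.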
