Is this a merge?

#3
by mrfakename - opened

Hi,
Is this a merge or a pretrained model?

At its core, GemMoE comprises 8 separately fine-tuned Gemma models, with 2 experts per token.
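
For reference, here is a minimal sketch of what "2 experts per token" routing typically looks like in a sparse MoE layer with 8 experts. This is illustrative only, not GemMoE's actual code; the class name, dimensions, and expert structure are assumptions.

```python
import torch
import torch.nn.functional as F
from torch import nn

class Top2MoELayer(nn.Module):
    """Illustrative top-2 mixture-of-experts layer with 8 experts (not GemMoE's code)."""

    def __init__(self, hidden_size: int = 2048, ffn_size: int = 8192, num_experts: int = 8):
        super().__init__()
        # Router ("gate"): produces one score per expert for each token.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert is an independent feed-forward block
        # (in GemMoE these would come from separately fine-tuned Gemma models).
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, ffn_size),
                nn.GELU(),
                nn.Linear(ffn_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_size)
        scores = self.gate(x)                          # (tokens, experts)
        top2_vals, top2_idx = scores.topk(2, dim=-1)   # keep 2 experts per token
        weights = F.softmax(top2_vals, dim=-1)         # normalize over the chosen 2

        out = torch.zeros_like(x)
        for slot in range(2):
            idx = top2_idx[:, slot]
            w = weights[:, slot].unsqueeze(-1)
            # Dispatch each token only to the expert it selected in this slot.
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out
```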

Thanks! I assume this means that each expert was fine-tuned, then merged?

We combine them using a hidden gate, with a heavily modified version of mergekit, a tool developed by the brilliant Charles Goddard.
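
For context, a rough sketch of the idea behind a "hidden" gate as in stock mergekit-moe: the router row for each expert is seeded from the base model's hidden states on prompts that characterize that expert. GemMoE uses a heavily modified mergekit, so treat the details below (model ID, prompts, function name) as hypothetical assumptions, not the author's actual procedure.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def hidden_gate_vectors(base_model_id: str, expert_prompts: list[list[str]]) -> torch.Tensor:
    """Seed one router row per expert from the base model's hidden states
    on prompts describing that expert (illustrative sketch only)."""
    tok = AutoTokenizer.from_pretrained(base_model_id)
    model = AutoModelForCausalLM.from_pretrained(base_model_id, output_hidden_states=True)
    model.eval()

    rows = []
    for prompts in expert_prompts:          # one list of prompts per expert
        vecs = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            with torch.no_grad():
                hs = model(**ids).hidden_states[-1]   # (1, seq_len, hidden_size)
            vecs.append(hs.mean(dim=1).squeeze(0))    # average over tokens
        rows.append(torch.stack(vecs).mean(dim=0))    # average over that expert's prompts
    # Shape (num_experts, hidden_size): used to initialize the gate's weight matrix.
    return torch.stack(rows)
```

In unmodified mergekit-moe this corresponds to setting `gate_mode: hidden` in the merge config with per-expert `positive_prompts`; how the author's modified version departs from that isn't stated in the thread.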

Ah, makes sense. Thanks for the clarification!!

mrfakename changed discussion status to closed
