About Merging

#5 opened by odysseusq

Hi there, I'm new to MoE models. Was this MoE model created by directly merging the FFN layers from Meta-Llama-3-8B-Instruct, Llama3-8B-OpenHermes-DPO, ...? If so, how were the gating layers designed so that different experts handle their respective positive prompts?
(Sorry if this sounds silly; I'm quite new to this area.)
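To make sure I'm asking the right thing, here is roughly how I picture the routing at inference time: a minimal PyTorch sketch of standard top-k MoE gating that I wrote myself. The class name `TinyMoE`, the layer sizes, and the FFN structure are placeholders of my own, not anything taken from this repo.

```python
# Minimal sketch of top-k MoE routing as I understand it (plain PyTorch,
# not code from this model) -- just to check my mental model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        # One FFN per expert; in a merged MoE I assume these would be the
        # donor models' FFN weights (please correct me if that's wrong).
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, 4 * hidden_size),
                nn.SiLU(),
                nn.Linear(4 * hidden_size, hidden_size),
            )
            for _ in range(num_experts)
        ])
        # The gate is a single linear layer scoring each token per expert.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden_size)
        scores = self.gate(x)                           # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize the kept scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

x = torch.randn(5, 64)
moe = TinyMoE(hidden_size=64, num_experts=4)
print(moe(x).shape)  # torch.Size([5, 64])
```

What I don't understand is how the `gate` weights get initialized from the positive prompts when the experts come from separately trained models.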
