Which merge method was used? Linear?

#5 opened by aaagggddd

Hi, none of those because they're not frankenMoE methods. I used the hidden technique you can find here: https://github.com/cg123/mergekit/blob/mixtral/moe.md
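
In case it helps, a mergekit-moe config (the format described in that moe.md) is just a short yaml file. Here is a minimal sketch with placeholder base/expert model names and prompts, not the exact file used for this model:

```yaml
base_model: mistralai/Mistral-7B-v0.1     # placeholder: shared base the experts are fine-tuned from
gate_mode: hidden                         # "hidden" initializes router gates from hidden-state representations of the prompts
dtype: bfloat16
experts:
  - source_model: org/chat-expert-7b      # placeholder chat model
    positive_prompts:
      - "You are a helpful assistant, chat with the user"
  - source_model: org/code-expert-7b      # placeholder code model
    positive_prompts:
      - "Write a Python function that"
  - source_model: org/math-expert-7b      # placeholder math model
    positive_prompts:
      - "Solve the following math problem step by step"
  - source_model: org/roleplay-expert-7b  # placeholder RP model
    positive_prompts:
      - "You are playing the role of"
```

If I remember correctly, the mixtral branch of mergekit then provides a `mergekit-moe` script that takes a config like this plus an output directory and assembles the MoE checkpoint.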

1. Can you share the yml file you used to merge this model? (Something like https://github.com/cg123/mergekit/blob/mixtral/examples/gradient-slerp.yml)
2. Which paper can I read to learn about frankenMoE? (I searched for frankenMoE but found nothing.)
3. Is there a blog post showing in more detail how you did the merge?
4. Why not use another merge method, like SLERP? (I found your blog post https://mlabonne.github.io/blog/posts/2024-01-08_Merge_LLMs_with_mergekit.html, which uses SLERP as an example.)

Thank you!

Hi @aaagggddd, the merge config yml file can be found in the "Files and versions" tab of this model.

Different merge methods have different objectives. In this case, I guess Maxime wanted to pick top-performing models on different tasks (chat, code, math, and RP) and build a MoE model that wraps all four into a single one. SLERP wouldn't be useful here because it can only take two models at a time, and other techniques such as Passthrough (frankenmerging) could work, but they seem less intuitive and involve more manual work than building a MoE.
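
To make the two-model limitation concrete, a SLERP config (like the linked gradient-slerp.yml example) looks roughly like the sketch below; the model names and interpolation values are placeholders:

```yaml
slices:
  - sources:
      - model: org/model-A            # placeholder: first of exactly two models
        layer_range: [0, 32]
      - model: org/model-B            # placeholder: second model
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-A
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]    # interpolation gradient for attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]    # interpolation gradient for MLP layers
    - value: 0.5                      # default for all other tensors
dtype: bfloat16
```

So fusing four task-specific experts with SLERP would mean chaining pairwise merges and blending their weights, whereas a MoE config keeps each expert's weights intact and only needs the router gates to be initialized.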
