Which merge method was used? Linear?

#5 opened by aaagggddd

Hi, none of those because they're not frankenMoE methods. I used the hidden technique you can find here: https://github.com/cg123/mergekit/blob/mixtral/moe.md
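
In case it helps, a mergekit-moe config (the format described in that moe.md) is just a short yaml file. Here is a minimal sketch with placeholder base/expert model names and prompts, not the exact file used for this model:

```yaml
base_model: mistralai/Mistral-7B-v0.1     # placeholder: shared base the experts are fine-tuned from
gate_mode: hidden                         # "hidden" initializes router gates from hidden-state representations of the prompts
dtype: bfloat16
experts:
  - source_model: org/chat-expert-7b      # placeholder chat model
    positive_prompts:
      - "You are a helpful assistant, chat with the user"
  - source_model: org/code-expert-7b      # placeholder code model
    positive_prompts:
      - "Write a Python function that"
  - source_model: org/math-expert-7b      # placeholder math model
    positive_prompts:
      - "Solve the following math problem step by step"
  - source_model: org/roleplay-expert-7b  # placeholder RP model
    positive_prompts:
      - "You are playing the role of"
```

If I remember correctly, the mixtral branch of mergekit then provides a `mergekit-moe` script that takes a config like this plus an output directory and assembles the MoE checkpoint.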

1. Can you share the yml file you used to merge this model? (Something like https://github.com/cg123/mergekit/blob/mixtral/examples/gradient-slerp.yml)
2. Which paper can I read to learn about frankenMoE? (I searched for frankenMoE but found nothing.)
3. Is there a blog post showing in more detail how you did the merge?
4. Why not use another merge method, like SLERP? (I found your blog post https://mlabonne.github.io/blog/posts/2024-01-08_Merge_LLMs_with_mergekit.html, which uses SLERP as an example.)

Thank you!

Hi @aaagggddd, the merge config yml file can be found in the "Files and versions" tab of this model.

Different merge methods have different objectives. In this case, I guess Maxime wanted to pick top-performing models on different tasks (chat, code, math, and RP) and build a MoE model that wraps all four into a single one. SLERP wouldn't be useful here because it can only take two models at a time, and other techniques such as Passthrough (frankenmerging) could work, but they seem less intuitive and involve more manual work than building a MoE.
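
To make the two-model limitation concrete, a SLERP config (like the linked gradient-slerp.yml example) looks roughly like the sketch below; the model names and interpolation values are placeholders:

```yaml
slices:
  - sources:
      - model: org/model-A            # placeholder: first of exactly two models
        layer_range: [0, 32]
      - model: org/model-B            # placeholder: second model
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-A
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]    # interpolation gradient for attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]    # interpolation gradient for MLP layers
    - value: 0.5                      # default for all other tensors
dtype: bfloat16
```

So fusing four task-specific experts with SLERP would mean chaining pairwise merges and blending their weights, whereas a MoE config keeps each expert's weights intact and only needs the router gates to be initialized.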
