Idea for the future - combine this with traditional merging

by Henk717

Possible concept to further improve this: have each expert contain a mild merge of the other experts to ensure overlap.
This would bring it a bit closer to what MDEL / Aurora is doing, since they plan to have one base model that is a traditional merge, with MoEs on top for enhancement. Since I believe this architecture is a little different, we might be able to create a similar effect by merging each expert at a low percentage and then using the resulting models for the MoE model.
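As a rough illustration (not how this repo actually builds its MoE - the function name, structure, and the blend ratio `alpha` are all my own assumptions), the "mild merge" could look something like this before the experts are assembled:

```python
import torch

def soften_experts(expert_state_dicts, alpha=0.1):
    """Hypothetical sketch: each expert keeps (1 - alpha) of its own weights
    and receives alpha spread evenly across the remaining experts."""
    softened = []
    for i, own in enumerate(expert_state_dicts):
        blended = {}
        for key, tensor in own.items():
            # average of the same tensor from every other expert
            others = [sd[key] for j, sd in enumerate(expert_state_dicts) if j != i]
            mix = torch.stack(others).mean(dim=0)
            blended[key] = (1 - alpha) * tensor + alpha * mix
        softened.append(blended)
    return softened
```

The softened state dicts would then be used in place of the original experts when constructing the MoE.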

That's an interesting thought - I'll definitely have to try it. Thanks!
