How did you fill the MoE gates?

by ttkciar - opened

The big question here is whether the hack I used to populate the MoE gates works well enough to take advantage of all of the experts.

Thank you for providing this model! As a PoC it seems remarkably successful.

Would you be willing to describe the hack you used to populate the gates?

Thanks for the interest! I'm also surprised and delighted that it worked so well.

I wrote a blog post on how I made this model. I've also made the actual script available on a branch of mergekit, along with some quick documentation on how to use it.

Thank you :-)
