From your work, I found a new way to do model ensembling

#14
by xxx1 - opened

num_local_experts == num_experts_per_tok means all experts are in use, but each model contributes at a different rate.
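A minimal sketch of that idea, assuming a Mixtral-style router (softmax over the router logits, keep the top num_experts_per_tok, renormalize the kept weights). The function names, logits, and expert outputs below are made up for illustration and are not taken from this repo. When the two config values are equal, no expert is dropped, so the layer output is a per-token weighted average over all experts, which is what makes it look like an ensemble.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, num_experts_per_tok):
    # Hypothetical helper: returns {expert_index: weight} for the top-k experts,
    # with the kept weights renormalized to sum to 1.
    probs = softmax(router_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top = order[:num_experts_per_tok]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}

def moe_output(expert_outputs, router_logits, num_experts_per_tok):
    # Weighted sum of the selected experts' outputs for one token.
    weights = route(router_logits, num_experts_per_tok)
    return sum(w * expert_outputs[i] for i, w in weights.items())

num_local_experts = 4
router_logits = [2.0, 0.5, -1.0, 0.0]  # made-up router scores for one token
expert_outputs = [1.0, 2.0, 3.0, 4.0]  # made-up scalar outputs, one per expert

sparse = route(router_logits, 2)                  # usual sparse setting: 2 of 4 experts
dense = route(router_logits, num_local_experts)   # k == n: every expert is used

print("sparse weights:", sparse)
print("dense weights: ", dense)
print("dense output:  ", moe_output(expert_outputs, router_logits, num_local_experts))
```

With k equal to the number of experts, the renormalization is a no-op and the weights are just the router's softmax, so the "rate" of each model is decided per token by the router rather than fixed in advance.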

Owner

So, is there a new model we can use?
