Inference code

#1
by jmjzz - opened

Hello, I’m wondering whether this new version has been fine-tuned, so that we can run inference and evaluation on downstream tasks.

Both models require further fine-tuning for better performance, as is the case whenever you build an MoE with mergekit, whether the gates are initialized in hidden or random mode. However, the model with hidden gates will do better without further fine-tuning and will require less data and fewer iterations to reach better accuracy.

That’s what I understood from my own MoE merges.
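
To make the hidden vs. random distinction above more concrete, here is a rough toy sketch in plain Python. It is not mergekit's code: the function names, the 3-dimensional vectors, and the "code"/"math" prompt labels are all made up for illustration. The assumption is that a "hidden" gate row is built from hidden-state vectors of prompts that describe an expert, while a "random" gate row is just noise until it is trained.

```python
import random

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def init_gates_hidden(prompt_states_per_expert):
    """Hypothetical 'hidden'-style init: one gate row per expert,
    set to the mean hidden state of that expert's prompts."""
    return [mean_vector(states) for states in prompt_states_per_expert]

def init_gates_random(num_experts, dim, seed=0):
    """Hypothetical 'random'-style init: gate rows are Gaussian noise,
    so routing is arbitrary until the gates are fine-tuned."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]

def route(gates, token_state):
    """Top-1 routing: pick the expert whose gate row scores highest."""
    scores = [dot(row, token_state) for row in gates]
    return max(range(len(gates)), key=scores.__getitem__)

# Toy stand-ins for hidden states of each expert's prompts (made-up numbers).
prompt_states = [
    [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],  # expert 0, e.g. "code" prompts
    [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],  # expert 1, e.g. "math" prompts
]

hidden_gates = init_gates_hidden(prompt_states)
random_gates = init_gates_random(num_experts=2, dim=3)

code_like_token = [0.8, 0.0, 0.1]
math_like_token = [0.1, 0.0, 0.8]

print("hidden gates:", route(hidden_gates, code_like_token),
      route(hidden_gates, math_like_token))
print("random gates:", route(random_gates, code_like_token),
      route(random_gates, math_like_token))
```

In this toy setup the hidden-style gates already send each token to the matching expert with no training, whereas the random gates route according to whatever noise they were seeded with. That is only an intuition for why a hidden-gate merge tends to be usable sooner, not a measurement.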
