Benefits of this model
#3 opened by Nexesenex
Hey Perlthoughts.
First, congrats on this successful MoE setup. I made an exl2 6.0bpw quant and tested the perplexity at 512 tokens:
Your model doesn't alter the perplexity negatively.
Then, beyond this achievement, does such a MoE bring any benefit over the original model, Mistral Instruct v0.2, if the weights of each expert are the same? (A small sketch of why perplexity stays unchanged follows below.)
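For what it's worth, here is a minimal numpy sketch (assuming Mixtral-style top-k routing with renormalized gate weights; all sizes and names are illustrative, not this repo's actual code) of why identical experts leave perplexity untouched: the gate weights sum to 1, so the mixture collapses exactly to the single expert's output.

```python
# Sketch under assumptions: Mixtral-style top-2 routing, gate weights
# renormalized over the selected experts, and all experts sharing weights.
import numpy as np

rng = np.random.default_rng(0)
d = 8                         # toy hidden size
x = rng.normal(size=d)        # one token's hidden state
W = rng.normal(size=(d, d))   # shared expert weights (all experts identical)

def expert(x):
    return np.tanh(W @ x)     # stand-in for the FFN expert

logits = rng.normal(size=4)          # router logits over 4 experts
top2 = np.argsort(logits)[-2:]       # top-2 routing
g = np.exp(logits[top2])
g /= g.sum()                         # gate weights sum to 1 after renorm

moe_out = sum(gi * expert(x) for gi in g)   # mixture of identical experts
assert np.allclose(moe_out, expert(x))      # collapses to the base FFN output
```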
I believe it offers more perceptrons to be fine-tuned, which in theory should produce better outputs. Thanks for trying it!
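As a rough illustration of that point (a hypothetical PyTorch sketch, not this repo's actual conversion script): duplicating one FFN into several experts multiplies the trainable parameters, and each copy can diverge independently once fine-tuning begins.

```python
# Hedged sketch: duplicated experts start identical but add trainable
# capacity. Layer sizes and the expert count here are illustrative only.
import copy
import torch.nn as nn

base_ffn = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
experts = nn.ModuleList(copy.deepcopy(base_ffn) for _ in range(4))

n_base = sum(p.numel() for p in base_ffn.parameters())
n_moe = sum(p.numel() for p in experts.parameters())
print(n_base, n_moe)  # the MoE holds 4x the trainable parameters
```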
Nexesenex changed discussion title from "Benedits of this model" to "Benefits of this model"