Benefits of this model

#3 opened by Nexesenex

Hey Perlthoughts.

First, congrats on this successful MoE setup. I made an exl2 6.0bpw quant and tested the perplexity at 512 tokens:

[Screenshot: perplexity results in Text generation web UI, 2023-12-30]

Your model doesn't degrade the perplexity.
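For reference, here is a rough sketch of how a fixed-window perplexity check like this can be run with transformers (the checkpoint id and loading details below are placeholders; the actual numbers above come from evaluating the exl2 quant in Text generation web UI / ExLlamaV2):

```python
# Rough sketch of a fixed-window perplexity check (illustrative only; assumes
# a transformers-compatible checkpoint rather than the exl2 quant itself).
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def perplexity(text: str, window: int = 512) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    total_nll, total_tokens = 0.0, 0
    # Score the text in non-overlapping windows of `window` tokens.
    for start in range(0, ids.numel() - 1, window):
        chunk = ids[start:start + window].unsqueeze(0).to(model.device)
        if chunk.size(1) < 2:
            break
        with torch.no_grad():
            # Labels are shifted internally; loss is mean NLL per predicted token.
            loss = model(input_ids=chunk, labels=chunk).loss
        total_nll += loss.item() * (chunk.size(1) - 1)
        total_tokens += chunk.size(1) - 1
    return math.exp(total_nll / total_tokens)
```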

Beyond this achievement, does such a MoE bring any benefit compared to the original Mistral Instruct v0.2 model, if the weights of each expert are the same?

I believe it offers more perceptrons to be fine-tuned, which in theory should produce better outputs. Thanks for trying it!
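To illustrate: a softmax-gated mixture of identical experts collapses back to the single expert's output, which is why the perplexity stays put until the duplicated experts are fine-tuned apart. Toy sketch below (illustrative only, not this repo's actual routing code):

```python
# Toy sketch: a softmax-gated top-k mixture of *identical* experts reduces to
# the single expert's output, matching the unchanged perplexity. The extra
# copies only matter once they are fine-tuned into different weights.
import torch

hidden, n_experts, top_k = 64, 8, 2
expert = torch.nn.Linear(hidden, hidden)        # stand-in for the shared FFN expert
experts = [expert] * n_experts                  # every slot points to the same weights
gate = torch.nn.Linear(hidden, n_experts)       # router producing expert scores

x = torch.randn(1, hidden)
weights, idx = torch.topk(torch.softmax(gate(x), dim=-1), top_k)
weights = weights / weights.sum()               # renormalize over the chosen top-k

moe_out = sum(w * experts[i](x) for w, i in zip(weights[0], idx[0].tolist()))

# Mixture output equals the base expert's output because the gate weights sum to 1.
print(torch.allclose(moe_out, expert(x), atol=1e-6))  # -> True
```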

