Benefits of this model
#3 opened by Nexesenex
Hey Perlthoughts.
First, congrats on this successful MoE setup. I made an exl2 6.0bpw quant and tested the perplexity at 512 tokens:
Your model doesn't alter the perplexity negatively.
Then, beyond this achievement, does such a MoE bring any benefit over the original model, Mistral Instruct v0.2, if the weights of each expert are the same? (A small sketch of why perplexity stays unchanged follows below.)
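For what it's worth, here is a minimal numpy sketch (assuming Mixtral-style top-k routing with renormalized gate weights; all sizes and names are illustrative, not this repo's actual code) of why identical experts leave perplexity untouched: the gate weights sum to 1, so the mixture collapses exactly to the single expert's output.

```python
# Sketch under assumptions: Mixtral-style top-2 routing, gate weights
# renormalized over the selected experts, and all experts sharing weights.
import numpy as np

rng = np.random.default_rng(0)
d = 8                         # toy hidden size
x = rng.normal(size=d)        # one token's hidden state
W = rng.normal(size=(d, d))   # shared expert weights (all experts identical)

def expert(x):
    return np.tanh(W @ x)     # stand-in for the FFN expert

logits = rng.normal(size=4)          # router logits over 4 experts
top2 = np.argsort(logits)[-2:]       # top-2 routing
g = np.exp(logits[top2])
g /= g.sum()                         # gate weights sum to 1 after renorm

moe_out = sum(gi * expert(x) for gi in g)   # mixture of identical experts
assert np.allclose(moe_out, expert(x))      # collapses to the base FFN output
```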
I believe it offers more perceptrons to be fine-tuned, which in theory should produce better outputs. Thanks for trying it!
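As a rough illustration of that point (a hypothetical PyTorch sketch, not this repo's actual conversion script): duplicating one FFN into several experts multiplies the trainable parameters, and each copy can diverge independently once fine-tuning begins.

```python
# Hedged sketch: duplicated experts start identical but add trainable
# capacity. Layer sizes and the expert count here are illustrative only.
import copy
import torch.nn as nn

base_ffn = nn.Sequential(nn.Linear(64, 256), nn.SiLU(), nn.Linear(256, 64))
experts = nn.ModuleList(copy.deepcopy(base_ffn) for _ in range(4))

n_base = sum(p.numel() for p in base_ffn.parameters())
n_moe = sum(p.numel() for p in experts.parameters())
print(n_base, n_moe)  # the MoE holds 4x the trainable parameters
```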
Nexesenex changed discussion title from "Benedits of this model" to "Benefits of this model"