Should not be called Mixtral; the models merged into the MoE are Yi-based

#2 opened by teknium

Mixtral is a whole other base model lol

I'm with teknium, this name could be misleading.

Yup, could simply be Yi-34Bx2-MoE, but it's ok

It does use the Mixtral architecture though, so there is a half-truth to it
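To make that half-truth concrete: a merge like this keeps the Mixtral architecture class but fills it with Yi-sized experts. Below is a minimal sketch of what such a config looks like; the dimensions are Yi-34B's published values, and the two-expert layout is an assumption for illustration, not a claim about this exact repo.

```python
from transformers import MixtralConfig

# Hypothetical config for a "Yi-34B x2" MoE: Mixtral architecture class,
# but every per-expert dimension matches Yi-34B rather than Mistral-7B.
config = MixtralConfig(
    vocab_size=64000,          # Yi tokenizer size
    hidden_size=7168,          # Yi-34B hidden size
    intermediate_size=20480,   # Yi-34B FFN size (per expert)
    num_hidden_layers=60,
    num_attention_heads=56,
    num_key_value_heads=8,
    num_local_experts=2,       # two Yi-based experts instead of Mixtral's eight
    num_experts_per_tok=2,
)
print(config.model_type)  # "mixtral" -- only the architecture class is shared
```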

I agree, Mixtral is a specific model by Mistral AI, and it is very confusing when you name all your models this way.
Your models are Mixture of Experts ("MoE") models, and the model Mixtral has nothing to do with them (other than Mixtral also using a MoE approach, which was obviously their reason for calling it Mixtral, punning on their name Mistral and on Mixture).
Very interesting models, though; please change your naming scheme!!

It's easy to ask for a renaming with Weyaxi's renamer tool:
https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-renamer

Just enter your repo name and HF token, and it will generate a pull request for the leaderboard name change.
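If you prefer to do the repo rename itself from code rather than through the Space, huggingface_hub exposes move_repo. A minimal sketch with placeholder repo ids; the leaderboard entry is still updated via the pull request the renamer generates.

```python
from huggingface_hub import HfApi

# Minimal sketch: rename a model repo on the Hub (repo ids are placeholders).
api = HfApi(token="hf_...")  # a token with write access
api.move_repo(
    from_id="your-username/Mixtral_34Bx2_MoE_60B",
    to_id="your-username/Yi-34Bx2_MoE_60B",
    repo_type="model",
)
```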

Owner

The reason it's called Mixtral is that the model is based on the MixtralForCausalLM architecture; take a look at the config file.

"architectures": [ "MixtralForCausalLM" ].

I haven’t thought of a new name yet.
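A quick way to verify which architecture a repo declares is to read its config with transformers. A minimal sketch, with the repo id as a placeholder:

```python
from transformers import AutoConfig

# Reads config.json from the Hub without downloading the weights.
config = AutoConfig.from_pretrained("your-username/Yi-34Bx2_MoE_60B")
print(config.architectures)  # e.g. ['MixtralForCausalLM']
print(config.model_type)     # e.g. 'mixtral'
```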

I think you should call it cloud9 :D

How about one of the following names:
Yi-Mixtral_34Bx2_MoE_60B
MixYi-34Bx2_MoE_60B
MiYi-34Bx2_MoE_60B
Yi-34Bx2_MoE_60B

The name "Mixtral" implies a "Mistral"-based mixture of experts.

Regardless of the name, we'd love to learn more about your process. The results look extremely promising.

I like Yi-34Bx2_MoE_60B; it's short and represents everything this model has to offer.
