Awesome model

#1
by Miltos - opened

I am an amateur so excuse me for asking. Is this model a Llama or a Mistral model basically?

Thank you for liking the model! As for your question: in a way, it's both. I created this model by finetuning the ReMM Mistral model made by Undi95 on one of my own datasets, and ReMM Mistral is a model created by "merging" different Llama2 and Mistral models. So... it's a hybrid, sorta. Parts of it are Llama, parts of it are Mistral.

Hope that clears things up a bit. Honestly, it's surprising that two differently pretrained models can be merged and still be coherent, but Undi still manages to pull it off time and again.

Where can I read more about model merging and gluing?
The model is very interesting, and now I'm thinking about how to do it better, if possible.

@Alex5000505 You can use this tool to merge: https://github.com/cg123/mergekit (it's what everyone I know uses).
The command I use (as basic as you can possibly get) is something like:

mergekit-legacy ./MistralMakise-13b-Merged/  --base-model ./MistralMakise-13b/ --cuda --merge ./ReMM-Mistral-13B/ --weight 0.3 --density 0.5

I think the non-legacy version of it uses YAML files instead of command-line arguments.
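For intuition about what --weight and --density might be doing in the command above, here is a toy sketch in plain Python. It is not mergekit's code: it assumes a TIES-style merge where you take the difference between the other model and the base, keep only the largest-magnitude fraction of that difference (the density), scale it (the weight), and add it back onto the base. The function names are made up for illustration, and real merges work on full weight tensors rather than short lists.

```python
def sparsify_by_magnitude(delta, density):
    """Keep the largest-magnitude `density` fraction of entries; zero the rest."""
    k = round(density * len(delta))
    if k <= 0:
        return [0.0] * len(delta)
    # Indices of the k entries with the largest absolute value.
    by_magnitude = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(by_magnitude[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]


def merge_into_base(base, other, weight, density):
    """Toy merge (assumed behaviour): base + weight * sparsified(other - base)."""
    delta = [o - b for b, o in zip(base, other)]
    sparse_delta = sparsify_by_magnitude(delta, density)
    return [b + weight * s for b, s in zip(base, sparse_delta)]


# Hypothetical 4-parameter "models", just to show the arithmetic.
base_weights = [0.0, 0.0, 0.0, 0.0]
other_weights = [4.0, -1.0, 0.5, -3.0]
print(merge_into_base(base_weights, other_weights, weight=0.5, density=0.5))
# prints [2.0, 0.0, 0.0, -1.5]
```

With density 0.5 only the two biggest changes (4.0 and -3.0) survive, and weight 0.5 halves them before they are added to the base. If that reading of the flags is right, a low weight and density pull in only a small part of the other model, but check the mergekit README for what the flags actually do.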

I'm not an authority on merging; ReMM Mistral was made by Undi, one of the most experienced mergers in the community. He knows what he's doing when it comes to merging, and I do not.
Sadly I don't think there are any good blog posts that walk through the subject. Hardly anything's documented in this community; you just need to ask informally on Discord servers.

Note that if you want to one-up this, the real contribution of this model is the dataset; merging back in just stabilized it a bit (a lot) and made it viable for actual use while still allowing it to capitalize on ReMM's writing ability. Proper professional merges can combine dozens of models using very fancy methods. Most of the top-ranking models on Weicon's leaderboard are merges IIRC.

Oh, thanks.
About datasets: have you tried finetuning any RP models on psychology datasets?
I suppose it could give better results, but I haven't tried it myself yet.
