Model performance and more questions

#3
by agershun - opened

Interesting fact: I compared this model, NurtureAI/Starling-LM-11B-alpha, with the current leaderboard leader, MetaMath-Cybertron-Starling, today, and the 11B gives better and more relevant results on my queries. MetaMath was probably overtrained to pass the benchmarks rather than to be more "useful". Thank you again.

May I ask you a few more questions?

  • Why did you use this unusual layer-merging configuration?
  • Have you tried merging other layer configurations?

Same as you, I saw better generations. I made more 11Bs on NurtureAI. My thinking is that an 11B will perform even better once the new layers are fine-tuned with DPO or SFT.
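For context, this kind of depth up-scaling is usually expressed as a mergekit "passthrough" config that stacks two overlapping layer ranges of the same 7B model. The sketch below uses berkeley-nest/Starling-LM-7B-alpha and illustrative layer ranges; they are not necessarily the exact ones used for this model:

```yaml
# Passthrough (frankenmerge) config sketch: stack two overlapping
# slices of the same 7B model to get a deeper ~11B model.
slices:
  - sources:
      - model: berkeley-nest/Starling-LM-7B-alpha
        layer_range: [0, 24]   # first 24 layers
  - sources:
      - model: berkeley-nest/Starling-LM-7B-alpha
        layer_range: [8, 32]   # last 24 layers, overlapping the first slice
merge_method: passthrough
dtype: bfloat16
```

Stacking [0, 24] and [8, 32] gives 48 layers instead of the original 32, which is roughly where the ~11B parameter count comes from.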

Ray, may I ask you a couple more questions:

  1. How long does it take to merge the layers with mergekit? Does this process require a GPU, or can it be done with the CPU only?
  2. Have you already tried this "11B" method with the "new champions" (as of December 12, 2023), like v1olet/v1olet_marcoroni-go-bruins-merge-7B, or with the new base model mistralai/Mistral-7B-Instruct-v0.2?

Thank you

It doesn't take long at all; for a 7B to 11B it's just a couple of minutes. I just did the Mistral v0.2 one for you. I also included the merge script for mergekit on the model card.
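For the v0.2 version, the config is presumably the same shape with the model swapped in; a minimal sketch, again with illustrative layer ranges:

```yaml
# Same passthrough recipe, pointed at the new base model.
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 24]
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

On the GPU question: a passthrough merge mostly just copies tensors around, so it runs fine on CPU with something like `mergekit-yaml config.yml ./output-model` (the config and output paths here are just placeholders). As far as I know, a GPU is optional in mergekit and mainly helps merge methods that actually do arithmetic on the weights.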

And I tested it, and it works perfectly (in my case)! Thank you!

Probably this "11B" approach look promising. It is interesting: will it work with Mixtral 8x7B?
Probably, it is necessary to be more careful with layers..
