Model performance and more questions

#3
by agershun - opened

Interesting fact: I compared this model, NurtureAI/Starling-LM-11B-alpha, with the current leaderboard leader, MetaMath-Cybertron-Starling, today, and the 11B gives better and more relevant results on my queries. MetaMath was probably overtrained to pass the benchmarks rather than to be more "useful". Thank you again.

May I ask you a few more questions?

  • Why did you use this unusual layer-merging configuration?
  • Have you tried merging other layer configurations?

Same as you, I saw better generations. I made more 11Bs on NurtureAI. My thinking is that an 11B will perform even better once the new layers are fine-tuned with DPO or SFT.
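For context, this kind of depth up-scaling is usually expressed as a mergekit "passthrough" config that stacks two overlapping layer ranges of the same 7B model. The sketch below uses berkeley-nest/Starling-LM-7B-alpha and illustrative layer ranges; they are not necessarily the exact ones used for this model:

```yaml
# Passthrough (frankenmerge) config sketch: stack two overlapping
# slices of the same 7B model to get a deeper ~11B model.
slices:
  - sources:
      - model: berkeley-nest/Starling-LM-7B-alpha
        layer_range: [0, 24]   # first 24 layers
  - sources:
      - model: berkeley-nest/Starling-LM-7B-alpha
        layer_range: [8, 32]   # last 24 layers, overlapping the first slice
merge_method: passthrough
dtype: bfloat16
```

Stacking [0, 24] and [8, 32] gives 48 layers instead of the original 32, which is roughly where the ~11B parameter count comes from.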

Ray, may I ask you a couple more questions:

  1. How long does it take to merge the layers with mergekit? Does this process require a GPU, or can it be done with the CPU only?
  2. Have you already tried this "11B" method with the "new champions" (as of December 12, 2023), like v1olet/v1olet_marcoroni-go-bruins-merge-7B, or with the new base model mistralai/Mistral-7B-Instruct-v0.2?

Thank you

It doesn't take long at all; for a 7B to 11B it's just a couple of minutes. I just did the Mistral v0.2 one for you. I also included the merge script for mergekit on the model card.
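For the v0.2 version, the config is presumably the same shape with the model swapped in; a minimal sketch, again with illustrative layer ranges:

```yaml
# Same passthrough recipe, pointed at the new base model.
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 24]
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [8, 32]
merge_method: passthrough
dtype: bfloat16
```

On the GPU question: a passthrough merge mostly just copies tensors around, so it runs fine on CPU with something like `mergekit-yaml config.yml ./output-model` (the config and output paths here are just placeholders). As far as I know, a GPU is optional in mergekit and mainly helps merge methods that actually do arithmetic on the weights.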

And I tested it, and it works perfectly (in my case)! Thank you!

Probably this "11B" approach look promising. It is interesting: will it work with Mixtral 8x7B?
Probably, it is necessary to be more careful with layers..
