Non-working model, how to fix.

#1
by DavidAU - opened

Hi;

Fellow mergekit'er here...

If you want this model to work [GGUF], remove the second:

  - sources:
      - model: FimbMagic
        layer_range: [36, 48]

AND/OR do NOT duplicate the last 2-3 layers.
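
For illustration, the cleaned-up slice stack would look something like this (model names and exact ranges here are assumptions, not the actual recipe - the point is that the trailing [36, 48] block appears only once):

    slices:
      - sources:
          - model: FimbMagic
            layer_range: [0, 36]
      - sources:
          - model: FimbMagic
            layer_range: [36, 48]
      # the duplicated tail below is what breaks the GGUF - delete it:
      # - sources:
      #     - model: FimbMagic
      #       layer_range: [36, 48]
    merge_method: passthrough
    dtype: bfloat16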

Hope that helps;
D

Hey!

Thanks for the tip. I'll give it a try and make a V2 of UnFimbulvetr-20B. I kinda gave up hope once I saw how it fared, so I'll give myself another chance to redeem this merge.

Passthrough merges are more "sensitive" about the beginning layers too.
Other tip - learned the hard way - watch that you do not miss any layers...
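
For example - mergekit layer ranges are end-exclusive as far as I can tell, so a stack like this (ranges purely illustrative) silently drops a layer; the second range has to start exactly where the first one ends:

    slices:
      - sources:
          - model: FimbMagic
            layer_range: [0, 24]    # covers layers 0-23
      - sources:
          - model: FimbMagic
            layer_range: [25, 48]   # oops: layer 24 is never copied
    # correct: the second range should be [24, 48]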

Google "model layer theory" - I think there are links too at mergekit (github - bottom of the page) to papers discussing models, layers and merging too.

I am currently merging multiple models via the passthrough method - the papers really helped... a lot of trial and error though...

Will read the papers in a while. Also I've released a "Fixed(?)" version over at KaraKaraWitch/UnFimbulvetr-20B-V2

Hopefully it performs better. Reading the SOLAR paper a bit, I realized I shouldn't duplicate the last 8 layers.
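
To sketch the idea (illustrative ranges only - the actual V2 recipe is in the linked repo): SOLAR-style depth up-scaling repeats a middle band of layers and leaves the first and last blocks unique, e.g. for an assumed 48-layer Solar-based source:

    slices:
      - sources:
          - model: FimbMagic
            layer_range: [0, 40]
      - sources:
          - model: FimbMagic
            layer_range: [8, 48]
    merge_method: passthrough
    dtype: bfloat16
    # layers 0-7 and 40-47 appear once; the middle band 8-39 appears twice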

By changing the LoRA rank to a higher value you can target more tensors in the stack... so a LoRA of 256/32 (rank 256, alpha 32) will push a large enough set of tensors to change the whole model's response priority; you will need your alignment dataset to train at that deep level. Then, after merging... create a lightweight LoRA! ... i.e. rank 2-16 (~5M parameters) and check that your tasks are still converging!
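
As a rough sketch of that kind of "deep" adapter in an axolotl-style training config (key names follow axolotl's LoRA options as I understand them; treat the exact values and target modules as assumptions):

    adapter: lora
    lora_r: 256          # high rank so the adapter touches a wide slice of tensors
    lora_alpha: 32
    lora_dropout: 0.05
    lora_target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
    # after merging this adapter in, repeat with a lightweight pass
    # (e.g. lora_r: 8, lora_alpha: 16) purely to check the tasks still converge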

You should also deploy the DPO chosen/rejected strategy... to replace the rejected responses, etc. So if you know the phrases you want to remove, create your counter-dataset with GPT Builder... then use that dataset to clean up the offensive commentary... though all you are really doing is hopefully reducing the probability of the negative outputs, so you should train deeply (0.7) to clean thoroughly~
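
The preference data itself is usually just prompt / chosen / rejected triples; one record might look like this (contents invented purely for illustration, and the exact file format depends on the trainer):

    - prompt: "Continue the scene at the docks."
      chosen: "Rain hammered the tin roof as she counted the crates again..."
      rejected: "I'm sorry, but I can't continue with that request."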

I don't plan on doing a LoRA. If someone wants to do it, feel free.

Tested a GGUF quant of V2. Seems like it's actually usable. Will close the issue.

KaraKaraWitch changed discussion status to closed

This discussion might be of interest - all about layers , importance, with data/tests:

https://huggingface.co/froggeric/WestLake-10.7B-v2/discussions/1
