Article
Merge Large Language Models with mergekit
Great explanation ;)
I'm deciding between SLERP and passthrough to create a smaller version of a large model, which I would then use as the draft model for speculative decoding.
With passthrough, would it make sense to pick layers at even intervals, so that the kept layers are not too far apart (as mentioned in the SOLAR paper)? For example, if the original model has layers 1 to 10, pick layers 1, 4, 7, and 10 to get a model about 2.5 times smaller.
Which method would you recommend?
Thanks in advance!
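
To make the passthrough idea concrete, here is a rough sketch of what I have in mind. It picks evenly spaced layers and builds a dictionary shaped like a mergekit passthrough config. The helper names and the model name `your-org/big-model` are hypothetical placeholders, and I am assuming `layer_range` is 0-based and half-open, as in the article's examples, so layers 1, 4, 7, 10 become indices 0, 3, 6, 9.

```python
import json


def evenly_spaced_layers(num_layers: int, keep: int) -> list[int]:
    """Return `keep` 0-based layer indices spread evenly over the model,
    always including the first and the last layer."""
    if not 2 <= keep <= num_layers:
        raise ValueError("keep must be between 2 and num_layers")
    step = (num_layers - 1) / (keep - 1)
    return [round(i * step) for i in range(keep)]


def passthrough_config(model: str, layers: list[int], dtype: str = "bfloat16") -> dict:
    """Build a config dict with one single-layer slice per kept layer.

    Assumption: layer_range is [start, end) with 0-based indices.
    """
    return {
        "slices": [
            {"sources": [{"model": model, "layer_range": [i, i + 1]}]}
            for i in layers
        ],
        "merge_method": "passthrough",
        "dtype": dtype,
    }


if __name__ == "__main__":
    # Hypothetical 10-layer model, keeping 4 layers.
    layers = evenly_spaced_layers(num_layers=10, keep=4)
    config = passthrough_config("your-org/big-model", layers)
    print(json.dumps(config, indent=2))
```

The printed structure would still need to be written out as YAML for mergekit, and I have not checked how well a model pruned this way works without further fine-tuning.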