Merge method #4
opened by dnhkng
Looking at the mergekit config, I see:
```yaml
merge_method: linear
parameters:
  weight: 1.0
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 1]
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [0, 1]
        parameters:
          weight: 0
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [1, 20]
  ...
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [79, 80]
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [79, 80]
        parameters:
          weight: 0
```
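To see why the zero weight matters, here is a minimal sketch (not mergekit's actual code) of what `merge_method: linear` computes per tensor: a weighted sum of the source tensors, normalized by the total weight. The function name and the stand-in arrays are hypothetical, purely for illustration.

```python
import numpy as np

# Illustrative sketch of a linear merge: each output tensor is the
# weight-normalized sum of the corresponding tensors from the source models.
def linear_merge(tensors, weights, normalize=True):
    total = sum(w * t for w, t in zip(weights, tensors))
    if normalize:
        total = total / sum(weights)
    return total

miqu = np.array([1.0, 2.0, 3.0])  # stand-in for a miqu layer tensor
lzlv = np.array([9.0, 9.0, 9.0])  # stand-in for the zero-weight lzlv dummy

# Weight 1.0 for miqu (the top-level default), 0 for the lzlv entry,
# as in the config's first and last slices.
merged = linear_merge([miqu, lzlv], [1.0, 0.0])
print(merged)  # → [1. 2. 3.], identical to the miqu tensor
```

With weights `[1.0, 0.0]`, the second model contributes nothing, so the "linear merge" degenerates to copying the first model's weights unchanged.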
So, I think you want to use linear interpolation between the two models on the first and last layers. But then you set the weight of the second model to zero, which I think effectively makes the merge use only the first model listed.
If I'm wrong here, could you explain what's happening?
It's a trick: it's not really a linear merge, it's effectively a pass-through. A linear merge with a dummy second model at weight 0 is used only so that the tokenizer-based merge routine is invoked for embed_tokens, which wouldn't happen with a regular pass-through merge. It's a brilliant idea by Eric Hartford and friends, as explained in the merge config comments of TheProfessor-155b.