Merge method

#4
opened by dnhkng

Looking at the mergekit config, I see:
```yaml
merge_method: linear
parameters:
  weight: 1.0
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 1]
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [0, 1]
        parameters:
          weight: 0
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [1, 20]

...

  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [79, 80]
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [79, 80]
        parameters:
          weight: 0
```

So it looks like you want to linearly interpolate between the two models on the first and last layers. But then you set the second model's weight to zero, which I think makes the linear merge effectively use only the first model listed.
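
(For reference, my understanding is that linear merging computes a per-tensor weighted average, with the weights normalized by default, so with these weights it should reduce to:

$$t_\text{merged} = \frac{1.0 \cdot t_\text{miqu} + 0 \cdot t_\text{lzlv}}{1.0 + 0} = t_\text{miqu}$$

i.e. the miqu tensors pass through unchanged.)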

If I'm wrong here, could you explain what's happening?

Owner

It's a trick: it isn't really a linear merge, it's effectively a passthrough merge. The linear merge with a dummy second model at weight 0 is only there so the tokenizer-based merge routine gets invoked for embed_tokens, which wouldn't happen with a regular passthrough merge. It's a brilliant idea by Eric Hartford and friends, as explained in the merge config comments of TheProfessor-155b.
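
For contrast, a plain passthrough stack of the same two models might look like the sketch below. This is an illustration only, not the actual config: the layer ranges and dtype are invented. With this variant, mergekit copies each tensor, including embed_tokens, verbatim from whichever source model supplies it, with no tokenizer-aware handling:

```yaml
# Hypothetical passthrough-only variant (illustration; layer ranges are made up):
merge_method: passthrough
slices:
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [0, 20]
  - sources:
      - model: lizpreciatior/lzlv_70b_fp16_hf
        layer_range: [10, 30]
  # ... further interleaved slices ...
  - sources:
      - model: 152334H/miqu-1-70b-sf
        layer_range: [60, 80]
dtype: float16
```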
