How do you merge models?

by mhemetfaik

Can you provide information about merging models?

Did you merge garage-bAInd/Platypus2-70B's LoRA with upstage/Llama-2-70b-instruct-v2?

Thanks for your interest. It is a simple linear merge (for now... stay tuned). We experimented with the different types of LoRA modules, with how the training data affects the outcome of the merge, with how merging fine-tunes that used different LoRA modules works, etc.
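
In case a concrete picture helps, a linear merge of two same-architecture checkpoints looks roughly like the sketch below; the even 50/50 weighting and the output path are illustrative assumptions, not the exact recipe used for this model:

```python
# A minimal sketch of a linear merge of two same-architecture checkpoints.
# The 50/50 weighting and output path are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained(
    "garage-bAInd/Platypus2-70B", torch_dtype=torch.float16
)
model_b = AutoModelForCausalLM.from_pretrained(
    "upstage/Llama-2-70b-instruct-v2", torch_dtype=torch.float16
)

# Linear interpolation of every parameter: merged = alpha * A + (1 - alpha) * B.
# (At 70B scale you would do this shard by shard to keep memory in check.)
alpha = 0.5
state_a, state_b = model_a.state_dict(), model_b.state_dict()
merged = {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

model_a.load_state_dict(merged)
model_a.save_pretrained("Platypus2-70B-instruct-merged")
```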

From our experience, the outcome of merging two (or more) LoRA-based models depends very much on 1) the LoRA modules each merged model was fine-tuned with (e.g., did one model target the up/down/gate projections and the other the k/v/q/o projections? see the config sketch below), 2) the training data, 3) the performance of both original models on whatever benchmarks you're using, and 4) (I think, but am still working on quantitative tests to explore this) the order of the LoRA merge. I believe the order of the merge also affects the "expertise" of the model.
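
To make point (1) concrete, here is what the two styles of LoRA targeting look like as PEFT configs; the rank and alpha values are illustrative, not the actual training settings of either model:

```python
# Illustration of point (1): two fine-tunes can target different LoRA modules.
# These configs are examples, not the settings used by either original model.
from peft import LoraConfig

# MLP-style targets (up/down/gate projections)
config_mlp = LoraConfig(
    r=16, lora_alpha=16,
    target_modules=["up_proj", "down_proj", "gate_proj"],
)

# Attention-style targets (k/v/q/o projections)
config_attn = LoraConfig(
    r=16, lora_alpha=16,
    target_modules=["k_proj", "v_proj", "q_proj", "o_proj"],
)
```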

Edit: Since this is additive, one would think order shouldn't matter (which is why it is not discussed in the paper we recently released). I only started looking into it because when we originally merged Platypus-70B with Dolphin, it was the only merge we had at the time that actually did worse than its original counterparts (the rest of our merges were better than both originals). If you're interested, follow up with me in a week and hopefully I'll have additional insight and quantitative experiments to share! ☺️
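
For reference, sequential folding of adapters with PEFT looks roughly like this; the base checkpoint and adapter paths are hypothetical placeholders, and the comments spell out why order "shouldn't" matter in exact arithmetic:

```python
# Sketch of folding two LoRA adapters into one base model in sequence.
# The base checkpoint and adapter paths are hypothetical placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf", torch_dtype=torch.float16
)

# Each merge_and_unload() adds that adapter's scaled delta (B @ A) into the
# weights, so in exact arithmetic A-then-B and B-then-A should coincide.
model = PeftModel.from_pretrained(base, "path/to/adapter-A").merge_and_unload()
model = PeftModel.from_pretrained(model, "path/to/adapter-B").merge_and_unload()
```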

I'm curious whether the kind of "merge" you mentioned means that you trained a LoRA adapter on one base model and applied it to another. I would be extremely grateful for your reply. : )

Is there any notebook explaining how you merge two models? I am curious how exactly this is done.

Or is it possible with native PEFT? It has a new method to merge weights:

lora_model = lora_model.merge_and_unload()

https://github.com/tloen/alpaca-lora/blob/main/export_hf_checkpoint.py

Can you describe your experience?

Thank you.
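
A fuller sketch of that merge_and_unload flow, along the lines of the export_hf_checkpoint.py script linked above; the model and adapter paths are placeholders:

```python
# Fuller version of the merge_and_unload flow referenced above, similar in
# spirit to alpaca-lora's export_hf_checkpoint.py. Paths are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained(
    "path/to/base-model", torch_dtype=torch.float16
)
lora_model = PeftModel.from_pretrained(base_model, "path/to/lora-adapter")

# Fold the LoRA weights into the base model and drop the PEFT wrappers,
# leaving a plain transformers model that can be saved like any other.
merged = lora_model.merge_and_unload()
merged.save_pretrained("path/to/merged-model")
```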
