Model merging techniques

by brianflakes - opened Feb 20, 2023

Feb 20, 2023

In the space of Stable Diffusion models, model merging seems to be super common. There are two techniques that I've seen:

Weighted average (given A and B are two models with identical shapes)
output = (1-α)A + αB
Added diff (given A, B are models finetuned from C)
output = A + α(B - C)
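A minimal sketch of the two formulas in Python. It assumes each model is a mapping from parameter name to a tensor-like value, such as a PyTorch state dict. The function names `weighted_average` and `add_difference` are hypothetical and are not taken from any existing merge script.

```python
from typing import Dict, Mapping, TypeVar

# T can be torch.Tensor, numpy.ndarray, or float: anything supporting
# +, - and multiplication by a scalar.
T = TypeVar("T")


def weighted_average(a: Mapping[str, T], b: Mapping[str, T], alpha: float) -> Dict[str, T]:
    """output = (1 - alpha) * A + alpha * B, applied parameter by parameter."""
    # Only parameter names are checked; tensors with different shapes may
    # raise or broadcast depending on the library.
    if a.keys() != b.keys():
        raise ValueError("models must have identical parameter names")
    return {k: (1 - alpha) * a[k] + alpha * b[k] for k in a}


def add_difference(
    a: Mapping[str, T], b: Mapping[str, T], c: Mapping[str, T], alpha: float
) -> Dict[str, T]:
    """output = A + alpha * (B - C), where A and B were finetuned from C."""
    if not (a.keys() == b.keys() == c.keys()):
        raise ValueError("models must have identical parameter names")
    # (B - C) is what finetuning added to B; a fraction of it is added to A.
    return {k: a[k] + alpha * (b[k] - c[k]) for k in a}


if __name__ == "__main__":
    A, B, C = {"w": 1.0}, {"w": 3.0}, {"w": 2.0}
    print(weighted_average(A, B, 0.25))  # {'w': 1.5}
    print(add_difference(A, B, C, 0.5))  # {'w': 1.5}
```

With PyTorch, the inputs could come from `model.state_dict()` on each loaded model, and the result could be passed to `load_state_dict`. Setting α = 0 returns A unchanged in both cases.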

The intuition I've gathered is that weighted average is akin to mixing the datasets, while added diff is more like additional finetuning. I'm not sure if there's any research out there that provides a proper analysis, but this is what I've heard and understood. Obviously both methods raise eyebrows, but it's cool that they can sometimes work.
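One way to see where that intuition comes from is to expand both formulas, assuming A and B were both finetuned from the same base C. The deltas Δ_A and Δ_B are notation introduced here for illustration: they stand for what each finetune changed relative to C.

```latex
% Assumption: A and B are both finetuned from C; write each as base plus delta.
A = C + \Delta_A, \qquad B = C + \Delta_B

% Weighted average
(1-\alpha)A + \alpha B = C + (1-\alpha)\Delta_A + \alpha\Delta_B

% Added diff
A + \alpha(B - C) = C + \Delta_A + \alpha\Delta_B
```

Read this way, weighted average interpolates between the two finetunes and scales down A's own change by (1 - α), while added diff keeps all of A's change and stacks a fraction of B's on top, which fits the "additional finetuning" reading. The algebra says nothing about whether the merged model actually behaves well; that part is still empirical.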

The second technique has, with mixed results, enabled the transfer of inpainting capabilities to other models without forgetting.

I'm assuming that ppo_hh_gpt-j was finetuned from GPT-J. Since the two models you've averaged are finetuned from the same baseline, have you considered the second formula above?

Let me know what you think, or if there's any research you know of that I could study up on.

digitous

Owner Feb 21, 2023

I've also been curious whether difference merging is possible with language models. Frankly, I'm floored that weight merging is possible in this space. I'm excited to find out what else is possible. I may eyeball Automatic1111's difference merge script and see what might be possible to retrofit into the merge script I used to make this model.

Caveat: I'm new to Python. I'm capable of augmenting or heavily modifying the work of others, but at the moment I'm babby. I'm not the original author of the weighted average script, although I have converted it to a Colab environment and plan to officially release it in a week or two. The KoboldAI Discord has the original .py in the model merging channel.

The merge script I used is the work of Concedo: https://huggingface.co/concedo

brianflakes

Feb 21, 2023

Thanks for the info. I'll work on modifying that merge.py script to support difference merge. Here's hoping it works.

brianflakes

Feb 21, 2023

For anyone who comes across this discussion:
https://gist.github.com/briansemrau/c68835edc88a0dea79b092bce4e1ee17

digitous

Owner Feb 21, 2023

edited Feb 21, 2023

Absolutely awesome, this opens even more doors for interesting tests with learned information transfer. I'm tinkering with a colab conversion right now. Will add you to the credits at the top.
