Model merging
Training a separate model for each task can be costly and take up storage space, and the resulting models can't learn new information to improve their performance. Multitask learning can overcome some of these limitations by training one model on several tasks, but it is expensive to train and designing a dataset for it is challenging. Model merging offers a solution to these challenges by combining multiple pretrained models into one model, giving it the combined abilities of each individual model without any additional training.
PEFT provides several methods for merging models like a linear or SVD combination. This guide focuses on two methods that are more efficient for merging LoRA adapters by eliminating redundant parameters:
- TIES - TrIm, Elect, and Merge (TIES) is a three-step method for merging models. First, redundant parameters are trimmed. Then, conflicting signs are resolved into an aggregated sign vector. Finally, the parameters whose signs match the aggregate sign are averaged. This method takes into account that some values (redundant values and values with sign disagreements) can degrade performance in the merged model.
- DARE - Drop And REscale is a method that can be used to prepare for other model merging methods like TIES. It works by randomly dropping parameters according to a drop rate and rescaling the remaining parameters. This helps to reduce the number of redundant and potentially interfering parameters among multiple models.
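The three TIES steps described above can be sketched on toy arrays. This is a minimal NumPy illustration, not PEFT's implementation: the helper names `trim` and `ties_merge` are hypothetical, the arrays stand in for per-adapter weight deltas, and it assumes trimming keeps the largest-magnitude entries and that the aggregate sign is the sign of the summed trimmed values.

```python
import numpy as np


def trim(task_vector, density):
    """Keep roughly the top `density` fraction of entries by magnitude; zero the rest."""
    magnitudes = np.abs(task_vector).ravel()
    k = max(1, int(round(density * magnitudes.size)))
    threshold = np.sort(magnitudes)[-k]  # k-th largest magnitude
    # Entries tied with the threshold are all kept, so slightly more than k may survive.
    return np.where(np.abs(task_vector) >= threshold, task_vector, 0.0)


def ties_merge(task_vectors, weights, density):
    """Trim each vector, elect a sign per position, then average the agreeing entries."""
    # 1. Trim: drop low-magnitude (redundant) entries from every task vector.
    trimmed = np.stack([trim(np.asarray(tv, dtype=float), density) for tv in task_vectors])
    # 2. Elect: the aggregate sign at each position is the sign of the summed values.
    elected_sign = np.sign(trimmed.sum(axis=0))
    # 3. Merge: average only the nonzero entries whose sign matches the elected sign.
    agrees = (np.sign(trimmed) == elected_sign) & (trimmed != 0)
    w = np.asarray(weights, dtype=float).reshape((-1,) + (1,) * (trimmed.ndim - 1))
    total = (trimmed * w * agrees).sum(axis=0)
    count = np.maximum(agrees.sum(axis=0), 1)  # avoid dividing by zero
    return total / count


# Three toy "task vectors" with a sign conflict at positions 0 and 2.
a = np.array([4.0, -0.1, 2.0, 0.2])
b = np.array([-3.0, 0.1, 1.0, -0.3])
c = np.array([5.0, 0.05, -0.5, 0.1])

merged = ties_merge([a, b, c], weights=[1.0, 1.0, 1.0], density=0.5)
print(merged)
```

At position 0 the negative value from `b` loses the sign election and is excluded from the average, which is the interference that TIES is designed to remove.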
Models are merged with the add_weighted_adapter() method, and the specific model merging method is specified in the combination_type parameter.
Merge method
With TIES and DARE, merging is enabled by setting combination_type to the merge method and density to the fraction of weights to keep from the individual models. For example, let's merge three finetuned TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T models: tinyllama_lora_norobots, tinyllama_lora_sql, and tinyllama_lora_adcopy.
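As a rough illustration of what density controls in the DARE case, the sketch below randomly keeps each entry with probability `density` and rescales the survivors by `1 / density`, so the expected value of each entry is unchanged. The `dare` helper and the toy `delta` array are hypothetical and only meant to show the idea; this is not PEFT's code.

```python
import numpy as np


def dare(task_vector, density, rng):
    """Randomly keep each entry with probability `density`, then rescale by 1 / density."""
    keep = rng.random(task_vector.shape) < density
    return np.where(keep, task_vector / density, 0.0)


rng = np.random.default_rng(0)
delta = np.ones(100_000)  # toy stand-in for an adapter's weight delta

sparse = dare(delta, density=0.2, rng=rng)
kept_fraction = np.count_nonzero(sparse) / sparse.size

print(kept_fraction)  # fraction of entries that survived, close to the density
print(sparse.mean())  # close to the original mean because of the rescaling
```

With a density of 0.2, about 80% of the entries are zeroed, and each surviving entry is multiplied by 5 to compensate.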
When you’re attempting to merge fully trained models with TIES, you should be aware of any special tokens each model may have added to the embedding layer which are not a part of the original checkpoint’s vocabulary. This may cause an issue because each model may have added a special token to the same embedding position. If this is the case, you should use the resize_token_embeddings method to avoid merging the special tokens at the same embedding index.
This shouldn’t be an issue if you’re only merging LoRA adapters trained from the same base model.
Load a base model and use the load_adapter() method to load each adapter and assign it a name:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Read the adapter config to find the base model the adapter was trained on
config = PeftConfig.from_pretrained("smangrul/tinyllama_lora_norobots")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, load_in_4bit=True, device_map="auto").eval()
tokenizer = AutoTokenizer.from_pretrained("smangrul/tinyllama_lora_norobots")
# Wrap the base model with the first adapter, then load the other two under their own names
model = PeftModel.from_pretrained(model, "smangrul/tinyllama_lora_norobots", adapter_name="norobots")
_ = model.load_adapter("smangrul/tinyllama_lora_sql", adapter_name="sql")
_ = model.load_adapter("smangrul/tinyllama_lora_adcopy", adapter_name="adcopy")
Set the adapters, weights, adapter_name, combination_type, and density with the add_weighted_adapter() method.
Weight values greater than 1.0 typically produce better results because they preserve the correct scale. A good default starting value for the weights is to set all values to 1.0.
adapters = ["norobots", "adcopy", "sql"]
weights = [2.0, 1.0, 1.0]
adapter_name = "merge"
density = 0.2
model.add_weighted_adapter(adapters, weights, adapter_name, combination_type="ties", density=density)
Set the newly merged adapter as the active adapter with the set_adapter() method.
model.set_adapter("merge")
Now you can use the merged model as an instruction-tuned model to write ad copy or SQL queries!
messages = [
    {"role": "user", "content": "Write an essay about Generative AI."},
]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")
inputs = {k: v.to("cuda") for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95, temperature=0.2, repetition_penalty=1.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0]))