arxiv:2403.13187

Evolutionary Optimization of Model Merging Recipes

Published on Mar 19, 2024
· Featured in Daily Papers on Mar 21, 2024

Abstract

We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
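
As a rough illustration of the parameter-space (PS) half of the method, the search can be pictured as evolving the mixing coefficients that combine the source models' weights, scoring each candidate merge on a validation task. The toy sketch below is an assumption-laden stand-in: random vectors instead of real checkpoints, a plain weighted average instead of the merging scheme used in the paper, and a tiny hand-rolled evolution strategy in place of CMA-ES.

```python
# Toy sketch of parameter-space (PS) merging driven by an evolutionary search.
# Everything here (model format, fitness function, ES hyperparameters) is a
# simplified assumption for illustration, not the paper's actual pipeline,
# which merges real LLM checkpoints and scores them on downstream benchmarks.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for source model checkpoints: same architecture, different weights.
N_MODELS, N_PARAMS = 3, 16
models = [rng.normal(size=N_PARAMS) for _ in range(N_MODELS)]

# Hypothetical "target" behaviour the merged model should approximate;
# in practice this would be accuracy on a validation set (e.g. Japanese math).
target = rng.normal(size=N_PARAMS)

def merge(mix):
    """Weighted average of the source models' parameters."""
    mix = np.abs(mix)
    mix = mix / mix.sum()                      # normalise mixing coefficients
    return sum(w * m for w, m in zip(mix, models))

def fitness(mix):
    """Higher is better; here, negative distance to the toy target."""
    return -np.linalg.norm(merge(mix) - target)

# Minimal (mu, lambda) evolution strategy over the mixing coefficients.
mu, lam, sigma, gens = 4, 16, 0.3, 50
parent = np.ones(N_MODELS) / N_MODELS          # start from a uniform merge
for g in range(gens):
    offspring = parent + sigma * rng.normal(size=(lam, N_MODELS))
    scores = np.array([fitness(o) for o in offspring])
    elite = offspring[np.argsort(scores)[-mu:]]
    parent = elite.mean(axis=0)

best = np.abs(parent) / np.abs(parent).sum()
print("evolved mixing coefficients:", np.round(best, 3))
```

The same loop structure carries over to the data-flow-space (DFS) search, except that the evolved variables describe which layers appear in the inference path rather than how weights are averaged.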

Community

Awesome work! I think this is a fantastic direction.

I was wondering why only 3 models were used in the experiment, and why all of them are Mistral-based. Perhaps it's the T = M × r approach in the DFS merge, which is also responsible for the final model size growing with the number of source models.

To me the most obvious DFS merge would be to mix models by picking layer 1 from one model, then layer 2 from one model, then layer 3 from one model, etc. With this method the final model is always the same size as the source models. I guess this doesn't work well enough for some reason?

Or, even if we really want the option to increase the number of layers, why not do T = r × M: start with layer 1 from all models, then layer 2 from all models, then layer 3 from all models, and so on (go layer by layer instead of model by model). I figure we rarely want to put layer 1 of some model after layer 10 of another. Or do we? It would be great to see the final architecture of EvoLLM-JP-v1-7B: which layers from which models ended up mixed where? (Maybe it's in the paper and I missed it.)
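
For concreteness, here is a small sketch of the two layer orderings discussed in the comment above: the model-by-model sequence (all layers of model 0, then model 1, and so on, repeated r times) versus the suggested layer-by-layer interleaving. The model and layer counts are made up for illustration and this is not lifted from the paper; in the paper's DFS merge the evolutionary search additionally chooses which slots of the sequence to keep, which is why the final depth can differ from that of the source models.

```python
# Sketch of the two candidate layer orderings discussed above. The slot list
# is only the *search space*: the evolutionary search then decides which
# slots to keep (an inclusion mask), so the merged model's depth can exceed
# a single source model. Counts here are illustrative, not from the paper.
n_models, n_layers, r = 3, 4, 2   # r = number of repetitions of the sequence

# "Model by model": all layers of model 0, then model 1, ..., repeated r times.
model_by_model = [(m, l)
                  for _ in range(r)
                  for m in range(n_models)
                  for l in range(n_layers)]

# "Layer by layer" (the T = r x M suggestion): layer 0 from every model,
# then layer 1 from every model, and so on, repeated r times.
layer_by_layer = [(m, l)
                  for _ in range(r)
                  for l in range(n_layers)
                  for m in range(n_models)]

print(len(model_by_model), len(layer_by_layer))   # both give T = 24 slots
print(model_by_model[:6])   # [(0,0), (0,1), (0,2), (0,3), (1,0), (1,1)]
print(layer_by_layer[:6])   # [(0,0), (1,0), (2,0), (0,1), (1,1), (2,1)]
```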

Models citing this paper: 5

Datasets citing this paper: 2

Spaces citing this paper: 3

Collections including this paper: 23