arxiv:2403.13187

Evolutionary Optimization of Model Merging Recipes

Published on Mar 19, 2024
· Featured in Daily Papers on Mar 21, 2024

Abstract

We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
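
As a rough illustration of the parameter-space (PS) half of the method, the search can be pictured as evolving the mixing coefficients that combine the source models' weights, scoring each candidate merge on a validation task. The toy sketch below is an assumption-laden stand-in: random vectors instead of real checkpoints, a plain weighted average instead of the merging scheme used in the paper, and a tiny hand-rolled evolution strategy in place of CMA-ES.

```python
# Toy sketch of parameter-space (PS) merging driven by an evolutionary search.
# Everything here (model format, fitness function, ES hyperparameters) is a
# simplified assumption for illustration, not the paper's actual pipeline,
# which merges real LLM checkpoints and scores them on downstream benchmarks.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for source model checkpoints: same architecture, different weights.
N_MODELS, N_PARAMS = 3, 16
models = [rng.normal(size=N_PARAMS) for _ in range(N_MODELS)]

# Hypothetical "target" behaviour the merged model should approximate;
# in practice this would be accuracy on a validation set (e.g. Japanese math).
target = rng.normal(size=N_PARAMS)

def merge(mix):
    """Weighted average of the source models' parameters."""
    mix = np.abs(mix)
    mix = mix / mix.sum()                      # normalise mixing coefficients
    return sum(w * m for w, m in zip(mix, models))

def fitness(mix):
    """Higher is better; here, negative distance to the toy target."""
    return -np.linalg.norm(merge(mix) - target)

# Minimal (mu, lambda) evolution strategy over the mixing coefficients.
mu, lam, sigma, gens = 4, 16, 0.3, 50
parent = np.ones(N_MODELS) / N_MODELS          # start from a uniform merge
for g in range(gens):
    offspring = parent + sigma * rng.normal(size=(lam, N_MODELS))
    scores = np.array([fitness(o) for o in offspring])
    elite = offspring[np.argsort(scores)[-mu:]]
    parent = elite.mean(axis=0)

best = np.abs(parent) / np.abs(parent).sum()
print("evolved mixing coefficients:", np.round(best, 3))
```

The same loop structure carries over to the data-flow-space (DFS) search, except that the evolved variables describe which layers appear in the inference path rather than how weights are averaged.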

Community

Awesome work! I think this is a fantastic direction.

I was wondering why only 3 models were used in the experiment, and why all of them are Mistral-based. Perhaps it's the T = M × r approach in the DFS merge, which is also responsible for the final model size growing with the number of source models.

To me the most obvious DFS merge would be to mix models by picking layer 1 from one model, then layer 2 from one model, then layer 3 from one model, etc. With this method the final model is always the same size as the source models. I guess this doesn't work well enough for some reason?

Or, even if we really want the option to increase the number of layers, why not do T = r × M: start with layer 1 from all models, then layer 2 from all models, then layer 3 from all models, and so on (go layer by layer instead of model by model). I figure we rarely want to put layer 1 of some model after layer 10 of another. Or do we? It would be great to see the final architecture of EvoLLM-JP-v1-7B: which layers from which models ended up mixed where? (Maybe it's in the paper and I missed it.)
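
For concreteness, here is a small sketch of the two layer orderings discussed in the comment above: the model-by-model sequence (all layers of model 0, then model 1, and so on, repeated r times) versus the suggested layer-by-layer interleaving. The model and layer counts are made up for illustration and this is not lifted from the paper; in the paper's DFS merge the evolutionary search additionally chooses which slots of the sequence to keep, which is why the final depth can differ from that of the source models.

```python
# Sketch of the two candidate layer orderings discussed above. The slot list
# is only the *search space*: the evolutionary search then decides which
# slots to keep (an inclusion mask), so the merged model's depth can exceed
# a single source model. Counts here are illustrative, not from the paper.
n_models, n_layers, r = 3, 4, 2   # r = number of repetitions of the sequence

# "Model by model": all layers of model 0, then model 1, ..., repeated r times.
model_by_model = [(m, l)
                  for _ in range(r)
                  for m in range(n_models)
                  for l in range(n_layers)]

# "Layer by layer" (the T = r x M suggestion): layer 0 from every model,
# then layer 1 from every model, and so on, repeated r times.
layer_by_layer = [(m, l)
                  for _ in range(r)
                  for l in range(n_layers)
                  for m in range(n_models)]

print(len(model_by_model), len(layer_by_layer))   # both give T = 24 slots
print(model_by_model[:6])   # [(0,0), (0,1), (0,2), (0,3), (1,0), (1,1)]
print(layer_by_layer[:6])   # [(0,0), (1,0), (2,0), (0,1), (1,1), (2,1)]
```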

Models citing this paper: 5

Datasets citing this paper: 2

Spaces citing this paper: 3

Collections including this paper: 23