Abstract
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development.
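To make the parameter-space half of the recipe concrete, below is a minimal, hypothetical sketch of evolving per-parameter mixing ratios between two models. It uses tiny dummy networks, a placeholder fitness function, and a toy (1+1) evolutionary strategy as a stand-in for the stronger evolutionary optimizer described in the paper; none of these specifics (model sizes, fitness, hyperparameters) are taken from the paper itself.

```python
# Hypothetical sketch (not the authors' implementation): evolve per-parameter
# mixing ratios for parameter-space merging of two models.
import copy
import numpy as np
import torch
import torch.nn as nn

def make_model():
    # Tiny stand-in networks; the paper merges full LLM checkpoints.
    return nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

model_a, model_b = make_model(), make_model()
param_names = [n for n, _ in model_a.named_parameters()]

def merge(ratios):
    """Interpolate each parameter tensor: w = r * w_a + (1 - r) * w_b."""
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged_sd = {
        name: float(r) * sd_a[name] + (1.0 - float(r)) * sd_b[name]
        for name, r in zip(param_names, ratios)
    }
    merged.load_state_dict(merged_sd)
    return merged

def fitness(model):
    """Placeholder objective; in practice this would score a held-out task."""
    x = torch.randn(32, 8)
    y = torch.randint(0, 2, (32,))
    with torch.no_grad():
        logits = model(x)
    return -nn.functional.cross_entropy(logits, y).item()  # higher is better

rng = np.random.default_rng(0)
best = rng.uniform(0.0, 1.0, size=len(param_names))   # one ratio per parameter tensor
best_fit = fitness(merge(best))
for _ in range(50):                                    # toy (1+1) evolutionary loop
    child = np.clip(best + rng.normal(0.0, 0.1, best.shape), 0.0, 1.0)
    child_fit = fitness(merge(child))
    if child_fit > best_fit:                           # keep mutations that improve fitness
        best, best_fit = child, child_fit
print("best per-parameter mixing ratios:", np.round(best, 3))
```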
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LLM Guided Evolution -- The Automation of Models Advancing Models (2024)
- Mipha: A Comprehensive Overhaul of Multimodal Assistant with Small Language Models (2024)
- MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models (2024)
- Distilling Mathematical Reasoning Capabilities into Small Language Models (2024)
- LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems (2024)
Awesome work! I think this is a fantastic direction.
I was wondering why only 3 models were used in the experiment, and why all of them were Mistral variants. Perhaps it's the T = M×r approach in the DFS merge, which is also responsible for the final model size growing with the number of source models.
To me the most obvious DFS merge would be to mix models by picking layer 1 from one model, then layer 2 from one model, then layer 3 from one model, etc. With this method the final model is always the same size as the source models. I guess this doesn't work well enough for some reason?
Or even if we really want the option to increase the number of layers, why not do T = r×M: start with layer 1 from all models, then layer 2 from all models, then layer 3 from all models (go layer by layer instead of model by model). I figure we rarely want to put layer 1 of some model after layer 10 of another. Or do we? It would be great to see the final architecture of EvoLLM-JP-v1-7B: which layers from which models ended up mixed where? (Maybe it's in the paper and I missed it.)
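To illustrate the two orderings being contrasted here, below is a small hypothetical sketch (not the paper's code). The model count M, layers per model L, repetition factor r, and the inclusion-mask representation of an inference path are my reading of the T = M×r indexing, not taken verbatim from the paper.

```python
# Hypothetical comparison of the two layer-slot orderings discussed above.
import random

M, L, r = 3, 4, 2   # source models, layers per model, repetitions (toy sizes)

# "Model-by-model" layout (my reading of the T = M x r indexing): all layers of
# model 0, then model 1, then model 2, repeated r times.
slots_model_major = [(m, l) for _ in range(r) for m in range(M) for l in range(L)]

# The "layer-by-layer" alternative proposed above: layer 0 of every model, then
# layer 1 of every model, and so on, repeated r times.
slots_layer_major = [(m, l) for _ in range(r) for l in range(L) for m in range(M)]

# In DFS merging, an evolved inclusion mask over the slots defines which layers
# (and in what order) form the final inference path; here the mask is random.
random.seed(0)
mask = [random.random() < 0.4 for _ in slots_model_major]
path = [slot for slot, keep in zip(slots_model_major, mask) if keep]
print("inference path as (model_idx, layer_idx):", path)
print("first slots in layer-major order:", slots_layer_major[: 2 * M])
```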