---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- llama
- open-llama
- mpt
- model-fusion
---
_**Knowledge Fusion of Large Language Models**_
_**Fanqi Wan†, Xinting Huang‡, Deng Cai‡, Xiaojun Quan†, Wei Bi‡, Shuming Shi‡**_
_† Sun Yat-sen University, ‡ Tencent AI Lab_
## News
- **Jan 22, 2024:** 🔥 We're excited to announce that FuseLLM-7B, the fusion of [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf), [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b_v2), and [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), is now available on 🤗 [Hugging Face Models](https://huggingface.co/Wanfq/FuseLLM-7B). Happy exploring!
## Contents
- [Overview](#overview)
- [Model Release](#model-release)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)
## Overview
In this study, we explore knowledge fusion for LLMs: creating a unified model that combines the capabilities and distinctive strengths of multiple structurally diverse LLMs. To achieve this, we introduce FuseLLM, which first leverages the generative distributions of the source LLMs to externalize both their collective knowledge and their individual strengths, and then transfers them to the target LLM through lightweight continual training.
Compared with model ensembling, which requires the parallel deployment of multiple LLMs, or weight merging, which is generally limited to LLMs with identical architectures, FuseLLM supports the fusion of multiple LLMs with **diverse architectures** by explicitly transferring their knowledge and capabilities to a **single** target LLM.
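To make the idea of fusing generative distributions more concrete, here is a minimal toy sketch in plain Python. This card does not spell out the fusion function or the training loss, so everything below is an assumption for illustration: the source distributions are taken to be already aligned to the target vocabulary, fusion is assumed to select the source model with the lowest cross-entropy on the gold text, and the weight `lam` and all function names are hypothetical.

```python
import math

def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold token ids.

    probs: one probability distribution (list of floats) per position.
    gold:  one gold token id per position.
    """
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)

def fuse_min_ce(source_probs, gold):
    """Assumed fusion rule: keep the source whose distributions fit the text best."""
    return min(source_probs, key=lambda probs: cross_entropy(probs, gold))

def fusion_objective(target_probs, fused_probs, gold, lam=0.9):
    """Assumed objective: lam * causal LM loss + (1 - lam) * fusion loss.

    The fusion loss is the cross-entropy between the fused (teacher)
    distributions and the target (student) distributions, averaged over positions.
    """
    clm_loss = cross_entropy(target_probs, gold)
    fusion_loss = -sum(
        sum(q * math.log(p) for q, p in zip(q_row, p_row))
        for q_row, p_row in zip(fused_probs, target_probs)
    ) / len(fused_probs)
    return lam * clm_loss + (1 - lam) * fusion_loss

# Toy data: vocabulary of 3 tokens, sequence of 2 positions.
gold = [0, 2]
source_a = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
source_b = [[0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]
target = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]

fused = fuse_min_ce([source_a, source_b], gold)
print("training loss:", fusion_objective(target, fused, gold))
```

In a real setting the per-position distributions would come from forward passes of the source and target LLMs, and models with different tokenizers would first need their outputs mapped onto a shared vocabulary, a step this sketch skips.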
## Model Release
We release FuseLLM-7B on [Hugging Face Models](https://huggingface.co/models?sort=trending&search=FuseLLM). It is the fusion of three popular open-source LLMs with distinct architectures and functionalities: [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf), [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b_v2), and [MPT-7B](https://huggingface.co/mosaicml/mpt-7b).
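As a starting point, the released checkpoint can probably be loaded like any other causal language model on the Hub. The sketch below assumes that `Wanfq/FuseLLM-7B` (the repository linked in the News section) loads through the standard `transformers` Auto classes; the helper names and the sampling settings are hypothetical choices, not values recommended by the authors.

```python
MODEL_ID = "Wanfq/FuseLLM-7B"  # repository linked in the News section

def build_generation_kwargs(max_new_tokens=64, temperature=0.7, top_p=0.9):
    """Collect sampling settings (illustrative defaults, not tuned by the authors)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,
        "temperature": temperature,
        "top_p": top_p,
    }

def generate(prompt, model_id=MODEL_ID, **overrides):
    """Load the checkpoint and return a sampled continuation of `prompt`."""
    # Imported lazily so the settings helper works without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Use half precision only on GPU; fall back to float32 on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, **build_generation_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Knowledge fusion of large language models is")` downloads the weights on first use, so expect a multi-gigabyte download and a correspondingly large memory footprint for a 7B-parameter model.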
Evaluations across three benchmarks, covering a total of **42** tasks that span reasoning, commonsense, and code generation, confirm that the target model trained with our method outperforms each source LLM and the causal language model baseline on most tasks.
To further illustrate the effectiveness of FuseLLM, we incorporate additional generative benchmarks related to knowledge-based question-answering, reading comprehension, content analysis, machine translation, and theorem application. The results highlight FuseLLM’s superiority over all source LLMs and the baseline.
Since FuseLLM is also applicable to instruction-tuned models, we assess instruction-following performance on the Vicuna Benchmark using GPT-4 as the evaluator. The results demonstrate that FuseLLM surpasses each individual source instruction-tuned LLM and the baseline, achieving the best performance under GPT-4 judgment.
## Citation
If you find this work relevant to your research or applications, please feel free to cite it!
```
@misc{wan2024knowledge,
title={Knowledge Fusion of Large Language Models},
author={Wan, Fanqi and Huang, Xinting and Cai, Deng and Quan, Xiaojun and Bi, Wei and Shi, Shuming},
year={2024},
eprint={xxxx.xxxxx},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
## Acknowledgments
This repo benefits from [Stanford-Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and [Explore-Instruct](https://github.com/fanqiwan/Explore-Instruct). Thanks to their authors for the wonderful work!