---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- llama
- open-llama
- mpt
- model-fusion
---
<p align="center" width="100%">
</p>

<div id="top" align="center">

_**Knowledge Fusion of Large Language Models**_

<h4> |<a href="https://arxiv.org/abs/xxxx.xxxxx"> 📑 Paper </a> |
<a href="https://huggingface.co/Wanfq/FuseLLM-7B"> 🤗 Model </a> |
<a href="https://github.com/fanqiwan/FuseLLM"> 🐱 Github Repo </a> |
</h4>

<!-- **Authors:** -->

_**Fanqi Wan<sup>†</sup>, Xinting Huang<sup>‡</sup>, Deng Cai<sup>‡</sup>, Xiaojun Quan<sup>†</sup>, Wei Bi<sup>‡</sup>, Shuming Shi<sup>‡</sup>**_

<!-- **Affiliations:** -->

_<sup>†</sup> Sun Yat-sen University,
<sup>‡</sup> Tencent AI Lab_

</div>

## News
- **Jan 22, 2024:** 🔥 We're excited to announce that FuseLLM-7B, the fusion of [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf), [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b_v2), and [MPT-7B](https://huggingface.co/mosaicml/mpt-7b), is now available on 🤗 [Huggingface Models](https://huggingface.co/Wanfq/FuseLLM-7B). Happy exploring!

## Contents

- [Overview](#overview)
- [Model Release](#model-release)
- [Citation](#citation)
- [Acknowledgments](#acknowledgments)

## Overview

In this study, we explore the realm of knowledge fusion for LLMs to create a unified model that combines the capabilities and distinctive strengths of multiple structurally diverse LLMs. To achieve this, we introduce FuseLLM, which first leverages the generative distributions of these source LLMs to externalize both their collective knowledge and individual strengths, and subsequently transfers them to the target LLM through lightweight continual training.

Compared with model ensembling, which requires the parallel deployment of multiple LLMs, or weight merging, which is generally limited to LLMs with identical architectures, FuseLLM supports the fusion of multiple LLMs with **diverse architectures** by explicitly transferring their knowledge and capabilities to a **single** target LLM.
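As a rough illustration of the idea (a sketch, not the paper's exact procedure), suppose each source LLM's next-token distributions have already been aligned onto a shared vocabulary. A simple "pick the best source" fusion and a weighted continual-training loss might then look like this; the fusion function, token alignment, and the weight `lam` are illustrative assumptions here, defined precisely in the paper:

```python
import numpy as np

def fuse_min_ce(source_probs, gold_ids):
    """Pick the source distribution with the lowest cross-entropy against the
    gold tokens (a MinCE-style fusion; one possible fusion strategy)."""
    # source_probs: (num_models, seq_len, vocab_size) probabilities, assumed
    # already mapped onto a shared vocabulary.
    t = np.arange(len(gold_ids))
    ce_per_model = [-np.log(p[t, gold_ids] + 1e-12).mean() for p in source_probs]
    return source_probs[int(np.argmin(ce_per_model))]

def fusion_loss(target_logits, fused_probs, gold_ids, lam=0.9):
    """Continual-training objective: weighted sum of the usual causal-LM
    cross-entropy and a divergence to the fused source distribution."""
    z = target_logits - target_logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    t = np.arange(len(gold_ids))
    ce = -log_p[t, gold_ids].mean()                            # gold-text CE
    kl = (fused_probs * (np.log(fused_probs + 1e-12) - log_p)).sum(axis=-1).mean()
    return lam * ce + (1 - lam) * kl                           # lam: illustrative weight
```

Because the fused supervision is an explicit probability distribution rather than model weights, the target LLM's architecture is decoupled from those of the sources.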

<p align="center">
<img src="./assets/fig_1.png" width="95%"> <br>
</p>

## Model Release

We release FuseLLM-7B on [Huggingface Models](https://huggingface.co/models?sort=trending&search=FuseLLM), the fusion of three popular open-source LLMs with distinct architectures and functionalities: [Llama-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf), [OpenLLaMA-7B](https://huggingface.co/openlm-research/open_llama_7b_v2), and [MPT-7B](https://huggingface.co/mosaicml/mpt-7b).
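A minimal usage sketch, assuming the released checkpoint loads with the standard `transformers` auto classes (verify the exact loading requirements on the model page):

```python
MODEL_ID = "Wanfq/FuseLLM-7B"

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports kept local so the sketch can be read without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```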

Evaluations across three benchmarks, comprising a total of **42** tasks spanning reasoning, commonsense, and code generation, confirm that the target model trained with our method outperforms each source LLM and the causal language model baseline on most tasks.

<p align="center">
<img src="./assets/fig_2.png" width="95%"> <br>
</p>

To further illustrate the effectiveness of FuseLLM, we incorporate additional generative benchmarks related to knowledge-based question-answering, reading comprehension, content analysis, machine translation, and theorem application. The results highlight FuseLLM’s superiority over all source LLMs and the baseline.

<p align="center">
<img src="./assets/fig_3.png" width="95%"> <br>
</p>

Since FuseLLM is also applicable to instruction-tuned models, we assess instruction-following performance on the Vicuna Benchmark using GPT-4 as an evaluator. The results demonstrate that FuseLLM surpasses each individual instruction-tuned source LLM and the baseline, achieving the best performance under GPT-4 judgment.

<p align="center">
<img src="./assets/fig_4.png" width="50%"> <br>
</p>

## Citation

If you find this work relevant to your research or applications, please feel free to cite it!
```
@misc{wan2024knowledge,
      title={Knowledge Fusion of Large Language Models},
      author={Fanqi Wan and Xinting Huang and Deng Cai and Xiaojun Quan and Wei Bi and Shuming Shi},
      year={2024},
      eprint={xxxx.xxxxx},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Acknowledgments

This repo benefits from [Stanford-Alpaca](https://github.com/tatsu-lab/stanford_alpaca) and [Explore-Instruct](https://github.com/fanqiwan/Explore-Instruct). Thanks for their wonderful work!