---
base_model:
- Qwen/Qwen1.5-72B-Chat
- abacusai/Liberated-Qwen1.5-72B
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
language:
- en
tags:
- mergekit
- merge
- moe
---
# Qwen1.5-MoE-2x72B

## Description
This model is a Mixture of Experts (MoE) merge of [Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) and [abacusai/Liberated-Qwen1.5-72B](https://huggingface.co/abacusai/Liberated-Qwen1.5-72B), created with mergekit and not fine-tuned further.

The merge uses a customized mergekit script for Qwen2 MoE, which is available [here](https://github.com/Aratako/mergekit-qwen2).

Because the MoE merge modifies the model architecture, this model requires a [custom modeling file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/modeling_qwen2.py) and a [custom configuration file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/configuration_qwen2.py).
When using the model, place both files in the same folder as the model weights.
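
As a rough loading sketch (assuming the repository's `config.json` maps the architecture to these custom classes so that `trust_remote_code=True` picks them up; the model path and generation settings below are illustrative only):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local folder holding the merged weights; modeling_qwen2.py and
# configuration_qwen2.py must be placed in this same folder.
model_dir = "./Liberated-Qwen1.5-2x72B"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto",       # spread the 2x72B parameters across available GPUs
    torch_dtype="auto",
    trust_remote_code=True,  # load the custom modeling/configuration files
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```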

This model inherits the [tongyi-qianwen license](https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE).

## Benchmark
The [mt-bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) scores for this model and its two base models are as follows:

**1-turn, 4-bit quantization**
|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|---|
| Liberated-Qwen1.5-72B | 72B | **5.8** | 7.9 | 9.6 | 6.7 | 7.0 | **9.05** | 9.55 | **9.9** | 8.1875 |
| Qwen1.5-72B-Chat | 72B | 5.5 | **8.7** | 9.7 | **8.4** | 7.5 | 9.0 | 9.45 | 9.75 | **8.5000** |
| This model  | 2x72B | 5.6 | 7.8 | **9.75** | 7.0 | **8.1** | 9.0 | **9.65** | 9.8 | 8.3375 |

![mt-bench-1turn](./mt-bench-1turn.png)

**2-turn, 4-bit quantization**
|Model|Size|Coding|Extraction|Humanities|Math|Reasoning|Roleplay|STEM|Writing|avg_score|
|---|---|---|---|---|---|---|---|---|---|---|
| Liberated-Qwen1.5-72B | 72B | 3.9 | 8.2 | **10.0** | 5.7 | 5.5 | 8.4 | 8.7 | 8.6 | 7.3750 |
| Qwen1.5-72B-Chat | 72B | **5.2** | 8.8 | **10.0** | **6.1** | 6.7 | 9.0 | **9.8** | **9.5** | 8.1375 |
| This model  | 2x72B | 5.0 | **9.5** | 9.9 | 5.6 | **8.1** | **9.3** | 9.6 | 9.2 | **8.2750** |

![mt-bench-2turn](./mt-bench-2turn.png)

## Merge config
[mergekit_moe_config.yml](./mergekit_moe_config.yml)
```yaml
base_model: ./Qwen1.5-72B-Chat
gate_mode: random
dtype: bfloat16
experts:
  - source_model: ./Qwen1.5-72B-Chat
    positive_prompts: []
  - source_model: ./Liberated-Qwen1.5-72B
    positive_prompts: []
tokenizer_source: model:./Qwen1.5-72B-Chat
```
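
For reference, a config like this is normally passed to mergekit's MoE entry point; the sketch below assumes the customized fork linked above exposes the same `mergekit-moe` command as upstream mergekit, and the output directory is chosen purely as an example:

```sh
# Run the MoE merge described by the config above (assumes the custom fork
# keeps upstream mergekit's CLI; adjust flags to the fork as needed).
mergekit-moe mergekit_moe_config.yml ./Qwen1.5-MoE-2x72B
```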

## Gratitude
- Huge thanks to [Alibaba Cloud Qwen](https://www.alibabacloud.com/solutions/generative-ai/qwen) for training and publishing the weights of the Qwen models
- Thank you to [abacusai](https://huggingface.co/abacusai) for publishing a fine-tuned model based on Qwen
- And huge thanks to [mlabonne](https://huggingface.co/mlabonne), whose [phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8) served as the reference for the customized modeling file