---
base_model:
- Qwen/Qwen1.5-72B-Chat
- abacusai/Liberated-Qwen1.5-72B
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
language:
- en
tags:
- mergekit
- merge
- moe
---
# Qwen1.5-MoE-2x72B

## Description

This model was created with mergekit by combining [Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) and [abacusai/Liberated-Qwen1.5-72B](https://huggingface.co/abacusai/Liberated-Qwen1.5-72B) into a Mixture of Experts (MoE) model, without further fine-tuning.

It uses a customized MoE script for mergekit, which is available [here](https://github.com/Aratako/mergekit-qwen2).

Due to the structural modifications introduced by MoE, using this model requires a [custom modeling file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/modeling_qwen2.py) and a [custom configuration file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/configuration_qwen2.py). When using the model, place these files in the same folder as the model weights.

This model inherits the [tongyi-qianwen license](https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE).

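As a minimal loading sketch (an assumption on my part, not an official snippet: it presumes the `transformers` library and relies on `trust_remote_code=True` so that the custom modeling and configuration files next to the weights are picked up):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Aratako/Liberated-Qwen1.5-2x72B"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and MoE model.

    trust_remote_code=True makes transformers use the custom
    modeling_qwen2.py / configuration_qwen2.py shipped with the model.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the bfloat16 weights as stored
        device_map="auto",    # shard across available GPUs
        trust_remote_code=True,
    )
    return tokenizer, model
```

Note that this is a 2x72B model, so substantial GPU memory is required even in bfloat16.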
## Benchmark

## Merge config

[mergekit_config.yml](./mergekit_moe_config.yml)

```yaml
base_model: ./Qwen1.5-72B-Chat
gate_mode: random
dtype: bfloat16
experts:
  - source_model: ./Qwen1.5-72B-Chat
    positive_prompts: []
  - source_model: ./Liberated-Qwen1.5-72B
    positive_prompts: []
tokenizer_source: model:./Qwen1.5-72B-Chat
```
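For intuition about what this config produces: each MoE layer routes tokens through both experts, weighted by a gate network, and with `gate_mode: random` (and empty `positive_prompts`) the gate weights start randomly initialized rather than being derived from prompt hidden states. A toy numpy-only sketch with tiny dimensions (illustrative only, not the actual model code):

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 8      # toy hidden size; the real Qwen1.5-72B layers are far wider
N_EXPERTS = 2   # the two merged 72B models

# Router weights. gate_mode: random means these start randomly initialized.
gate_w = rng.standard_normal((HIDDEN, N_EXPERTS))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, experts):
    """Gate-weighted mixture of expert outputs for a batch of token vectors."""
    gates = softmax(x @ gate_w)               # (tokens, N_EXPERTS), rows sum to 1
    out = np.zeros_like(x)
    for i, expert in enumerate(experts):
        out += gates[:, i:i + 1] * expert(x)  # weight each expert's output
    return out

# Stand-ins for the two expert MLP stacks.
experts = [lambda x: 2.0 * x, lambda x: x + 1.0]
tokens = rng.standard_normal((4, HIDDEN))
mixed = moe_layer(tokens, experts)
```

With only two experts and a softmax gate, every token effectively uses both experts; the random gate simply decides how their outputs are blended until any later fine-tuning adjusts it.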

## Gratitude

- Huge thanks to [Alibaba Cloud Qwen](https://www.alibabacloud.com/solutions/generative-ai/qwen) for training and publishing the weights of the Qwen models
- Thanks to [abacusai](https://huggingface.co/abacusai) for publishing a fine-tuned model based on Qwen
- And huge thanks to [mlabonne](https://huggingface.co/mlabonne), whose [phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8) I used as a reference when customizing the modeling file