Aratako committed on
Commit 5f44195
1 Parent(s): 1210357

Update README.md

---
base_model:
- Qwen/Qwen1.5-72B-Chat
- abacusai/Liberated-Qwen1.5-72B
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
language:
- en
tags:
- mergekit
- merge
- moe
---
# Qwen1.5-MoE-2x72B

## Description
This model was created with mergekit as an MoE (Mixture of Experts) merge of [Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) and [abacusai/Liberated-Qwen1.5-72B](https://huggingface.co/abacusai/Liberated-Qwen1.5-72B), without further fine-tuning.

It was built with a customized mergekit script for MoE merging, which is available [here](https://github.com/Aratako/mergekit-qwen2).

Because the MoE merge modifies the model architecture, using this model requires the [custom modeling file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/modeling_qwen2.py) and [custom configuration file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/configuration_qwen2.py). Place both files in the same folder as the model weights.
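
The presence of both files can be verified before loading. The helper below is a hypothetical sketch (the function name and layout are not part of this repository) using only the standard library:

```python
from pathlib import Path

# The two custom files this model needs next to its weights.
REQUIRED_FILES = ["modeling_qwen2.py", "configuration_qwen2.py"]

def missing_custom_files(model_dir: str) -> list[str]:
    """Return the required custom files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

If the returned list is empty, the model can then be loaded as usual, e.g. with `AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True)`.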

This model inherits the [tongyi-qianwen license](https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE).

## Benchmark

## Merge config
[mergekit_config.yml](./mergekit_moe_config.yml)
```yaml
base_model: ./Qwen1.5-72B-Chat
gate_mode: random
dtype: bfloat16
experts:
  - source_model: ./Qwen1.5-72B-Chat
    positive_prompts: []
  - source_model: ./Liberated-Qwen1.5-72B
    positive_prompts: []
tokenizer_source: model:./Qwen1.5-72B-Chat
```
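
With `gate_mode: random`, the router weights are initialized randomly rather than derived from prompt hidden states (hence the empty `positive_prompts` lists), so each token's expert weighting is effectively arbitrary until the gate is trained. A toy sketch of the resulting routing behavior (names and shapes are illustrative only, not the actual Qwen2 MoE code):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, n_experts = 8, 2

# Randomly initialized router, as produced by ``gate_mode: random``.
gate_w = rng.normal(size=(hidden, n_experts))

# Two stand-in "experts"; in the real model these are the two 72B FFN stacks.
expert_w = [rng.normal(size=(hidden, hidden)) for _ in range(n_experts)]

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Softmax-weighted mixture of both experts, per token."""
    logits = x @ gate_w                                   # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                 # router weights
    outs = np.stack([x @ w for w in expert_w], axis=-1)   # (tokens, hidden, E)
    return (outs * probs[:, None, :]).sum(-1)             # (tokens, hidden)

tokens = rng.normal(size=(4, hidden))
y = moe_forward(tokens)
```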

## Gratitude
- Huge thanks to [Alibaba Cloud Qwen](https://www.alibabacloud.com/solutions/generative-ai/qwen) for training and publishing the weights of the Qwen models
- Thank you to [abacusai](https://huggingface.co/abacusai) for publishing a fine-tuned model based on Qwen
- And huge thanks to [mlabonne](https://huggingface.co/mlabonne), whose [phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8) served as the reference for the customized modeling file