---
base_model:
- Qwen/Qwen1.5-72B-Chat
- abacusai/Liberated-Qwen1.5-72B
license: other
license_name: tongyi-qianwen
license_link: https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE
language:
- en
tags:
- mergekit
- merge
- moe
---
# Qwen1.5-MoE-2x72B

## Description

This model was created with mergekit by combining [Qwen/Qwen1.5-72B-Chat](https://huggingface.co/Qwen/Qwen1.5-72B-Chat) and [abacusai/Liberated-Qwen1.5-72B](https://huggingface.co/abacusai/Liberated-Qwen1.5-72B) into a Mixture of Experts (MoE) model, without further fine-tuning.

It uses a customized MoE script for mergekit, which is available [here](https://github.com/Aratako/mergekit-qwen2).

Due to the structural modifications introduced by MoE, using this model requires a [custom modeling file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/modeling_qwen2.py) and a [custom configuration file](https://huggingface.co/Aratako/Liberated-Qwen1.5-2x72B/blob/main/configuration_qwen2.py). When using the model, place these files in the same folder as the model weights.

This model inherits the [tongyi-qianwen license](https://huggingface.co/Qwen/Qwen1.5-72B-Chat/blob/main/LICENSE).

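As a minimal loading sketch (an assumption on my part, not an official snippet: it presumes the `transformers` library and relies on `trust_remote_code=True` so that the custom modeling and configuration files next to the weights are picked up):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Aratako/Liberated-Qwen1.5-2x72B"

def load_model(model_id: str = MODEL_ID):
    """Load the tokenizer and MoE model.

    trust_remote_code=True makes transformers use the custom
    modeling_qwen2.py / configuration_qwen2.py shipped with the model.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype="auto",   # keep the bfloat16 weights as stored
        device_map="auto",    # shard across available GPUs
        trust_remote_code=True,
    )
    return tokenizer, model
```

Note that this is a 2x72B model, so substantial GPU memory is required even in bfloat16.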
## Benchmark

## Merge config

[mergekit_config.yml](./mergekit_moe_config.yml)

```yaml
base_model: ./Qwen1.5-72B-Chat
gate_mode: random
dtype: bfloat16
experts:
  - source_model: ./Qwen1.5-72B-Chat
    positive_prompts: []
  - source_model: ./Liberated-Qwen1.5-72B
    positive_prompts: []
tokenizer_source: model:./Qwen1.5-72B-Chat
```
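For intuition about what this config produces: each MoE layer routes tokens through both experts, weighted by a gate network, and with `gate_mode: random` (and empty `positive_prompts`) the gate weights start randomly initialized rather than being derived from prompt hidden states. A toy numpy-only sketch with tiny dimensions (illustrative only, not the actual model code):

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 8      # toy hidden size; the real Qwen1.5-72B layers are far wider
N_EXPERTS = 2   # the two merged 72B models

# Router weights. gate_mode: random means these start randomly initialized.
gate_w = rng.standard_normal((HIDDEN, N_EXPERTS))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, experts):
    """Gate-weighted mixture of expert outputs for a batch of token vectors."""
    gates = softmax(x @ gate_w)               # (tokens, N_EXPERTS), rows sum to 1
    out = np.zeros_like(x)
    for i, expert in enumerate(experts):
        out += gates[:, i:i + 1] * expert(x)  # weight each expert's output
    return out

# Stand-ins for the two expert MLP stacks.
experts = [lambda x: 2.0 * x, lambda x: x + 1.0]
tokens = rng.standard_normal((4, HIDDEN))
mixed = moe_layer(tokens, experts)
```

With only two experts and a softmax gate, every token effectively uses both experts; the random gate simply decides how their outputs are blended until any later fine-tuning adjusts it.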

## Gratitude

- Huge thanks to [Alibaba Cloud Qwen](https://www.alibabacloud.com/solutions/generative-ai/qwen) for training and publishing the weights of the Qwen models
- Thanks to [abacusai](https://huggingface.co/abacusai) for publishing a fine-tuned model based on Qwen
- And huge thanks to [mlabonne](https://huggingface.co/mlabonne), whose [phixtral](https://huggingface.co/mlabonne/phixtral-4x2_8) I used as a reference when customizing the modeling file