Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)

Introduction

Camelidae and Qwen2idae models are trained using Parameter-Efficient Sparsity Crafting (PESC).

We present Parameter-Efficient Sparsity Crafting, which helps dense models learn knowledge from different fields, including code and math. The approach performs instruction tuning while making efficient use of a Mixture-of-Experts (MoE) structure.

Specifically, Parameter-Efficient Sparsity Crafting uses parameter-efficient techniques, including QLoRA and adapters, to perform efficient sparse upcycling, that is, turning a dense model into an MoE model.
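To make the idea concrete, here is a minimal pure-Python sketch of an upcycled MoE layer. It rests on assumptions that this card does not state: every expert shares the frozen FFN weights of the dense model and differs only by its own small bottleneck adapter, a linear router selects the top-k experts (k=2 here) and renormalizes their gates with a softmax, and each adapter's up-projection is zero-initialized. All class and function names are hypothetical and the dimensions are toy values; refer to the paper for the actual design.

```python
import math
import random

def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class DenseFFN:
    """Stand-in for the pretrained (frozen) dense feed-forward block."""
    def __init__(self, d_model, d_ff, rng):
        self.w_in = rand_matrix(d_ff, d_model, rng)
        self.w_out = rand_matrix(d_model, d_ff, rng)

    def __call__(self, x):
        return matvec(self.w_out, relu(matvec(self.w_in, x)))

class AdapterExpert:
    """Assumed expert design: shared dense FFN + its own trainable bottleneck adapter."""
    def __init__(self, ffn, d_model, d_bottleneck, rng):
        self.ffn = ffn  # shared reference: FFN weights are not copied per expert
        self.down = rand_matrix(d_bottleneck, d_model, rng)
        # Zero-initialized up-projection, so the adapter starts as a no-op
        self.up = [[0.0] * d_bottleneck for _ in range(d_model)]

    def __call__(self, x):
        h = self.ffn(x)
        delta = matvec(self.up, relu(matvec(self.down, h)))
        return [a + b for a, b in zip(h, delta)]  # residual adapter on the FFN output

class UpcycledMoE:
    """Top-k routed MoE layer built ("upcycled") from a single dense FFN."""
    def __init__(self, ffn, d_model, num_experts, top_k, d_bottleneck, seed=0):
        rng = random.Random(seed)
        self.experts = [AdapterExpert(ffn, d_model, d_bottleneck, rng) for _ in range(num_experts)]
        self.router = rand_matrix(num_experts, d_model, rng)
        self.top_k = top_k

    def __call__(self, x):
        logits = matvec(self.router, x)
        top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        gates = softmax([logits[i] for i in top])  # renormalize over the selected experts
        out = [0.0] * len(x)
        for g, i in zip(gates, top):
            y = self.experts[i](x)
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, top

# Toy usage with hypothetical sizes
rng = random.Random(42)
ffn = DenseFFN(d_model=8, d_ff=16, rng=rng)
moe = UpcycledMoE(ffn, d_model=8, num_experts=4, top_k=2, d_bottleneck=2)
x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
y, chosen = moe(x)  # chosen holds the indices of the 2 routed experts
```

Because every adapter starts at zero and the selected gates sum to 1, the upcycled layer initially reproduces the dense FFN output (up to rounding). In this sketch, training only the adapters and the router lets the experts specialize while the large FFN weights stay shared, which keeps the number of trainable parameters small.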

Model Lists

| Camelidae Series | Download |
|---|---|
| Camelidae-8x7B | 🤗 HuggingFace |
| Camelidae-8x13B | 🤗 HuggingFace |
| Camelidae-8x34B | 🤗 HuggingFace |
| Camelidae-8x34B-pro | 🤗 Coming Soon |

| Qwen2idae Series | Download |
|---|---|
| Qwen2idae-16x14B-v1.0 | 🤗 HuggingFace |
| Qwen2idae-16x7B-v1.0 | 🤗 Coming Soon |
| Qwen2idae-16x1.8B-v1.0 | 🤗 Coming Soon |

Performance

| Model | Activated Params | MMLU (5-shot) | GSM8k (5-shot) | MATH (4-shot) | HumanEval (0-shot) | MBPP (4-shot) | HellaSwag (10-shot) |
|---|---|---|---|---|---|---|---|
| GPT3.5 | - | 70.0% | 57.1% | **34.1%** | **48.1%** | - | **85.5%** |
| LLaMA2-70B-chat | 70B | 63.8% | 59.3% | 10.4% | 32.3% | 35.6% | 84.8% |
| Camelidae-8x34B-pro | 35B | **75.7%** | **79.4%** | **24.0%** | **48.8%** | **43.2%** | 85.2% |
| Camelidae-8x34B | 35B | **75.6%** | **78.3%** | 22.6% | 43.9% | **41.4%** | **85.3%** |
| SUSChat-34B | 34B | **76.4%** | 72.3% | 22.0% | 11.6% | 40.2% | 83.9% |
| Yi-34B-chat | 34B | 74.8% | 67.6% | 17.3% | 20.1% | 41.0% | 83.9% |
| Qwen2idae-16x14B-v1.0 | 15B | 66.7% | **77.8%** | **29.9%** | **62.8%** | **48.6%** | 82.3% |
| Mixtral-8x7B-instruct | 14B | 68.7% | 71.7% | 22.1% | 25.6% | 40.6% | **86.5%** |
| Camelidae-8x13B | 13B | 54.4% | 52.6% | 9.8% | 30.6% | 30.4% | 82.5% |
| LLaMA2-13B-chat | 13B | 53.9% | 37.1% | 5.2% | 18.9% | 27.2% | 81.9% |
| Camelidae-8x7B | 7B | 48.3% | 44.0% | 5.8% | 18.3% | 23.4% | 79.2% |
| LLaMA2-7B-chat | 7B | 47.2% | 26.3% | 3.9% | 12.2% | 17.6% | 78.6% |

The top three scores in each benchmark column are shown in bold.

Usage

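```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hywu/Qwen2idae-16x14B-v1.0"
# trust_remote_code=True lets transformers load the custom code stored in the Hub repo
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True).eval()

# Prompt in ChatML format
prompt = '<|im_start|>user\nHow are you?<|im_end|>\n<|im_start|>assistant\n'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

# Set an explicit generation budget; the default length limit can be very short
pred = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt
new_tokens = pred[0, inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens.cpu(), skip_special_tokens=True))
```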
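The prompt above uses the ChatML layout, where each turn is wrapped as `<|im_start|>role\n...<|im_end|>\n` and an open `assistant` turn is left for the model to complete. A small helper such as the hypothetical `build_chatml_prompt` below renders a whole conversation the same way. Extending the layout to `system` messages and multiple turns is an assumption based on the Qwen1.5 chat format; if the tokenizer ships a chat template, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` is the library-provided route.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role": ..., "content": ...} dicts in ChatML (hypothetical helper)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

# Reproduces the single-turn prompt from the usage example
prompt = build_chatml_prompt([{"role": "user", "content": "How are you?"}])

# Multi-turn example with a system message
chat = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about llamas."},
])
```

The resulting string can be passed to `tokenizer(...)` exactly like the hand-written prompt.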

Citation

@article{wu2024parameter,
  title={Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks},
  author={Wu, Haoyuan and Zheng, Haisheng and Yu, Bei},
  journal={arXiv preprint arXiv:2401.02731},
  year={2024}
}

License

The source code in this repo is licensed under the Apache 2.0 License. Qwen2idae models are developed for academic research and free commercial use; all usage must adhere to the Qwen1.5 license.

Model size: 17.5B params (Safetensors, tensor type BF16)