
WizardLM-2-4x7B-MoE

WizardLM-2-4x7B-MoE is an experimental mixture-of-experts (MoE) model made with Mergekit. It combines four copies of WizardLM-2-7B using the random gate mode.

Please be sure to set experts per token to 4 for the best results! Context length should be the same as Mistral-7B-Instruct-v0.1 (8k tokens). For instruction templates, Vicuna-v1.1 is recommended.
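
As a rough illustration of the recommended template, the sketch below assembles a prompt in the commonly used Vicuna-v1.1 layout. The helper name `build_vicuna_prompt` is hypothetical, and the system prompt text and `</s>` end-of-turn marker are assumptions taken from the usual Vicuna-v1.1 convention, not from this model card.

```python
# Assumed Vicuna-v1.1 system prompt (not stated in this model card).
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_vicuna_prompt(turns, system=SYSTEM_PROMPT, sep=" ", eos="</s>"):
    """Build a Vicuna-v1.1 style prompt (hypothetical helper).

    turns: list of (user_message, assistant_reply) pairs. Use None as the
    reply of the last pair so the prompt ends with 'ASSISTANT:' and the
    model generates the answer.
    """
    prompt = system + sep
    for user_msg, assistant_msg in turns:
        prompt += f"USER: {user_msg}{sep}"
        if assistant_msg is None:
            prompt += "ASSISTANT:"
        else:
            # Completed assistant turns are closed with the end-of-sequence marker.
            prompt += f"ASSISTANT: {assistant_msg}{eos}"
    return prompt


single_turn = build_vicuna_prompt([("Hi", None)])
multi_turn = build_vicuna_prompt([("Hi", "Hello."), ("Who are you?", None)])
print(multi_turn)
```

Most front ends that ship a "Vicuna-v1.1" preset should produce an equivalent layout, so a helper like this is only needed when calling the model directly.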

Quantized versions

EXL2 (for fast GPU-only inference):
8_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-8_0bpw (~25 GB VRAM)
6_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-6_0bpw (~19 GB VRAM)
5_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-5_0bpw (~16 GB VRAM)
4_25bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-4_25bpw (~14 GB VRAM)
3_5bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_5bpw (~12 GB VRAM)
3_0bpw: https://huggingface.co/Skylaude/WizardLM-2-4x7B-MoE-exl2-3_0bpw (~11 GB VRAM)
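
To see roughly where these VRAM figures come from, the sketch below estimates the size of the weights alone as parameters times bits per weight. It uses the 24.2B parameter count reported for this model. The function name `weight_size_gb` is hypothetical, and the estimate deliberately ignores the KV cache, activations, and any layers stored at a different precision.

```python
TOTAL_PARAMS = 24.2e9  # parameter count reported for this model


def weight_size_gb(bits_per_weight, n_params=TOTAL_PARAMS):
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes).

    Hypothetical helper: counts weights only, with no cache or runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


for bpw in (8.0, 6.0, 5.0, 4.25, 3.5, 3.0):
    print(f"{bpw:>5} bpw: ~{weight_size_gb(bpw):.1f} GB for weights alone")
```

The listed figures sit roughly 1 to 2 GB above these weights-only estimates, which would be consistent with cache and runtime overhead, though the card does not say how they were measured.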

GGUF (for mixed GPU+CPU inference or CPU-only inference):
https://huggingface.co/mradermacher/WizardLM-2-4x7B-MoE-GGUF
Thanks to Michael Radermacher for making these quants!

Evaluation

I don't expect this model to be that great since it's something that I made as an experiment. However, I will submit it to the Open LLM Leaderboard to see how it matches up against some other models (particularly WizardLM-2-7B and WizardLM-2-70B).

Mergekit config

base_model: models/WizardLM-2-7B
gate_mode: random
dtype: float16
experts_per_token: 4
experts:
  - source_model: models/WizardLM-2-7B
  - source_model: models/WizardLM-2-7B
  - source_model: models/WizardLM-2-7B
  - source_model: models/WizardLM-2-7B
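
A back-of-the-envelope parameter count helps explain why four 7B experts give about 24B parameters rather than 28B: in this kind of merge only the MLP blocks are replicated per expert, while attention, embeddings, and norms are shared. The sketch below assumes Mistral-7B-style dimensions (hidden size 4096, MLP size 14336, 32 layers, 32000-token vocabulary, grouped-query attention with a 1024-wide key/value projection) and one small router matrix per layer. None of these numbers come from this model card, and `param_count` is a hypothetical helper.

```python
# Assumed Mistral-7B-style dimensions (not stated in the model card).
HIDDEN = 4096
INTERMEDIATE = 14336
LAYERS = 32
VOCAB = 32000
KV_DIM = 1024  # 8 key/value heads x 128 dims (grouped-query attention)


def param_count(num_experts):
    """Estimate total parameters for a dense (1 expert) or MoE model."""
    attn = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q and o, plus k and v projections
    mlp = 3 * HIDDEN * INTERMEDIATE                   # gate, up, and down projections
    norms = 2 * HIDDEN                                # two RMSNorm weights per layer
    # The router only exists in the MoE variant: one logit per expert.
    router = HIDDEN * num_experts if num_experts > 1 else 0
    per_layer = attn + norms + router + num_experts * mlp
    embeddings = 2 * VOCAB * HIDDEN                   # input embeddings + output head
    final_norm = HIDDEN
    return LAYERS * per_layer + embeddings + final_norm


dense = param_count(1)
moe = param_count(4)
print(f"dense: {dense / 1e9:.2f}B, 4 experts: {moe / 1e9:.2f}B")
```

Under these assumptions the four-expert total comes to roughly 24.15B, in line with the reported 24.2B.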

Model size: 24.2B params (Safetensors)
Tensor type: FP16