The MoE model was constructed from four copies of microsoft/phi-2. QLoRA adapters were then applied to all linear layers and fine-tuned on the WizardLM_evol_instruct_70k dataset via MLX. The model was assembled using the script from https://github.com/mzbac/mlx-moe
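The "2x4" in the model name suggests a top-2 router over four experts. The snippet below is a minimal, illustrative sketch in plain PyTorch of what such a mixture-of-experts feed-forward block looks like; it is an assumption for exposition, not the actual mlx-moe implementation, and the phi-2-like dimensions are only indicative.

```python
# Illustrative sketch of a top-2-of-4 MoE feed-forward block (NOT the mlx-moe code).
# Dimensions are phi-2-like defaults and are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoEMLP(nn.Module):
    def __init__(self, hidden_size=2560, intermediate_size=10240, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        # Each "expert" is a phi-2-style MLP (Linear -> GELU -> Linear).
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.GELU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        logits = self.router(x)                               # (B, T, num_experts)
        weights, indices = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                  # renormalise over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[..., slot] == e                # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out
```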
## Evaluation

### MMLU
#### mzbac/phi-2-2x4-hf

| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| humanities | N/A | none | 0 | acc | 0.5970 | ± | 0.0245 |
| other | N/A | none | 0 | acc | 0.5760 | ± | 0.0311 |
| social_sciences | N/A | none | 0 | acc | 0.6610 | ± | 0.0284 |
| stem | N/A | none | 0 | acc | 0.4738 | ± | 0.0379 |
#### microsoft/phi-2

| Groups | Version | Filter | n-shot | Metric | Value | | Stderr |
|---|---|---|---|---|---|---|---|
| humanities | N/A | none | 0 | acc | 0.6026 | ± | 0.0243 |
| other | N/A | none | 0 | acc | 0.5827 | ± | 0.0310 |
| social_sciences | N/A | none | 0 | acc | 0.6440 | ± | 0.0289 |
| stem | N/A | none | 0 | acc | 0.4721 | ± | 0.0377 |
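The tables follow the group-summary layout of EleutherAI's lm-evaluation-harness. A zero-shot MMLU run along the lines below would produce comparable numbers; the exact harness version and settings behind the tables are not documented here, so treat this as a hypothetical reproduction sketch.

```python
# Hypothetical reproduction sketch using lm-evaluation-harness (pip install lm-eval);
# the exact version and settings used for the tables above are assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mzbac/phi-2-2x4-hf,trust_remote_code=True",
    tasks=["mmlu"],
    num_fewshot=0,
)
print(results["results"])  # per-task and group accuracies with stderr
```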
## Example
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mzbac/phi-2-2x4-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# phi-2-style instruction prompt: "Instruct: <question>\nOutput:"
text = "Instruct: how does backpropagation work?\nOutput:"
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
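For longer or more varied completions, the standard `generate` arguments from transformers apply; the settings below are illustrative only and have not been tuned for this model.

```python
# Illustrative sampling settings (assumptions, not tuned for this model).
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```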