---
license: other
---

This is a MoE (mixture-of-experts) model built on top of Qwen1.5-7B-Chat, Qwen1.5-7B, and Crystalcareai/CrystalQwen-1.5-7B. QLoRA was then applied to the q, v, and gate linear projections in all layers, fine-tuning on WizardLM_evol_instruct_70k via mlx.

The model was created using a script from https://github.com/mzbac/mlx-moe
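For reference, the sketch below shows a comparable adapter setup expressed with Hugging Face `peft`. The actual training ran through mlx, and the rank, alpha, and dropout values here are assumptions rather than the original hyperparameters; the module names follow the standard Qwen2 layer naming.

```python
from peft import LoraConfig

# Hypothetical QLoRA adapter config targeting the q, v, and gate linear
# layers described above; r, lora_alpha, and lora_dropout are assumed
# values, not the settings used in the original mlx run.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "gate_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```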
## Evaluation

**Qwen-1_5-2x3-hf**

*MMLU*
| Groups           |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
| - humanities     |N/A    |none  |     0|acc   |0.6488|±  |0.0237|
| - other          |N/A    |none  |     0|acc   |0.6294|±  |0.0302|
| - social_sciences|N/A    |none  |     0|acc   |0.6905|±  |0.0281|
| - stem           |N/A    |none  |     0|acc   |0.5227|±  |0.0375|

*CMMLU*

|Groups|Version|Filter|n-shot| Metric |Value |   |Stderr|
|------|-------|------|-----:|--------|-----:|---|-----:|
|cmmlu |N/A    |none  |     0|acc     |0.6966|±  |0.0333|
|      |       |none  |     0|acc_norm|0.6966|±  |0.0333|

*GSM8K*

|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|      2|get-answer|     5|exact_match|0.4102|±  |0.0135|

**Qwen1.5-7B-Chat**

*MMLU*

| Groups           |Version|Filter|n-shot|Metric|Value |   |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
| - humanities     |N/A    |none  |     0|acc   |0.6533|±  |0.0239|
| - other          |N/A    |none  |     0|acc   |0.6321|±  |0.0301|
| - social_sciences|N/A    |none  |     0|acc   |0.6934|±  |0.0282|
| - stem           |N/A    |none  |     0|acc   |0.5329|±  |0.0376|

*CMMLU*

|Groups|Version|Filter|n-shot| Metric |Value |   |Stderr|
|------|-------|------|-----:|--------|-----:|---|-----:|
|cmmlu |N/A    |none  |     0|acc     |0.6879|±  |0.0338|
|      |       |none  |     0|acc_norm|0.6879|±  |0.0338|

*GSM8K*

|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|      2|get-answer|     5|exact_match|0.0425|±  |0.0056|
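The tables above follow the output format of EleutherAI's lm-evaluation-harness. Assuming that harness produced them, a run along the following lines should yield comparable numbers; the exact evaluation settings were not recorded here, so this sketch relies on each task's default few-shot count (which matches the tables, e.g. 5-shot for gsm8k):

```python
import lm_eval

# Hypothetical reproduction of the evaluation; settings other than the
# model id and task list are assumptions, not the original configuration.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mzbac/qwen-1.5-2x3-hf,trust_remote_code=True",
    tasks=["mmlu", "cmmlu", "gsm8k"],
)
print(results["results"])
```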

## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mzbac/qwen-1.5-2x3-hf"

# Load in 4-bit (requires the bitsandbytes package) with bfloat16 compute.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    trust_remote_code=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

chat = [
    {"role": "user", "content": "How does backpropagation work?"},
]

# Apply the model's chat template and append the assistant header so
# generation starts a fresh assistant turn.
text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

inputs = tokenizer.encode(text, return_tensors="pt").to("cuda")

generate_kwargs = dict(
    input_ids=inputs,
    temperature=0.6,
    max_new_tokens=500,
    do_sample=True,
)

outputs = model.generate(**generate_kwargs)
print(tokenizer.decode(outputs[0]))
```
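Note that recent transformers releases deprecate the bare `load_in_4bit` argument in favour of an explicit quantization config; an equivalent load would look like:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Same 4-bit load expressed through the explicit quantization-config API.
model = AutoModelForCausalLM.from_pretrained(
    "mzbac/qwen-1.5-2x3-hf",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    trust_remote_code=True,
)
```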