---
license: other
---
A MoE model built on top of Qwen1.5-7B-Chat, Qwen1.5-7B, and Crystalcareai/CrystalQwen-1.5-7B. QLoRA was then applied to the q, v, and gate linear projections in all layers, fine-tuned on WizardLM_evol_instruct_70k via MLX.

The model was created using the scripts from https://github.com/mzbac/mlx-moe
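The fine-tuning itself was done with MLX via the mlx-moe scripts, not with peft, but for readers used to the Hugging Face stack the adapter placement is roughly analogous to a LoraConfig like the one below (purely illustrative; the module names, rank, and alpha are assumptions mapped from the q, v, and gate linears mentioned above):
```
# Illustrative only: the actual QLoRA training used mzbac/mlx-moe, not peft.
# Module names, rank, and alpha are assumptions, not the recorded training config.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                    # assumed rank
    lora_alpha=32,                           # assumed alpha
    target_modules=["q_proj", "v_proj", "gate"],  # q, v, and gate linears
    task_type="CAUSAL_LM",
)
```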
## Evaluation
**Qwen-1_5-2x3-hf**
*MMLU*
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
| - humanities |N/A |none | 0|acc |0.6488|± |0.0237|
| - other |N/A |none | 0|acc |0.6294|± |0.0302|
| - social_sciences|N/A |none | 0|acc |0.6905|± |0.0281|
| - stem |N/A |none | 0|acc |0.5227|± |0.0375|
*CMMLU*
|Groups|Version|Filter|n-shot| Metric |Value | |Stderr|
|------|-------|------|-----:|--------|-----:|---|-----:|
|cmmlu |N/A |none | 0|acc |0.6966|± |0.0333|
| | |none | 0|acc_norm|0.6966|± |0.0333|
*GSM8K*
|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k| 2|get-answer| 5|exact_match|0.4102|± |0.0135|
**Qwen1.5-7B-Chat**
*MMLU*
| Groups |Version|Filter|n-shot|Metric|Value | |Stderr|
|------------------|-------|------|-----:|------|-----:|---|-----:|
| - humanities |N/A |none | 0|acc |0.6533|± |0.0239|
| - other |N/A |none | 0|acc |0.6321|± |0.0301|
| - social_sciences|N/A |none | 0|acc |0.6934|± |0.0282|
| - stem |N/A |none | 0|acc |0.5329|± |0.0376|
*CMMLU*
|Groups|Version|Filter|n-shot| Metric |Value | |Stderr|
|------|-------|------|-----:|--------|-----:|---|-----:|
|cmmlu |N/A |none | 0|acc |0.6879|± |0.0338|
| | |none | 0|acc_norm|0.6879|± |0.0338|
*GSM8K*
|Tasks|Version| Filter |n-shot| Metric |Value | |Stderr|
|-----|------:|----------|-----:|-----------|-----:|---|-----:|
|gsm8k| 2|get-answer| 5|exact_match|0.0425|± |0.0056|
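The tables above follow the lm-evaluation-harness output format. A minimal sketch of how comparable numbers could be reproduced with the harness's Python API is shown below (assuming lm-eval >= 0.4; the task names and shot counts mirror the tables, but the exact harness version and settings used here are not recorded):
```
# Reproduction sketch using EleutherAI's lm-evaluation-harness (assumed setup).
import lm_eval

# 0-shot MMLU and CMMLU, as in the tables above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mzbac/qwen-1.5-2x3-hf,trust_remote_code=True,dtype=bfloat16",
    tasks=["mmlu", "cmmlu"],
    num_fewshot=0,
)
print(results["results"])

# 5-shot GSM8K, as in the table above.
gsm8k = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mzbac/qwen-1.5-2x3-hf,trust_remote_code=True,dtype=bfloat16",
    tasks=["gsm8k"],
    num_fewshot=5,
)
print(gsm8k["results"]["gsm8k"])
```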
```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mzbac/qwen-1.5-2x3-hf"

# Load the model in 4-bit (requires bitsandbytes) with bfloat16 compute.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the chat template; the empty assistant turn
# leaves the completion open for the model to generate.
chat = [
    {"role": "user", "content": "how backpropagation works?"},
    {"role": "assistant", "content": "\n"},
]
text = tokenizer.apply_chat_template(chat, tokenize=False)
inputs = tokenizer.encode(text, return_tensors="pt").to("cuda")

generate_kwargs = dict(
    input_ids=inputs,
    temperature=0.6,
    max_new_tokens=500,
    do_sample=True,
)
outputs = model.generate(**generate_kwargs)
print(tokenizer.decode(outputs[0]))
```
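Note that newer transformers releases expect 4-bit loading to be configured through `BitsAndBytesConfig` rather than the bare `load_in_4bit` kwarg; an equivalent sketch (still requires bitsandbytes and a CUDA GPU):
```
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Equivalent 4-bit loading via BitsAndBytesConfig.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mzbac/qwen-1.5-2x3-hf",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
```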