---
license: other
---

This MoE model was built on top of Qwen1.5-7B-Chat, Qwen1.5-7B, and Crystalcareai/CrystalQwen-1.5-7B. QLoRA was then applied to all q, v, and gate linear layers, fine-tuning on WizardLM_evol_instruct_70k via MLX.

The model was created using the script from https://github.com/mzbac/mlx-moe.

## Evaluation

**Qwen-1_5-2x3-hf**

*MMLU*

| Groups            | Version | Filter | n-shot | Metric | Value  |   | Stderr |
|-------------------|---------|--------|-------:|--------|-------:|---|-------:|
| - humanities      | N/A     | none   |      0 | acc    | 0.6488 | ± | 0.0237 |
| - other           | N/A     | none   |      0 | acc    | 0.6294 | ± | 0.0302 |
| - social_sciences | N/A     | none   |      0 | acc    | 0.6905 | ± | 0.0281 |
| - stem            | N/A     | none   |      0 | acc    | 0.5227 | ± | 0.0375 |

*CMMLU*

| Groups | Version | Filter | n-shot | Metric   | Value  |   | Stderr |
|--------|---------|--------|-------:|----------|-------:|---|-------:|
| cmmlu  | N/A     | none   |      0 | acc      | 0.6966 | ± | 0.0333 |
|        |         | none   |      0 | acc_norm | 0.6966 | ± | 0.0333 |

*GSM8K*

| Tasks | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       2 | get-answer |      5 | exact_match | 0.4102 | ± | 0.0135 |

**Qwen1.5-7B-Chat**

*MMLU*

| Groups            | Version | Filter | n-shot | Metric | Value  |   | Stderr |
|-------------------|---------|--------|-------:|--------|-------:|---|-------:|
| - humanities      | N/A     | none   |      0 | acc    | 0.6533 | ± | 0.0239 |
| - other           | N/A     | none   |      0 | acc    | 0.6321 | ± | 0.0301 |
| - social_sciences | N/A     | none   |      0 | acc    | 0.6934 | ± | 0.0282 |
| - stem            | N/A     | none   |      0 | acc    | 0.5329 | ± | 0.0376 |

*CMMLU*

| Groups | Version | Filter | n-shot | Metric   | Value  |   | Stderr |
|--------|---------|--------|-------:|----------|-------:|---|-------:|
| cmmlu  | N/A     | none   |      0 | acc      | 0.6879 | ± | 0.0338 |
|        |         | none   |      0 | acc_norm | 0.6879 | ± | 0.0338 |

*GSM8K*

| Tasks | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       2 | get-answer |      5 | exact_match | 0.0425 | ± | 0.0056 |

## Example usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mzbac/qwen-1.5-2x3-hf"

# Load the model in 4-bit (requires bitsandbytes and accelerate).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the model's chat template; the empty assistant turn
# leaves the template open for the model to complete.
chat = [
    {"role": "user", "content": "how backpropagation works?"},
    {"role": "assistant", "content": "\n"},
]
text = tokenizer.apply_chat_template(chat, tokenize=False)

inputs = tokenizer.encode(text, return_tensors="pt").to("cuda")

generate_kwargs = dict(
    input_ids=inputs,
    temperature=0.6,
    max_new_tokens=500,
    do_sample=True,
)

outputs = model.generate(**generate_kwargs)
print(tokenizer.decode(outputs[0]))
```
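
For reference, the sketch below illustrates the kind of QLoRA setup described above: 4-bit base weights with LoRA adapters on the q, v, and gate linear layers. It is written with Hugging Face `transformers` + `peft` purely as an illustration; the actual training used the mlx-moe script linked above, and the hyperparameters and target-module names (`q_proj`, `v_proj`, `gate`) are assumptions rather than the original training configuration.

```python
# A minimal QLoRA sketch, NOT the mlx-moe training code actually used.
# Target module names and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "mzbac/qwen-1.5-2x3-hf"  # placeholder: point this at the merged MoE base

# 4-bit base weights are what make this "QLoRA" rather than plain LoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Adapters only on the attention q/v projections and the MoE gate linears,
# mirroring the "q, v, and gate" layers mentioned in the description.
lora_config = LoraConfig(
    r=16,              # assumed rank, not taken from the original run
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "gate"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The adapters would then be trained on WizardLM_evol_instruct_70k with a
# standard causal-LM fine-tuning loop (e.g. transformers Trainer or trl SFTTrainer).
```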