This MoE model is built on top of Qwen1.5-7B-Chat, Qwen1.5-7B, and Crystalcareai/CrystalQwen-1.5-7B. QLoRA was then applied to the q, v, and gate linear projections in every layer, fine-tuning on WizardLM_evol_instruct_70k via mlx. The model was created using a script from https://github.com/mzbac/mlx-moe.
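The fine-tuning itself was done with the mlx-moe script linked above. As a rough illustration only, the sketch below shows a comparable QLoRA setup written against the Hugging Face `peft`/`bitsandbytes` APIs rather than mlx; the adapter rank, alpha, and module names (`q_proj`, `v_proj`, `gate`) are assumptions for the example, not values taken from the actual training run.

```python
# Hypothetical QLoRA sketch (peft/bitsandbytes), for illustration only --
# the real fine-tuning used the mlx-moe script, not this code.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Load the merged MoE checkpoint quantized to 4-bit.
base = AutoModelForCausalLM.from_pretrained(
    "mzbac/qwen-1.5-2x3-hf",
    quantization_config=bnb_config,
    trust_remote_code=True,
)
base = prepare_model_for_kbit_training(base)

# Adapters on the q, v, and gate linear layers, mirroring the description above.
# Module names, rank, and alpha are assumptions; inspect the model to confirm.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "gate"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```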
## Evaluation

### Qwen-1_5-2x3-hf

#### MMLU

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| - humanities | N/A | none | 0 | acc | 0.6488 | ± 0.0237 |
| - other | N/A | none | 0 | acc | 0.6294 | ± 0.0302 |
| - social_sciences | N/A | none | 0 | acc | 0.6905 | ± 0.0281 |
| - stem | N/A | none | 0 | acc | 0.5227 | ± 0.0375 |
#### CMMLU

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| cmmlu | N/A | none | 0 | acc | 0.6966 | ± 0.0333 |
| cmmlu | N/A | none | 0 | acc_norm | 0.6966 | ± 0.0333 |
#### GSM8K

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 2 | get-answer | 5 | exact_match | 0.4102 | ± 0.0135 |
### Qwen1.5-7B-Chat

#### MMLU

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| - humanities | N/A | none | 0 | acc | 0.6533 | ± 0.0239 |
| - other | N/A | none | 0 | acc | 0.6321 | ± 0.0301 |
| - social_sciences | N/A | none | 0 | acc | 0.6934 | ± 0.0282 |
| - stem | N/A | none | 0 | acc | 0.5329 | ± 0.0376 |
#### CMMLU

| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| cmmlu | N/A | none | 0 | acc | 0.6879 | ± 0.0338 |
| cmmlu | N/A | none | 0 | acc_norm | 0.6879 | ± 0.0338 |
#### GSM8K

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| gsm8k | 2 | get-answer | 5 | exact_match | 0.0425 | ± 0.0056 |
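The tables above follow the output format of lm-evaluation-harness. The exact harness version and launch settings are not stated here, so the sketch below is only a rough guess at how comparable numbers could be reproduced with the harness's Python API; the batch size and model arguments are assumptions.

```python
# Rough sketch of reproducing the evaluations with lm-evaluation-harness.
# Settings below (backend, batch size, model_args) are assumptions.
import lm_eval

# 0-shot MMLU and CMMLU, as reported above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mzbac/qwen-1.5-2x3-hf,trust_remote_code=True",
    tasks=["mmlu", "cmmlu"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])

# GSM8K was reported 5-shot, so it needs a separate run.
gsm8k = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mzbac/qwen-1.5-2x3-hf,trust_remote_code=True",
    tasks=["gsm8k"],
    num_fewshot=5,
    batch_size=8,
)
print(gsm8k["results"]["gsm8k"])
```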
Example of loading the model with the `transformers` library (4-bit loading requires `bitsandbytes` and a CUDA GPU):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mzbac/qwen-1.5-2x3-hf"

# Load the MoE checkpoint in 4-bit with bfloat16 compute.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

chat = [
    {"role": "user", "content": "How does backpropagation work?"},
]

# Build the chat prompt and append the assistant turn header for generation.
text = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(text, return_tensors="pt").to("cuda")

generate_kwargs = dict(
    input_ids=inputs,
    temperature=0.6,
    max_new_tokens=500,
    do_sample=True,
)
outputs = model.generate(**generate_kwargs)
print(tokenizer.decode(outputs[0]))
```