---
license: other
---

This MoE model was built on top of Qwen1.5-7B-Chat, Qwen1.5-7B, and Crystalcareai/CrystalQwen-1.5-7B. QLoRA was then applied to all q, v, and gate linear layers, fine-tuning on WizardLM_evol_instruct_70k via MLX.

The model was created using the script from https://github.com/mzbac/mlx-moe.

## Evaluation

**Qwen-1_5-2x3-hf**

*MMLU*

| Groups            | Version | Filter | n-shot | Metric | Value  |   | Stderr |
|-------------------|---------|--------|-------:|--------|-------:|---|-------:|
| - humanities      | N/A     | none   |      0 | acc    | 0.6488 | ± | 0.0237 |
| - other           | N/A     | none   |      0 | acc    | 0.6294 | ± | 0.0302 |
| - social_sciences | N/A     | none   |      0 | acc    | 0.6905 | ± | 0.0281 |
| - stem            | N/A     | none   |      0 | acc    | 0.5227 | ± | 0.0375 |

*CMMLU*

| Groups | Version | Filter | n-shot | Metric   | Value  |   | Stderr |
|--------|---------|--------|-------:|----------|-------:|---|-------:|
| cmmlu  | N/A     | none   |      0 | acc      | 0.6966 | ± | 0.0333 |
|        |         | none   |      0 | acc_norm | 0.6966 | ± | 0.0333 |

*GSM8K*

| Tasks | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       2 | get-answer |      5 | exact_match | 0.4102 | ± | 0.0135 |

**Qwen1.5-7B-Chat**

*MMLU*

| Groups            | Version | Filter | n-shot | Metric | Value  |   | Stderr |
|-------------------|---------|--------|-------:|--------|-------:|---|-------:|
| - humanities      | N/A     | none   |      0 | acc    | 0.6533 | ± | 0.0239 |
| - other           | N/A     | none   |      0 | acc    | 0.6321 | ± | 0.0301 |
| - social_sciences | N/A     | none   |      0 | acc    | 0.6934 | ± | 0.0282 |
| - stem            | N/A     | none   |      0 | acc    | 0.5329 | ± | 0.0376 |

*CMMLU*

| Groups | Version | Filter | n-shot | Metric   | Value  |   | Stderr |
|--------|---------|--------|-------:|----------|-------:|---|-------:|
| cmmlu  | N/A     | none   |      0 | acc      | 0.6879 | ± | 0.0338 |
|        |         | none   |      0 | acc_norm | 0.6879 | ± | 0.0338 |

*GSM8K*

| Tasks | Version | Filter     | n-shot | Metric      | Value  |   | Stderr |
|-------|--------:|------------|-------:|-------------|-------:|---|-------:|
| gsm8k |       2 | get-answer |      5 | exact_match | 0.0425 | ± | 0.0056 |

## Example usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mzbac/qwen-1.5-2x3-hf"

# Load the model in 4-bit (requires bitsandbytes and accelerate).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the model's chat template; the empty assistant turn
# leaves the template open for the model to complete.
chat = [
    {"role": "user", "content": "how backpropagation works?"},
    {"role": "assistant", "content": "\n"},
]
text = tokenizer.apply_chat_template(chat, tokenize=False)

inputs = tokenizer.encode(text, return_tensors="pt").to("cuda")

generate_kwargs = dict(
    input_ids=inputs,
    temperature=0.6,
    max_new_tokens=500,
    do_sample=True,
)

outputs = model.generate(**generate_kwargs)
print(tokenizer.decode(outputs[0]))
```
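
For reference, the sketch below illustrates the kind of QLoRA setup described above: 4-bit base weights with LoRA adapters on the q, v, and gate linear layers. It is written with Hugging Face `transformers` + `peft` purely as an illustration; the actual training used the mlx-moe script linked above, and the hyperparameters and target-module names (`q_proj`, `v_proj`, `gate`) are assumptions rather than the original training configuration.

```python
# A minimal QLoRA sketch, NOT the mlx-moe training code actually used.
# Target module names and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "mzbac/qwen-1.5-2x3-hf"  # placeholder: point this at the merged MoE base

# 4-bit base weights are what make this "QLoRA" rather than plain LoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Adapters only on the attention q/v projections and the MoE gate linears,
# mirroring the "q, v, and gate" layers mentioned in the description.
lora_config = LoraConfig(
    r=16,              # assumed rank, not taken from the original run
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj", "gate"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# The adapters would then be trained on WizardLM_evol_instruct_70k with a
# standard causal-LM fine-tuning loop (e.g. transformers Trainer or trl SFTTrainer).
```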