AlphaMonarch-laser

AlphaMonarch-laser is a new DPO merge trained with laserQLoRA. It retains the reasoning abilities of the very best merges while significantly improving conversational ability, which gives the best of both worlds in a 7B model. The base model is mlabonne/NeuralMonarch-7B, fine-tuned on only half of its layers using laserQLoRA. The preference dataset used for DPO is mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha.
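For readers unfamiliar with DPO, the objective it optimizes on each chosen/rejected pair can be sketched in a few lines. This is a minimal illustration of the standard DPO loss, not the training code used for this model; the function name and the `beta=0.1` default are assumptions, since the card does not state the value of beta.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from summed sequence log-probabilities.

    beta is a hypothetical default; the model card does not report it.
    """
    # How much more the policy favors each answer than the frozen reference does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written as softplus(-margin) for numerical stability.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
# Raising the chosen answer's log-probability relative to the reference lowers the loss.
improved = dpo_loss(-8.0, -12.0, -10.0, -12.0)
```

The loss only depends on how the policy's preference between the two answers shifts relative to the reference model, which is why a preference dataset of chosen/rejected pairs is sufficient.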



πŸ† Evaluation

Task Version Metric Value StdErr
agieval_aqua_rat 0 acc 28.35% 2.83%
agieval_aqua_rat 0 acc_norm 26.38% 2.77%
agieval_logiqa_en 0 acc 38.25% 1.91%
agieval_logiqa_en 0 acc_norm 38.10% 1.90%
agieval_lsat_ar 0 acc 23.91% 2.82%
agieval_lsat_ar 0 acc_norm 23.48% 2.80%
agieval_lsat_lr 0 acc 52.75% 2.21%
agieval_lsat_lr 0 acc_norm 53.92% 2.21%
agieval_lsat_rc 0 acc 66.91% 2.87%
agieval_lsat_rc 0 acc_norm 67.29% 2.87%
agieval_sat_en 0 acc 78.64% 2.86%
agieval_sat_en 0 acc_norm 78.64% 2.86%
agieval_sat_en_without_passage 0 acc 45.15% 3.48%
agieval_sat_en_without_passage 0 acc_norm 44.17% 3.47%
agieval_sat_math 0 acc 33.18% 3.18%
agieval_sat_math 0 acc_norm 31.36% 3.14%

Average: 75.9% without mmlu

TruthfulQA

Task Version Metric Value Stderr
truthfulqa_mc 1 mc1 63.03 ± 1.68
mc2 78.39 ± 1.37
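The Stderr column can be turned into an approximate confidence interval. The sketch below assumes a normal approximation (z = 1.96 for roughly 95% coverage); the helper name is illustrative and is not part of any evaluation harness.

```python
def normal_ci(value, stderr, z=1.96):
    """Approximate confidence interval under a normal assumption."""
    half_width = z * stderr
    return value - half_width, value + half_width

# TruthfulQA mc2 from the table above: 78.39 with a standard error of 1.37.
low, high = normal_ci(78.39, 1.37)
print(f"mc2 approx. 95% interval: {low:.2f} to {high:.2f}")
```

Differences between models that fall well inside such an interval should be read with caution.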

BigBench Reasoning Test

Task Version Metric Value Stderr
bigbench_causal_judgement 0 multiple_choice_grade 60.00 ± 3.56
bigbench_date_understanding 0 multiple_choice_grade 62.06 ± 2.53
bigbench_disambiguation_qa 0 multiple_choice_grade 54.26 ± 3.11
bigbench_geometric_shapes 0 multiple_choice_grade 23.96 ± 2.26
... exact_str_match
bigbench_geometric_shapes 0 exact_str_match 0.00 ± 0.00
bigbench_logical_deduction_five_objects 0 multiple_choice_grade 32.80 ± 2.10
bigbench_logical_deduction_seven_objects 0 multiple_choice_grade 23.86 ± 1.61
bigbench_logical_deduction_three_objects 0 multiple_choice_grade 59.33 ± 2.84
bigbench_movie_recommendation 0 multiple_choice_grade 58.00 ± 2.21
bigbench_navigate 0 multiple_choice_grade 56.00 ± 1.57
bigbench_reasoning_about_colored_objects 0 multiple_choice_grade 69.20 ± 1.03
bigbench_ruin_names 0 multiple_choice_grade 55.36 ± 2.35
bigbench_salient_translation_error_detection 0 multiple_choice_grade 41.48 ± 1.56
bigbench_snarks 0 multiple_choice_grade 73.48 ± 3.29
bigbench_sports_understanding 0 multiple_choice_grade 76.06 ± 1.36
bigbench_temporal_sequences 0 multiple_choice_grade 55.50 ± 1.57
bigbench_tracking_shuffled_objects_five_objects 0 multiple_choice_grade 23.28 ± 1.20
bigbench_tracking_shuffled_objects_seven_objects 0 multiple_choice_grade 19.37 ± 0.94
bigbench_tracking_shuffled_objects_three_objects 0 multiple_choice_grade 59.33 ± 2.84

Average: 49.08%

GPT4ALL

Task Version Metric Value Stderr
arc_challenge 0 acc 66.29 ± 1.38
acc_norm 68.26 ± 1.36
arc_easy 0 acc 86.57 ± 0.70
acc_norm 80.81 ± 0.81
boolq 1 acc 87.16 ± 0.59
hellaswag 0 acc 69.60 ± 0.46
acc_norm 87.45 ± 0.33
openbookqa 0 acc 39.20 ± 2.19
acc_norm 49.60 ± 2.24
piqa 0 acc 83.03 ± 0.88
acc_norm 84.87 ± 0.84
winogrande 0 acc 81.06 ± 1.10

Average: 68.75%

AGIEVAL

Task Version Metric Value StdErr
agieval_aqua_rat 0 acc 28.35 2.83
acc_norm 26.38 2.77
agieval_logiqa_en 0 acc 38.25 1.91
acc_norm 38.09 1.90
agieval_lsat_ar 0 acc 23.91 2.82
acc_norm 23.48 2.80
agieval_lsat_lr 0 acc 52.75 2.21
acc_norm 53.92 2.21
agieval_lsat_rc 0 acc 66.91 2.87
acc_norm 67.29 2.87
agieval_sat_en 0 acc 78.64 2.86
acc_norm 78.64 2.86
agieval_sat_en_without_passage 0 acc 45.15 3.48
acc_norm 44.17 3.47
agieval_sat_math 0 acc 33.18 3.18
acc_norm 31.36 3.14

Average: 47.44%

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1080
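The relationship between these values can be checked with a short sketch. It assumes a single training device (consistent with the reported total batch size of 8) and a linear warmup followed by a half-cosine decay to zero, which is a common form of the cosine scheduler; the exact schedule used by the trainer is not shown in this card, and the helper names are illustrative.

```python
import math

BASE_LR = 5e-7          # learning_rate
WARMUP_STEPS = 100      # lr_scheduler_warmup_steps
TRAINING_STEPS = 1080   # training_steps
MICRO_BATCH = 1         # train_batch_size
GRAD_ACCUM = 8          # gradient_accumulation_steps

def effective_batch_size(micro_batch, grad_accum, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return micro_batch * grad_accum * num_devices

def lr_at(step):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

total_batch = effective_batch_size(MICRO_BATCH, GRAD_ACCUM)
examples_seen = total_batch * TRAINING_STEPS

for step in (0, 50, 100, 590, 1080):
    print(step, lr_at(step))
```

Under these assumptions the run processes 8 examples per optimizer step, and the learning rate peaks at 5e-07 at step 100 before decaying toward zero at step 1080.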

πŸ“ Axolotl Configuration

base_model: mlabonne/NeuralMonarch-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
load_in_8bit: false
load_in_4bit: true
strict: false
rl: dpo
chat_template: chatml
datasets:
  - path: mlabonne/chatml-OpenHermes2.5-dpo-binarized-alpha
    split: train
    type: chatml.intel
dataset_prepared_path:
val_set_size: 0.01
output_dir: ./out
adapter: qlora
lora_model_dir:
sequence_len: 1800
sample_packing: false
pad_to_sequence_len: false
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
 - layers.1.self_attn.q_proj
 - layers.0.self_attn.q_proj
 - layers.15.self_attn.q_proj
 - layers.12.self_attn.q_proj
 - layers.11.self_attn.q_proj
 - layers.14.self_attn.q_proj
 - layers.9.self_attn.q_proj
 - layers.16.self_attn.q_proj
 - layers.30.self_attn.q_proj
 - layers.18.self_attn.q_proj
 - layers.13.self_attn.q_proj
 - layers.10.self_attn.q_proj
 - layers.7.self_attn.q_proj
 - layers.8.self_attn.q_proj
 - layers.4.self_attn.q_proj
 - layers.19.self_attn.q_proj
 - layers.27.self_attn.k_proj
 - layers.24.self_attn.k_proj
 - layers.25.self_attn.k_proj
 - layers.22.self_attn.k_proj
 - layers.26.self_attn.k_proj
 - layers.29.self_attn.k_proj
 - layers.23.self_attn.k_proj
 - layers.28.self_attn.k_proj
 - layers.21.self_attn.k_proj
 - layers.31.self_attn.k_proj
 - layers.30.self_attn.k_proj
 - layers.20.self_attn.k_proj
 - layers.5.self_attn.k_proj
 - layers.19.self_attn.k_proj
 - layers.17.self_attn.k_proj
 - layers.18.self_attn.k_proj
 - layers.19.self_attn.v_proj
 - layers.24.self_attn.v_proj
 - layers.18.self_attn.v_proj
 - layers.5.self_attn.v_proj
 - layers.3.self_attn.v_proj
 - layers.16.self_attn.v_proj
 - layers.23.self_attn.v_proj
 - layers.27.self_attn.v_proj
 - layers.25.self_attn.v_proj
 - layers.26.self_attn.v_proj
 - layers.20.self_attn.v_proj
 - layers.6.self_attn.v_proj
 - layers.15.self_attn.v_proj
 - layers.17.self_attn.v_proj
 - layers.29.self_attn.v_proj
 - layers.22.self_attn.v_proj
 - layers.12.self_attn.o_proj
 - layers.9.self_attn.o_proj
 - layers.14.self_attn.o_proj
 - layers.0.self_attn.o_proj
 - layers.6.self_attn.o_proj
 - layers.8.self_attn.o_proj
 - layers.10.self_attn.o_proj
 - layers.11.self_attn.o_proj
 - layers.13.self_attn.o_proj
 - layers.24.self_attn.o_proj
 - layers.7.self_attn.o_proj
 - layers.15.self_attn.o_proj
 - layers.5.self_attn.o_proj
 - layers.17.self_attn.o_proj
 - layers.25.self_attn.o_proj
 - layers.4.self_attn.o_proj
 - layers.31.mlp.gate_proj
 - layers.30.mlp.gate_proj
 - layers.4.mlp.gate_proj
 - layers.3.mlp.gate_proj
 - layers.29.mlp.gate_proj
 - layers.28.mlp.gate_proj
 - layers.6.mlp.gate_proj
 - layers.27.mlp.gate_proj
 - layers.5.mlp.gate_proj
 - layers.26.mlp.gate_proj
 - layers.25.mlp.gate_proj
 - layers.7.mlp.gate_proj
 - layers.2.mlp.gate_proj
 - layers.24.mlp.gate_proj
 - layers.23.mlp.gate_proj
 - layers.10.mlp.gate_proj
 - layers.6.mlp.up_proj
 - layers.4.mlp.up_proj
 - layers.5.mlp.up_proj
 - layers.27.mlp.up_proj
 - layers.25.mlp.up_proj
 - layers.26.mlp.up_proj
 - layers.17.mlp.up_proj
 - layers.24.mlp.up_proj
 - layers.7.mlp.up_proj
 - layers.10.mlp.up_proj
 - layers.3.mlp.up_proj
 - layers.11.mlp.up_proj
 - layers.23.mlp.up_proj
 - layers.9.mlp.up_proj
 - layers.14.mlp.up_proj
 - layers.18.mlp.up_proj
 - layers.19.mlp.down_proj
 - layers.20.mlp.down_proj
 - layers.18.mlp.down_proj
 - layers.21.mlp.down_proj
 - layers.29.mlp.down_proj
 - layers.1.mlp.down_proj
 - layers.22.mlp.down_proj
 - layers.28.mlp.down_proj
 - layers.23.mlp.down_proj
 - layers.30.mlp.down_proj
 - layers.17.mlp.down_proj
 - layers.4.mlp.down_proj
 - layers.2.mlp.down_proj
 - layers.15.mlp.down_proj
 - layers.5.mlp.down_proj
wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 8
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 5e-7
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
evals_per_epoch: 1
eval_table_size:
eval_table_max_new_tokens: 128
save_steps: 1080
max_steps: 1080
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
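The lora_target_modules list is long, so a small helper can make it easier to see which layers each projection type touches. This is a sketch only: the function name is hypothetical, and the excerpt below contains just a few entries copied from the configuration rather than the full list.

```python
from collections import defaultdict

def group_target_modules(names):
    """Map each projection type (e.g. 'q_proj') to the sorted layer indices it targets."""
    grouped = defaultdict(list)
    for name in names:
        # Entries look like 'layers.<index>.<block>.<projection>'.
        _, index, _block, projection = name.split(".")
        grouped[projection].append(int(index))
    return {proj: sorted(indices) for proj, indices in grouped.items()}

# A short excerpt from the configuration above, not the complete list.
excerpt = [
    "layers.1.self_attn.q_proj",
    "layers.0.self_attn.q_proj",
    "layers.15.self_attn.q_proj",
    "layers.19.mlp.down_proj",
    "layers.20.mlp.down_proj",
]

summary = group_target_modules(excerpt)
for projection, layers in summary.items():
    print(f"{projection}: {len(layers)} layers -> {layers}")
```

Run against the full list, the per-projection counts give a quick check of the "half of the layers" description, since the layer indices in the configuration range from 0 to 31.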

Framework versions

  • Transformers 4.38.0.dev0
  • Pytorch 2.1.2+cu118
  • Datasets 2.17.0
  • Tokenizers 0.15.0
  • axolotl: 0.4.0

Built with Axolotl
