Mistral2_1000_STEPS_01beta_1e8rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6926
  • Rewards/chosen: 0.0065
  • Rewards/rejected: 0.0053
  • Rewards/accuracies: 0.4615
  • Rewards/margins: 0.0012
  • Logps/rejected: -26.5038
  • Logps/chosen: -23.6067
  • Logits/rejected: -2.3100
  • Logits/chosen: -2.3095
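
Usage is not otherwise documented on this card, so here is a minimal loading sketch. The repository id is assumed to match the card title, and the tokenizer is assumed to ship a Mistral-style chat template; neither is confirmed by the card itself.

```python
# Hypothetical usage sketch; the repo id is assumed to match the card title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Mistral2_1000_STEPS_01beta_1e8rate_CDPOSFT"  # assumption
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize direct preference optimization in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```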

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the reproduction sketch after the list):

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
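
The model name suggests a DPO run with beta = 0.1 ("01beta") and a 1e-08 learning rate on top of the SFT checkpoint, with "CDPO" hinting at the conservative-DPO variant (implemented in trl as label smoothing on the sigmoid loss). Below is a minimal sketch of such a run with trl's DPOTrainer under those readings; the preference dataset rows, the label-smoothing value, and the exact trl version are assumptions, not facts from this card.

```python
# Hedged reproduction sketch of the training setup implied by the
# hyperparameters above. Dataset, label_smoothing value, and trl version
# are assumptions; only the TrainingArguments values come from this card.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT checkpoint named in this card
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder preference pairs; the real training data is not documented here.
train_dataset = Dataset.from_dict({
    "prompt":   ["Explain DPO in one sentence."],
    "chosen":   ["DPO fits a policy directly to preference pairs without a separate reward model."],
    "rejected": ["DPO is a tokenizer."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_1e8rate_CDPOSFT",
    learning_rate=1e-8,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = total train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,                        # Adam betas/epsilon match the defaults listed above
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,          # trl clones a frozen reference model when None
    args=args,
    beta=0.1,                # "01beta" in the model name
    label_smoothing=0.1,     # assumption: cDPO via label smoothing; value not stated on the card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

At a 1e-08 learning rate the policy barely moves from the reference, which is consistent with the validation losses below hovering near ln(2) ≈ 0.6931 (the DPO sigmoid loss at zero reward margin) and with the near-zero reward margins.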

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6942        | 0.0977 | 50   | 0.6931          | 0.0016         | 0.0015           | 0.4549             | 0.0001          | -26.5417       | -23.6558     | -2.3103         | -2.3098       |
| 0.6924        | 0.1953 | 100  | 0.6933          | 0.0004         | 0.0006           | 0.4352             | -0.0002         | -26.5508       | -23.6681     | -2.3107         | -2.3103       |
| 0.6936        | 0.2930 | 150  | 0.6931          | 0.0013         | 0.0013           | 0.4527             | 0.0001          | -26.5442       | -23.6585     | -2.3106         | -2.3102       |
| 0.6934        | 0.3906 | 200  | 0.6925          | 0.0034         | 0.0021           | 0.4791             | 0.0013          | -26.5358       | -23.6374     | -2.3104         | -2.3099       |
| 0.6923        | 0.4883 | 250  | 0.6928          | 0.0053         | 0.0044           | 0.4967             | 0.0008          | -26.5125       | -23.6191     | -2.3102         | -2.3098       |
| 0.6914        | 0.5859 | 300  | 0.6924          | 0.0058         | 0.0043           | 0.4879             | 0.0015          | -26.5142       | -23.6138     | -2.3102         | -2.3098       |
| 0.6922        | 0.6836 | 350  | 0.6926          | 0.0072         | 0.0059           | 0.4923             | 0.0012          | -26.4974       | -23.6001     | -2.3104         | -2.3099       |
| 0.6913        | 0.7812 | 400  | 0.6924          | 0.0048         | 0.0034           | 0.4945             | 0.0015          | -26.5233       | -23.6235     | -2.3098         | -2.3094       |
| 0.6917        | 0.8789 | 450  | 0.6923          | 0.0058         | 0.0041           | 0.5011             | 0.0017          | -26.5157       | -23.6136     | -2.3100         | -2.3096       |
| 0.6909        | 0.9766 | 500  | 0.6925          | 0.0052         | 0.0038           | 0.4813             | 0.0014          | -26.5186       | -23.6196     | -2.3101         | -2.3097       |
| 0.6906        | 1.0742 | 550  | 0.6925          | 0.0073         | 0.0059           | 0.4989             | 0.0013          | -26.4974       | -23.5988     | -2.3100         | -2.3096       |
| 0.692         | 1.1719 | 600  | 0.6925          | 0.0063         | 0.0049           | 0.5033             | 0.0014          | -26.5080       | -23.6092     | -2.3099         | -2.3095       |
| 0.6918        | 1.2695 | 650  | 0.6924          | 0.0055         | 0.0041           | 0.4857             | 0.0015          | -26.5160       | -23.6163     | -2.3099         | -2.3095       |
| 0.6918        | 1.3672 | 700  | 0.6923          | 0.0066         | 0.0048           | 0.5165             | 0.0018          | -26.5093       | -23.6059     | -2.3100         | -2.3096       |
| 0.6915        | 1.4648 | 750  | 0.6921          | 0.0078         | 0.0057           | 0.5121             | 0.0022          | -26.5002       | -23.5933     | -2.3100         | -2.3096       |
| 0.6917        | 1.5625 | 800  | 0.6923          | 0.0070         | 0.0053           | 0.4901             | 0.0017          | -26.5038       | -23.6016     | -2.3099         | -2.3095       |
| 0.692         | 1.6602 | 850  | 0.6926          | 0.0068         | 0.0057           | 0.4813             | 0.0012          | -26.5000       | -23.6033     | -2.3099         | -2.3094       |
| 0.6913        | 1.7578 | 900  | 0.6926          | 0.0065         | 0.0053           | 0.4615             | 0.0012          | -26.5038       | -23.6067     | -2.3100         | -2.3095       |
| 0.6917        | 1.8555 | 950  | 0.6926          | 0.0065         | 0.0053           | 0.4615             | 0.0012          | -26.5038       | -23.6067     | -2.3100         | -2.3095       |
| 0.6911        | 1.9531 | 1000 | 0.6926          | 0.0065         | 0.0053           | 0.4615             | 0.0012          | -26.5038       | -23.6067     | -2.3100         | -2.3095       |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1
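
To reproduce the environment, the listed versions can be pinned directly, e.g. in a requirements file like the sketch below. The card lists Pytorch 2.0.0+cu117, whose wheels come from the PyTorch index rather than PyPI; trl is not version-listed on this card, so it is left unpinned here.

```text
# requirements.txt sketch from the versions listed above
transformers==4.40.1
datasets==2.19.0
tokenizers==0.19.1
torch==2.0.0  # card lists 2.0.0+cu117; install via --index-url https://download.pytorch.org/whl/cu117
```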