
Mistral2_1000_STEPS_03beta_5e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6902
  • Rewards/chosen: 0.0173
  • Rewards/rejected: 0.0111
  • Rewards/accuracies: 0.5099
  • Rewards/margins: 0.0062
  • Logps/rejected: -26.5199
  • Logps/chosen: -23.6139
  • Logits/rejected: -2.3100
  • Logits/chosen: -2.3096
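
The reward numbers above follow the usual DPO convention: a completion's implicit reward is β times the log-probability ratio between the policy and the reference (SFT) model, and the margin is the chosen reward minus the rejected reward. A minimal sketch of that bookkeeping (β = 0.3 is inferred from the "03beta" in the model name; the function and tensor names are illustrative, not from this repo):

```python
import torch

def dpo_rewards(policy_logps_chosen: torch.Tensor,
                policy_logps_rejected: torch.Tensor,
                ref_logps_chosen: torch.Tensor,
                ref_logps_rejected: torch.Tensor,
                beta: float = 0.3):
    # Implicit DPO reward: beta * (log p_policy - log p_ref),
    # with log-probs summed over the completion tokens.
    rewards_chosen = beta * (policy_logps_chosen - ref_logps_chosen)
    rewards_rejected = beta * (policy_logps_rejected - ref_logps_rejected)
    margins = rewards_chosen - rewards_rejected        # "Rewards/margins"
    accuracies = (margins > 0).float().mean()          # "Rewards/accuracies"
    return rewards_chosen, rewards_rejected, margins, accuracies
```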

Model description

More information needed

Intended uses & limitations

More information needed
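
Pending details from the author, a minimal inference sketch; the Hub repo id is assumed from the card title and should be verified before use:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from this card's title.
model_id = "tsavage68/Mistral2_1000_STEPS_03beta_5e7rate_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the benefits of DPO fine-tuning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```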

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-08
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
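
These settings map onto a trl DPOTrainer run. The sketch below is a hedged reconstruction using the trl 0.8-era API (newer trl moves these arguments into DPOConfig): the "CDPO" in the model name suggests conservative DPO, which trl implements as the sigmoid loss with label smoothing, but the smoothing value, the β of 0.3 (read off the "03beta" in the name), and the toy dataset are all assumptions.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen DPO reference
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

# Toy preference data; the card does not name the real dataset.
train_dataset = Dataset.from_dict({
    "prompt":   ["Explain DPO in one sentence."],
    "chosen":   ["DPO fine-tunes a model directly on preference pairs."],
    "rejected": ["DPO is a kind of soup."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_03beta_5e7rate_CDPOSFT",
    learning_rate=1e-8,              # value listed above (the name's "5e7rate" hints at 5e-7)
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # total train batch size 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,     # required by DPOTrainer's collator
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.3,               # inferred from "03beta" in the model name
    loss_type="sigmoid",
    label_smoothing=0.1,    # assumption: cDPO-style smoothing; value not documented
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```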

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6937 | 0.0977 | 50   | 0.6937 | 0.0063 | 0.0071 | 0.4681 | -0.0008 | -26.5333 | -23.6507 | -2.3103 | -2.3099 |
| 0.6931 | 0.1953 | 100  | 0.6934 | 0.0068 | 0.0069 | 0.4308 | -0.0002 | -26.5337 | -23.6491 | -2.3103 | -2.3099 |
| 0.6945 | 0.2930 | 150  | 0.6927 | 0.0085 | 0.0074 | 0.4527 | 0.0011  | -26.5323 | -23.6433 | -2.3104 | -2.3100 |
| 0.6954 | 0.3906 | 200  | 0.6930 | 0.0105 | 0.0098 | 0.4725 | 0.0006  | -26.5240 | -23.6368 | -2.3103 | -2.3099 |
| 0.6911 | 0.4883 | 250  | 0.6926 | 0.0138 | 0.0124 | 0.4615 | 0.0014  | -26.5154 | -23.6257 | -2.3102 | -2.3098 |
| 0.6878 | 0.5859 | 300  | 0.6919 | 0.0233 | 0.0205 | 0.4681 | 0.0028  | -26.4886 | -23.5941 | -2.3100 | -2.3096 |
| 0.6899 | 0.6836 | 350  | 0.6910 | 0.0119 | 0.0072 | 0.5055 | 0.0047  | -26.5329 | -23.6321 | -2.3097 | -2.3093 |
| 0.6886 | 0.7812 | 400  | 0.6907 | 0.0202 | 0.0149 | 0.4989 | 0.0053  | -26.5071 | -23.6043 | -2.3100 | -2.3096 |
| 0.6927 | 0.8789 | 450  | 0.6915 | 0.0216 | 0.0180 | 0.4725 | 0.0036  | -26.4968 | -23.5995 | -2.3100 | -2.3096 |
| 0.6886 | 0.9766 | 500  | 0.6917 | 0.0198 | 0.0166 | 0.4571 | 0.0032  | -26.5016 | -23.6056 | -2.3102 | -2.3097 |
| 0.6868 | 1.0742 | 550  | 0.6916 | 0.0203 | 0.0167 | 0.4945 | 0.0036  | -26.5011 | -23.6041 | -2.3097 | -2.3093 |
| 0.6862 | 1.1719 | 600  | 0.6911 | 0.0198 | 0.0153 | 0.5033 | 0.0045  | -26.5058 | -23.6057 | -2.3099 | -2.3095 |
| 0.6869 | 1.2695 | 650  | 0.6913 | 0.0210 | 0.0171 | 0.5077 | 0.0039  | -26.5000 | -23.6017 | -2.3098 | -2.3094 |
| 0.6921 | 1.3672 | 700  | 0.6911 | 0.0221 | 0.0177 | 0.4879 | 0.0044  | -26.4979 | -23.5979 | -2.3104 | -2.3099 |
| 0.6883 | 1.4648 | 750  | 0.6916 | 0.0223 | 0.0187 | 0.4791 | 0.0035  | -26.4944 | -23.5974 | -2.3102 | -2.3098 |
| 0.6883 | 1.5625 | 800  | 0.6904 | 0.0184 | 0.0125 | 0.5011 | 0.0059  | -26.5152 | -23.6104 | -2.3100 | -2.3096 |
| 0.6876 | 1.6602 | 850  | 0.6904 | 0.0181 | 0.0123 | 0.5055 | 0.0059  | -26.5159 | -23.6112 | -2.3100 | -2.3096 |
| 0.6845 | 1.7578 | 900  | 0.6902 | 0.0173 | 0.0111 | 0.5099 | 0.0062  | -26.5199 | -23.6139 | -2.3100 | -2.3096 |
| 0.6892 | 1.8555 | 950  | 0.6902 | 0.0173 | 0.0111 | 0.5099 | 0.0062  | -26.5199 | -23.6139 | -2.3100 | -2.3096 |
| 0.6878 | 1.9531 | 1000 | 0.6902 | 0.0173 | 0.0111 | 0.5099 | 0.0062  | -26.5199 | -23.6139 | -2.3100 | -2.3096 |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1