
Mistral2_1000_STEPS_05beta_1e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset; the model name indicates preference tuning with conservative DPO (cDPO) at β = 0.5 on top of the SFT checkpoint. It achieves the following results on the evaluation set:

  • Loss: 1.3136
  • Rewards/chosen: -2.3904
  • Rewards/rejected: -7.1332
  • Rewards/accuracies: 0.6593
  • Rewards/margins: 4.7427
  • Logps/rejected: -40.8232
  • Logps/chosen: -28.4526
  • Logits/rejected: -1.9252
  • Logits/chosen: -1.9252
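
Assuming these metrics follow the standard DPO logging convention (as in TRL), the reward columns report the implicit DPO reward

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

averaged separately over chosen and rejected completions. Rewards/margins is their difference (here -2.3904 - (-7.1332) ≈ 4.7427), and Rewards/accuracies is the fraction of evaluation pairs for which the chosen completion's reward exceeds the rejected one's.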

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
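
The following is a minimal sketch of a matching cDPO training setup, assuming TRL's DPOTrainer (~v0.8, contemporary with Transformers 4.40.1); it is not the author's actual script. The dataset contents, label_smoothing value, and sequence-length caps are placeholders, since none of them are documented by this card.

```python
# Minimal cDPO sketch, assuming TRL's DPOTrainer (~v0.8).
# Dataset, label_smoothing, and length caps are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"  # SFT checkpoint named above
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # Mistral tokenizers ship without a pad token

# Placeholder preference pairs; the real dataset is not documented.
pairs = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_1e7rate_CDPOSFT",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # total train batch size 8
    learning_rate=5e-7,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.5,              # "05beta" in the model name
    loss_type="sigmoid",
    label_smoothing=0.1,   # cDPO = DPO with label smoothing; 0.1 is a guess
    max_length=1024,       # placeholder length caps
    max_prompt_length=512,
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()
```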

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5792 | 0.0977 | 50 | 0.5909 | -2.0186 | -2.6003 | 0.6220 | 0.5817 | -31.7574 | -27.7089 | -2.2531 | -2.2527 |
| 0.6303 | 0.1953 | 100 | 0.6667 | 3.3013 | 1.7992 | 0.6549 | 1.5021 | -22.9585 | -17.0691 | -2.2458 | -2.2455 |
| 0.5825 | 0.2930 | 150 | 0.7920 | 2.4778 | 0.7375 | 0.6505 | 1.7403 | -25.0818 | -18.7161 | -2.1900 | -2.1897 |
| 0.6114 | 0.3906 | 200 | 0.7379 | 2.2312 | 0.7238 | 0.6659 | 1.5074 | -25.1093 | -19.2092 | -2.3138 | -2.3135 |
| 0.6456 | 0.4883 | 250 | 0.8073 | 3.3802 | 1.8907 | 0.6220 | 1.4894 | -22.7754 | -16.9114 | -2.1555 | -2.1552 |
| 0.6342 | 0.5859 | 300 | 0.8059 | 3.1536 | 1.5241 | 0.6286 | 1.6295 | -23.5086 | -17.3644 | -2.2658 | -2.2655 |
| 0.6242 | 0.6836 | 350 | 0.8249 | 1.4081 | -0.6396 | 0.6659 | 2.0477 | -27.8361 | -20.8555 | -2.2305 | -2.2303 |
| 0.7214 | 0.7812 | 400 | 0.8283 | 2.4761 | 0.6640 | 0.6418 | 1.8121 | -25.2289 | -18.7195 | -2.3316 | -2.3314 |
| 0.7045 | 0.8789 | 450 | 0.8201 | 1.8174 | -0.1276 | 0.6352 | 1.9451 | -26.8121 | -20.0369 | -2.1939 | -2.1937 |
| 0.479 | 0.9766 | 500 | 0.7489 | 2.6325 | 1.0003 | 0.6593 | 1.6322 | -24.5563 | -18.4067 | -2.3131 | -2.3129 |
| 0.0869 | 1.0742 | 550 | 0.9388 | 0.3435 | -2.9890 | 0.6681 | 3.3325 | -32.5349 | -22.9847 | -2.0092 | -2.0092 |
| 0.2298 | 1.1719 | 600 | 1.1052 | -0.7335 | -4.5697 | 0.6593 | 3.8362 | -35.6963 | -25.1386 | -1.9647 | -1.9647 |
| 0.2182 | 1.2695 | 650 | 1.2321 | -1.9830 | -6.2540 | 0.6593 | 4.2711 | -39.0649 | -27.6376 | -1.9426 | -1.9426 |
| 0.0774 | 1.3672 | 700 | 1.2775 | -2.3773 | -6.9288 | 0.6615 | 4.5515 | -40.4144 | -28.4262 | -1.9328 | -1.9328 |
| 0.1026 | 1.4648 | 750 | 1.3159 | -2.4992 | -7.2166 | 0.6615 | 4.7174 | -40.9900 | -28.6701 | -1.9244 | -1.9244 |
| 0.0987 | 1.5625 | 800 | 1.3118 | -2.4534 | -7.2109 | 0.6593 | 4.7575 | -40.9786 | -28.5784 | -1.9248 | -1.9248 |
| 0.2393 | 1.6602 | 850 | 1.3108 | -2.3855 | -7.1139 | 0.6637 | 4.7283 | -40.7846 | -28.4428 | -1.9255 | -1.9255 |
| 0.2495 | 1.7578 | 900 | 1.3100 | -2.3926 | -7.1330 | 0.6637 | 4.7404 | -40.8229 | -28.4569 | -1.9264 | -1.9264 |
| 0.1851 | 1.8555 | 950 | 1.3120 | -2.4001 | -7.1405 | 0.6637 | 4.7404 | -40.8378 | -28.4718 | -1.9253 | -1.9253 |
| 0.0934 | 1.9531 | 1000 | 1.3136 | -2.3904 | -7.1332 | 0.6593 | 4.7427 | -40.8232 | -28.4526 | -1.9252 | -1.9252 |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1
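
A minimal loading-and-generation sketch follows; the repo id is assumed to match the card title, and the prompt uses the Mistral-Instruct [INST] format inherited from the base model.

```python
# Usage sketch; repo id and chat format are assumptions, not documented here.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Mistral2_1000_STEPS_05beta_1e7rate_CDPOSFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

prompt = "[INST] Summarize what beta controls in DPO. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```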