Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set (the reward columns are defined in the sketch after this list):

  • Loss: 0.5644
  • Rewards/chosen: -0.8921
  • Rewards/rejected: -1.9046
  • Rewards/accuracies: 0.6330
  • Rewards/margins: 1.0126
  • Logps/rejected: -45.6030
  • Logps/chosen: -32.5922
  • Logits/rejected: -2.0662
  • Logits/chosen: -2.0658
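
These reward columns follow the standard DPO definition: the implicit reward for a completion is beta times the gap between the policy and reference log-probabilities. A minimal sketch, assuming beta = 0.1 (inferred from "01beta" in the model name) and reference log-probabilities back-solved from the final eval row for illustration:

```python
# Sketch of the implicit DPO reward behind the metrics above.
# beta = 0.1 is an assumption inferred from "01beta" in the model name;
# the reference log-probs are back-solved from the final eval row.
beta = 0.1

def dpo_reward(policy_logp: float, ref_logp: float, beta: float = beta) -> float:
    """Implicit DPO reward: beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

reward_chosen = dpo_reward(policy_logp=-32.5922, ref_logp=-23.6712)    # ~ -0.8921
reward_rejected = dpo_reward(policy_logp=-45.6030, ref_logp=-26.5570)  # ~ -1.9046

margin = reward_chosen - reward_rejected     # Rewards/margins ~ 1.0126
accurate = reward_chosen > reward_rejected   # averaged over pairs -> Rewards/accuracies
```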

Model description

More information needed

Intended uses & limitations

More information needed
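
Pending usage guidance from the author, here is a minimal inference sketch. It assumes the repository id tsavage68/Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT and that the tokenizer ships a Mistral-Instruct-style chat template; adjust both if they differ:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name above.
model_id = "tsavage68/Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Format the prompt with the tokenizer's chat template (assumed present).
messages = [{"role": "user", "content": "Summarize what DPO training does."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```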

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
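
A minimal sketch of this configuration, assuming the Hugging Face transformers TrainingArguments API (the trainer class and dataset are not specified in this card; output_dir is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_01beta_1e7rate_CDPOSFT",  # placeholder
    learning_rate=1e-7,
    per_device_train_batch_size=4,   # train_batch_size: 4
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=2,   # total_train_batch_size: 4 * 2 = 8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,                  # training_steps: 1000
)
```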

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6891 | 0.0977 | 50 | 0.6864 | 0.0309 | 0.0171 | 0.6220 | 0.0138 | -26.3858 | -23.3627 | -2.3067 | -2.3063 |
| 0.6478 | 0.1953 | 100 | 0.6468 | 0.0940 | -0.0114 | 0.6418 | 0.1054 | -26.6706 | -22.7312 | -2.2655 | -2.2651 |
| 0.5634 | 0.2930 | 150 | 0.5932 | 0.1716 | -0.1066 | 0.6593 | 0.2783 | -27.6232 | -21.9553 | -2.2109 | -2.2105 |
| 0.4312 | 0.3906 | 200 | 0.5617 | -0.0095 | -0.5043 | 0.6396 | 0.4948 | -31.6002 | -23.7667 | -2.1590 | -2.1586 |
| 0.4711 | 0.4883 | 250 | 0.5499 | -0.0116 | -0.6823 | 0.6462 | 0.6707 | -33.3801 | -23.7878 | -2.1283 | -2.1279 |
| 0.4014 | 0.5859 | 300 | 0.5465 | -0.4289 | -1.1546 | 0.6484 | 0.7257 | -38.1031 | -27.9606 | -2.1330 | -2.1326 |
| 0.4439 | 0.6836 | 350 | 0.5634 | -0.7104 | -1.5436 | 0.6462 | 0.8331 | -41.9925 | -30.7762 | -2.1022 | -2.1018 |
| 0.4768 | 0.7812 | 400 | 0.5594 | -0.6950 | -1.5434 | 0.6571 | 0.8484 | -41.9907 | -30.6215 | -2.1034 | -2.1030 |
| 0.4891 | 0.8789 | 450 | 0.5525 | -0.7222 | -1.5890 | 0.6505 | 0.8668 | -42.4472 | -30.8936 | -2.0946 | -2.0942 |
| 0.4048 | 0.9766 | 500 | 0.5463 | -0.4609 | -1.3226 | 0.6571 | 0.8617 | -39.7828 | -28.2802 | -2.1066 | -2.1062 |
| 0.3051 | 1.0742 | 550 | 0.5533 | -0.7106 | -1.6492 | 0.6440 | 0.9386 | -43.0491 | -30.7781 | -2.0836 | -2.0832 |
| 0.3145 | 1.1719 | 600 | 0.5586 | -0.8155 | -1.7726 | 0.6330 | 0.9571 | -44.2825 | -31.8265 | -2.0777 | -2.0774 |
| 0.4126 | 1.2695 | 650 | 0.5618 | -0.8660 | -1.8549 | 0.6374 | 0.9889 | -45.1055 | -32.3315 | -2.0720 | -2.0716 |
| 0.3106 | 1.3672 | 700 | 0.5631 | -0.8991 | -1.8960 | 0.6308 | 0.9969 | -45.5172 | -32.6628 | -2.0686 | -2.0682 |
| 0.3095 | 1.4648 | 750 | 0.5638 | -0.8960 | -1.9056 | 0.6308 | 1.0096 | -45.6128 | -32.6320 | -2.0670 | -2.0666 |
| 0.3638 | 1.5625 | 800 | 0.5660 | -0.8946 | -1.9044 | 0.6352 | 1.0098 | -45.6007 | -32.6176 | -2.0663 | -2.0659 |
| 0.348  | 1.6602 | 850 | 0.5645 | -0.8960 | -1.9094 | 0.6374 | 1.0134 | -45.6511 | -32.6320 | -2.0665 | -2.0661 |
| 0.3272 | 1.7578 | 900 | 0.5653 | -0.8971 | -1.9081 | 0.6352 | 1.0110 | -45.6377 | -32.6428 | -2.0662 | -2.0658 |
| 0.3261 | 1.8555 | 950 | 0.5644 | -0.8920 | -1.9045 | 0.6374 | 1.0124 | -45.6014 | -32.5920 | -2.0662 | -2.0659 |
| 0.2913 | 1.9531 | 1000 | 0.5644 | -0.8921 | -1.9046 | 0.6330 | 1.0126 | -45.6030 | -32.5922 | -2.0662 | -2.0658 |

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1