
Mistral2_1000_STEPS_03beta_1e7rate_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5990
  • Rewards/chosen: 0.5762
  • Rewards/rejected: -0.8738
  • Rewards/accuracies: 0.6505
  • Rewards/margins: 1.4500
  • Logps/rejected: -29.4696
  • Logps/chosen: -21.7510
  • Logits/rejected: -2.1561
  • Logits/chosen: -2.1557
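
For reference, these reward columns follow the standard DPO convention (a hedged reading, assuming the usual TRL-style setup): each reward is the β-scaled log-probability ratio between the trained policy and the frozen reference model, and the margin is simply chosen minus rejected:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}})
$$

Sanity check against the values above: 0.5762 − (−0.8738) = 1.4500, which matches Rewards/margins exactly.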

Model description

More information needed

Intended uses & limitations

More information needed
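
As a starting point, the model can be loaded like any Mistral-Instruct-style causal LM. The snippet below is a minimal inference sketch; the repo id is inferred from this card's title, and the chat template is assumed to be the Mistral-Instruct one inherited from the base model.

```python
# Minimal inference sketch. The repo id is an assumption inferred from the
# card title; adjust it to the actual Hugging Face repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/Mistral2_1000_STEPS_03beta_1e7rate_CDPOSFT"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the checkpoint is stored in FP16
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain DPO fine-tuning in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```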

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
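
Read together with the model name, these settings suggest a DPO run on top of the SFT checkpoint with β = 0.3 ("03beta") at the listed 1e-7 learning rate; "CDPO" likely refers to conservative DPO, which TRL implements via a nonzero label_smoothing value. A minimal TRL sketch reproducing the listed hyperparameters might look like the following; the preference dataset is unknown, and the β and label-smoothing readings are inferences from the name, not confirmed by the card.

```python
# Sketch of a TRL DPO run matching the hyperparameters listed above.
# The preference dataset is a placeholder, and beta=0.3 is inferred from
# "03beta" in the model name; neither is confirmed by the card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

config = DPOConfig(
    output_dir="Mistral2_1000_STEPS_03beta_1e7rate_CDPOSFT",
    beta=0.3,                        # inferred from "03beta"
    # label_smoothing=...,           # >0 would enable conservative DPO; value unknown
    learning_rate=1e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # 4 x 2 = total train batch size of 8
    max_steps=1000,
    warmup_steps=100,
    lr_scheduler_type="cosine",
    seed=42,
)

train_dataset = load_dataset("your/preference-dataset", split="train")  # unknown dataset

trainer = DPOTrainer(
    model=model,            # reference model is cloned internally when not given
    args=config,
    train_dataset=train_dataset,
    tokenizer=tokenizer,    # older TRL; newer versions take processing_class=
)
trainer.train()
```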

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6838        | 0.0977 | 50   | 0.6757          | 0.0819         | 0.0446           | 0.6088             | 0.0374          | -26.4083       | -23.3987     | -2.3067         | -2.3063       |
| 0.5869        | 0.1953 | 100  | 0.5936          | 0.1585         | -0.1222          | 0.6418             | 0.2808          | -26.9643       | -23.1432     | -2.2711         | -2.2707       |
| 0.4715        | 0.2930 | 150  | 0.5452          | -0.2129        | -0.8058          | 0.6659             | 0.5930          | -29.2430       | -24.3812     | -2.2397         | -2.2393       |
| 0.354         | 0.3906 | 200  | 0.5529          | 1.0155         | 0.1855           | 0.6549             | 0.8300          | -25.9386       | -20.2868     | -2.2199         | -2.2195       |
| 0.4396        | 0.4883 | 250  | 0.5574          | 1.1590         | 0.1518           | 0.6462             | 1.0072          | -26.0510       | -19.8085     | -2.2035         | -2.2031       |
| 0.3274        | 0.5859 | 300  | 0.5545          | 1.1199         | 0.0715           | 0.6593             | 1.0484          | -26.3185       | -19.9386     | -2.2082         | -2.2078       |
| 0.4225        | 0.6836 | 350  | 0.5761          | 0.8487         | -0.3483          | 0.6440             | 1.1970          | -27.7178       | -20.8428     | -2.1904         | -2.1900       |
| 0.438         | 0.7812 | 400  | 0.5743          | 0.8375         | -0.4076          | 0.6505             | 1.2451          | -27.9155       | -20.8801     | -2.1868         | -2.1864       |
| 0.4097        | 0.8789 | 450  | 0.5715          | 0.9972         | -0.2262          | 0.6593             | 1.2234          | -27.3110       | -20.3477     | -2.1789         | -2.1785       |
| 0.3681        | 0.9766 | 500  | 0.5530          | 1.3124         | 0.1000           | 0.6637             | 1.2124          | -26.2237       | -19.2971     | -2.1811         | -2.1807       |
| 0.2244        | 1.0742 | 550  | 0.5675          | 1.0929         | -0.2118          | 0.6549             | 1.3047          | -27.2629       | -20.0288     | -2.1714         | -2.1710       |
| 0.1844        | 1.1719 | 600  | 0.5865          | 0.7455         | -0.6438          | 0.6484             | 1.3894          | -28.7029       | -21.1865     | -2.1633         | -2.1629       |
| 0.3499        | 1.2695 | 650  | 0.5943          | 0.6716         | -0.7550          | 0.6484             | 1.4266          | -29.0734       | -21.4330     | -2.1596         | -2.1592       |
| 0.2335        | 1.3672 | 700  | 0.5946          | 0.6222         | -0.8092          | 0.6440             | 1.4314          | -29.2540       | -21.5976     | -2.1580         | -2.1576       |
| 0.1899        | 1.4648 | 750  | 0.5962          | 0.5886         | -0.8572          | 0.6484             | 1.4459          | -29.4143       | -21.7096     | -2.1567         | -2.1563       |
| 0.319         | 1.5625 | 800  | 0.5973          | 0.5755         | -0.8764          | 0.6440             | 1.4519          | -29.4783       | -21.7533     | -2.1565         | -2.1561       |
| 0.2466        | 1.6602 | 850  | 0.5971          | 0.5726         | -0.8773          | 0.6484             | 1.4499          | -29.4812       | -21.7631     | -2.1562         | -2.1558       |
| 0.2674        | 1.7578 | 900  | 0.5953          | 0.5773         | -0.8785          | 0.6462             | 1.4559          | -29.4853       | -21.7472     | -2.1565         | -2.1560       |
| 0.2268        | 1.8555 | 950  | 0.5990          | 0.5769         | -0.8744          | 0.6462             | 1.4514          | -29.4716       | -21.7486     | -2.1562         | -2.1558       |
| 0.235         | 1.9531 | 1000 | 0.5990          | 0.5762         | -0.8738          | 0.6505             | 1.4500          | -29.4696       | -21.7510     | -2.1561         | -2.1557       |

Framework versions

  • Transformers 4.40.1
  • PyTorch 2.0.0+cu117
  • Datasets 2.19.1
  • Tokenizers 0.19.1