
Mistral2_1000_STEPS_05beta_CDPOSFT

This model is a fine-tuned version of tsavage68/mistralit2_1000_STEPS_5e7_SFT on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.5804
  • Rewards/chosen: 1.1615
  • Rewards/rejected: 0.9028
  • Rewards/accuracies: 0.4286
  • Rewards/margins: 0.2587
  • Logps/rejected: -75.7158
  • Logps/chosen: -73.1790
  • Logits/rejected: -1.8951
  • Logits/chosen: -1.8951
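
The Rewards/* values above are the implicit DPO reward statistics: each completion's reward is β times the difference between its log-probability under the trained policy and under the frozen reference model, and the accuracy is the fraction of pairs where the chosen completion out-scores the rejected one. A minimal sketch of how these statistics are computed from per-sequence log-probs follows; the β = 0.5 default is an assumption inferred from "05beta" in the model name, not stated elsewhere on this card.

```python
import torch

def dpo_reward_stats(policy_chosen_logps: torch.Tensor,
                     policy_rejected_logps: torch.Tensor,
                     ref_chosen_logps: torch.Tensor,
                     ref_rejected_logps: torch.Tensor,
                     beta: float = 0.5) -> dict:
    # Implicit DPO reward: beta * (log-prob under policy - log-prob under reference)
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = rewards_chosen - rewards_rejected
    return {
        "rewards/chosen": rewards_chosen.mean().item(),
        "rewards/rejected": rewards_rejected.mean().item(),
        "rewards/margins": margins.mean().item(),
        # Fraction of pairs where the chosen completion gets the higher reward
        "rewards/accuracies": (margins > 0).float().mean().item(),
    }
```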

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
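
As a rough guide to reproducing this configuration, the sketch below maps the hyperparameters above onto TRL's DPOTrainer (API as of TRL ~0.8.x, contemporary with Transformers 4.40). The preference dataset is a placeholder, since the card does not identify the training data, and β = 0.5 is assumed from the model name.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "tsavage68/mistralit2_1000_STEPS_5e7_SFT"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)       # policy to be trained
ref_model = AutoModelForCausalLM.from_pretrained(base)   # frozen reference copy

# Hypothetical preference dataset with "prompt"/"chosen"/"rejected" columns;
# the actual training data is not named on this card.
dataset = load_dataset("your-org/your-preference-dataset")

args = TrainingArguments(
    output_dir="Mistral2_1000_STEPS_05beta_CDPOSFT",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,   # effective train batch size: 4 * 2 = 8
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.5,                        # assumed from "05beta" in the model name
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```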

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 2.0435        | 0.0977 | 50   | 1.6546          | -0.2111        | -0.0540          | 0.3868             | -0.1572         | -77.6294       | -75.9242     | -1.6195         | -1.6195       |
| 3.1098        | 0.1953 | 100  | 1.7215          | -0.1670        | -0.5209          | 0.4286             | 0.3539          | -78.5632       | -75.8359     | -0.7416         | -0.7416       |
| 1.8949        | 0.2930 | 150  | 1.6841          | 2.1671         | 1.9786           | 0.4154             | 0.1886          | -73.5644       | -71.1677     | -1.7619         | -1.7619       |
| 1.4406        | 0.3906 | 200  | 1.6936          | 2.3177         | 2.1054           | 0.4264             | 0.2124          | -73.3108       | -70.8665     | -2.2879         | -2.2879       |
| 1.5623        | 0.4883 | 250  | 1.5911          | 0.8418         | 0.4811           | 0.4396             | 0.3607          | -76.5593       | -73.8184     | -1.5834         | -1.5834       |
| 1.8884        | 0.5859 | 300  | 1.5747          | 1.4552         | 1.2105           | 0.4418             | 0.2447          | -75.1005       | -72.5916     | -1.6640         | -1.6640       |
| 1.4373        | 0.6836 | 350  | 1.5569          | 1.3020         | 1.0909           | 0.4198             | 0.2111          | -75.3397       | -72.8979     | -1.9137         | -1.9136       |
| 1.4732        | 0.7812 | 400  | 1.5216          | 1.0023         | 0.6676           | 0.4571             | 0.3347          | -76.1863       | -73.4973     | -1.9794         | -1.9794       |
| 1.9109        | 0.8789 | 450  | 1.5502          | 1.3520         | 0.9986           | 0.4505             | 0.3534          | -75.5243       | -72.7979     | -1.8076         | -1.8076       |
| 1.4744        | 0.9766 | 500  | 1.5531          | 1.3605         | 1.1014           | 0.4264             | 0.2591          | -75.3186       | -72.7809     | -1.9385         | -1.9385       |
| 1.2615        | 1.0742 | 550  | 1.6623          | 0.6530         | 0.4114           | 0.4242             | 0.2415          | -76.6986       | -74.1960     | -2.3949         | -2.3949       |
| 1.8019        | 1.1719 | 600  | 1.6240          | 0.8707         | 0.6200           | 0.4308             | 0.2507          | -76.2815       | -73.7606     | -1.6149         | -1.6149       |
| 1.2202        | 1.2695 | 650  | 1.5993          | 1.1246         | 0.9014           | 0.4330             | 0.2233          | -75.7188       | -73.2527     | -1.8964         | -1.8964       |
| 1.0924        | 1.3672 | 700  | 1.5922          | 1.3888         | 1.1674           | 0.4242             | 0.2214          | -75.1866       | -72.7243     | -1.8455         | -1.8455       |
| 0.8059        | 1.4648 | 750  | 1.6004          | 1.1205         | 0.8834           | 0.4396             | 0.2371          | -75.7547       | -73.2610     | -1.9415         | -1.9415       |
| 0.9489        | 1.5625 | 800  | 1.5917          | 1.2725         | 1.0232           | 0.4264             | 0.2493          | -75.4751       | -72.9570     | -1.9293         | -1.9293       |
| 1.2564        | 1.6602 | 850  | 1.5797          | 1.1856         | 0.9286           | 0.4264             | 0.2570          | -75.6643       | -73.1308     | -1.8894         | -1.8894       |
| 1.2613        | 1.7578 | 900  | 1.5806          | 1.1682         | 0.9110           | 0.4308             | 0.2572          | -75.6995       | -73.1655     | -1.8963         | -1.8963       |
| 1.1197        | 1.8555 | 950  | 1.5804          | 1.1615         | 0.9030           | 0.4286             | 0.2585          | -75.7156       | -73.1791     | -1.8955         | -1.8955       |
| 0.7665        | 1.9531 | 1000 | 1.5804          | 1.1615         | 0.9028           | 0.4286             | 0.2587          | -75.7158       | -73.1790     | -1.8951         | -1.8951       |
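
To try the checkpoint, the snippet below loads it for chat-style generation. The repository id is assumed from the card's title, and the prompt formatting relies on the tokenizer shipping a Mistral-instruct chat template; adjust both if they differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "tsavage68/Mistral2_1000_STEPS_05beta_CDPOSFT"  # assumed repo id, from the card title
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.float16, device_map="auto")

messages = [{"role": "user", "content": "Explain direct preference optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; only the newly generated tokens are printed
output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```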

Framework versions

  • Transformers 4.40.1
  • Pytorch 2.0.0+cu117
  • Datasets 2.19.0
  • Tokenizers 0.19.1

Model size: 7.24B parameters (Safetensors, FP16)