
PE_Llama_2_7b_sft_rlhf

This model is a preference-tuned (RLHF) variant in the Llama 2 7B family, fine-tuned from an SFT checkpoint rather than trained from scratch (per its name), on an Arrow-format preference dataset. It achieves the following results on the evaluation set (a quick consistency check on these numbers follows the list):

  • Loss: 0.0093
  • Rewards/chosen: -7.0331
  • Rewards/rejected: -29.3861
  • Rewards/accuracies: 0.9916
  • Rewards/margins: 22.3530
  • Logps/rejected: -118.6765
  • Logps/chosen: -90.0482
  • Logits/rejected: -1.3495
  • Logits/chosen: -1.4301
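
These metric names match TRL's DPO trainer logging: the rewards are the implicit DPO rewards (beta-scaled log-probability ratios of the policy against its reference model), and Rewards/margins is simply Rewards/chosen minus Rewards/rejected. A minimal sanity check of that identity against the reported values:

```python
# Sanity check on the reported eval metrics (values copied from the card above).
rewards_chosen = -7.0331
rewards_rejected = -29.3861
rewards_margins = 22.3530

# DPO logs margins as chosen - rejected; allow slack for 4-decimal rounding.
assert abs((rewards_chosen - rewards_rejected) - rewards_margins) < 1e-3
```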

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of how they might map to a trainer configuration follows the list):

  • learning_rate: 3e-07
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
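
The training script itself is not part of the card. As a hedged sketch only, here is how the hyperparameters above could be wired into TRL's DPOTrainer (whose logging matches the reward/logps metrics reported here); the base-model id, dataset path, and beta value are assumptions, not facts from the card:

```python
# Hypothetical reconstruction of the run; base model, dataset path, and beta are assumed.
from datasets import load_from_disk
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "meta-llama/Llama-2-7b-hf"  # assumed from the model name; likely an SFT checkpoint in practice
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

# "Arrow dataset" in the card suggests Arrow files loaded from disk.
dataset = load_from_disk("preference_data")  # hypothetical path

args = TrainingArguments(
    output_dir="PE_Llama_2_7b_sft_rlhf",
    learning_rate=3e-7,
    per_device_train_batch_size=1,  # x 8 GPUs x 8 accumulation steps = 64 effective
    per_device_eval_batch_size=2,   # x 8 GPUs = 16 effective
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
    # Adam betas=(0.9, 0.999) and epsilon=1e-8 are the TrainingArguments defaults.
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=args,
    beta=0.1,  # assumed; the card does not report a beta
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```

Such a script would be run under a multi-GPU launcher (e.g. accelerate launch or torchrun) to match the distributed_type: multi-GPU / num_devices: 8 settings above.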

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:--------------|:------|:-----|:----------------|:---------------|:-----------------|:-------------------|:----------------|:---------------|:-------------|:----------------|:--------------|
| 0.5577 | 0.05 | 100 | 0.5743 | -0.0890 | -0.3528 | 0.9022 | 0.2638 | -60.6098 | -76.1599 | -1.3076 | -1.3716 |
| 0.1502 | 0.09 | 200 | 0.1761 | -0.5864 | -2.4951 | 0.9804 | 1.9086 | -64.8944 | -77.1548 | -1.3397 | -1.4091 |
| 0.0367 | 0.14 | 300 | 0.0640 | -1.1815 | -4.8466 | 0.9860 | 3.6651 | -69.5975 | -78.3450 | -1.3685 | -1.4428 |
| 0.0195 | 0.18 | 400 | 0.0419 | -1.6306 | -6.4153 | 0.9832 | 4.7847 | -72.7348 | -79.2431 | -1.3875 | -1.4648 |
| 0.0128 | 0.23 | 500 | 0.0321 | -2.1351 | -8.0395 | 0.9860 | 5.9044 | -75.9833 | -80.2522 | -1.4045 | -1.4847 |
| 0.0078 | 0.27 | 600 | 0.0294 | -2.8235 | -9.6992 | 0.9860 | 6.8757 | -79.3027 | -81.6291 | -1.4163 | -1.4986 |
| 0.0074 | 0.32 | 700 | 0.0177 | -2.7718 | -10.7772 | 0.9832 | 8.0054 | -81.4587 | -81.5256 | -1.4251 | -1.5079 |
| 0.0051 | 0.37 | 800 | 0.0144 | -2.4805 | -11.3179 | 0.9832 | 8.8374 | -82.5400 | -80.9429 | -1.4353 | -1.5181 |
| 0.003 | 0.41 | 900 | 0.0160 | -2.8352 | -12.2817 | 0.9860 | 9.4465 | -84.4677 | -81.6525 | -1.4421 | -1.5261 |
| 0.0031 | 0.46 | 1000 | 0.0122 | -2.8873 | -13.0359 | 0.9860 | 10.1487 | -85.9761 | -81.7565 | -1.4514 | -1.5345 |
| 0.0107 | 0.5 | 1100 | 0.0110 | -2.8383 | -13.0784 | 0.9888 | 10.2401 | -86.0611 | -81.6586 | -1.4506 | -1.5334 |
| 0.0065 | 0.55 | 1200 | 0.0130 | -3.3682 | -13.9857 | 0.9860 | 10.6176 | -87.8757 | -82.7184 | -1.4603 | -1.5441 |
| 0.0054 | 0.59 | 1300 | 0.0123 | -3.6048 | -14.8999 | 0.9888 | 11.2951 | -89.7041 | -83.1916 | -1.4576 | -1.5403 |
| 0.0048 | 0.64 | 1400 | 0.0091 | -3.3176 | -15.0505 | 0.9860 | 11.7329 | -90.0053 | -82.6172 | -1.4598 | -1.5418 |
| 0.0017 | 0.68 | 1500 | 0.0087 | -3.3081 | -15.5642 | 0.9860 | 12.2561 | -91.0327 | -82.5982 | -1.4671 | -1.5494 |
| 0.0042 | 0.73 | 1600 | 0.0091 | -3.5315 | -16.2814 | 0.9860 | 12.7498 | -92.4670 | -83.0451 | -1.4722 | -1.5560 |
| 0.0035 | 0.78 | 1700 | 0.0078 | -3.1483 | -15.9040 | 0.9916 | 12.7557 | -91.7122 | -82.2786 | -1.4664 | -1.5481 |
| 0.0094 | 0.82 | 1800 | 0.0071 | -2.9923 | -15.9175 | 0.9888 | 12.9251 | -91.7391 | -81.9667 | -1.4572 | -1.5390 |
| 0.0024 | 0.87 | 1900 | 0.0066 | -2.9861 | -16.5288 | 0.9916 | 13.5427 | -92.9619 | -81.9542 | -1.4690 | -1.5511 |
| 0.0067 | 0.91 | 2000 | 0.0076 | -3.2851 | -16.0301 | 0.9916 | 12.7450 | -91.9644 | -82.5522 | -1.4577 | -1.5391 |
| 0.0044 | 0.96 | 2100 | 0.0064 | -3.3414 | -16.8752 | 0.9944 | 13.5338 | -93.6545 | -82.6647 | -1.4617 | -1.5440 |
| 0.0025 | 1.0 | 2200 | 0.0060 | -3.1967 | -16.8252 | 0.9944 | 13.6285 | -93.5546 | -82.3753 | -1.4630 | -1.5444 |
| 0.0023 | 1.05 | 2300 | 0.0063 | -3.5595 | -17.6105 | 0.9916 | 14.0510 | -95.1253 | -83.1011 | -1.4645 | -1.5467 |
| 0.0055 | 1.1 | 2400 | 0.0070 | -4.0460 | -18.6662 | 0.9944 | 14.6201 | -97.2365 | -84.0740 | -1.4606 | -1.5441 |
| 0.0052 | 1.14 | 2500 | 0.0067 | -3.3185 | -17.6030 | 0.9944 | 14.2844 | -95.1102 | -82.6191 | -1.4679 | -1.5507 |
| 0.0023 | 1.19 | 2600 | 0.0064 | -3.4071 | -18.2406 | 0.9944 | 14.8335 | -96.3854 | -82.7962 | -1.4667 | -1.5501 |
| 0.0044 | 1.23 | 2700 | 0.0090 | -4.3343 | -19.6985 | 0.9916 | 15.3642 | -99.3012 | -84.6506 | -1.4647 | -1.5496 |
| 0.0033 | 1.28 | 2800 | 0.0113 | -4.6406 | -19.7381 | 0.9916 | 15.0976 | -99.3805 | -85.2631 | -1.4569 | -1.5408 |
| 0.0023 | 1.32 | 2900 | 0.0070 | -3.9341 | -19.4138 | 0.9944 | 15.4797 | -98.7318 | -83.8501 | -1.4612 | -1.5449 |
| 0.0034 | 1.37 | 3000 | 0.0066 | -3.7082 | -18.5209 | 0.9916 | 14.8127 | -96.9460 | -83.3983 | -1.4587 | -1.5399 |
| 0.0033 | 1.42 | 3100 | 0.0064 | -3.6694 | -18.6338 | 0.9972 | 14.9644 | -97.1717 | -83.3208 | -1.4480 | -1.5297 |
| 0.0034 | 1.46 | 3200 | 0.0059 | -3.7376 | -19.1673 | 0.9944 | 15.4298 | -98.2389 | -83.4571 | -1.4483 | -1.5307 |
| 0.0019 | 1.51 | 3300 | 0.0061 | -3.9735 | -19.7068 | 0.9916 | 15.7332 | -99.3178 | -83.9291 | -1.4459 | -1.5285 |
| 0.0011 | 1.55 | 3400 | 0.0066 | -4.3242 | -20.4806 | 0.9944 | 16.1564 | -100.8654 | -84.6304 | -1.4412 | -1.5245 |
| 0.0001 | 1.6 | 3500 | 0.0093 | -4.7847 | -21.0204 | 0.9916 | 16.2357 | -101.9450 | -85.5513 | -1.4308 | -1.5145 |
| 0.0037 | 1.64 | 3600 | 0.0076 | -4.5704 | -20.9595 | 0.9888 | 16.3891 | -101.8232 | -85.1228 | -1.4373 | -1.5209 |
| 0.003 | 1.69 | 3700 | 0.0087 | -4.7965 | -21.6522 | 0.9916 | 16.8557 | -103.2086 | -85.5750 | -1.4300 | -1.5148 |
| 0.0056 | 1.73 | 3800 | 0.0093 | -5.1262 | -22.2592 | 0.9916 | 17.1330 | -104.4226 | -86.2344 | -1.4213 | -1.5058 |
| 0.0024 | 1.78 | 3900 | 0.0113 | -5.8601 | -23.7638 | 0.9888 | 17.9037 | -107.4319 | -87.7022 | -1.4014 | -1.4856 |
| 0.0034 | 1.83 | 4000 | 0.0056 | -4.7077 | -22.5264 | 0.9944 | 17.8187 | -104.9570 | -85.3974 | -1.4252 | -1.5084 |
| 0.0044 | 1.87 | 4100 | 0.0055 | -4.2834 | -21.6926 | 0.9972 | 17.4092 | -103.2894 | -84.5488 | -1.4342 | -1.5165 |
| 0.0001 | 1.92 | 4200 | 0.0068 | -5.2542 | -23.4097 | 0.9916 | 18.1555 | -106.7237 | -86.4905 | -1.4219 | -1.5052 |
| 0.0044 | 1.96 | 4300 | 0.0075 | -5.2492 | -23.2824 | 0.9888 | 18.0332 | -106.4690 | -86.4804 | -1.4098 | -1.4921 |
| 0.0022 | 2.01 | 4400 | 0.0082 | -5.6200 | -23.9342 | 0.9944 | 18.3142 | -107.7725 | -87.2220 | -1.4087 | -1.4906 |
| 0.0033 | 2.05 | 4500 | 0.0091 | -5.9484 | -24.5607 | 0.9916 | 18.6123 | -109.0256 | -87.8787 | -1.4036 | -1.4857 |
| 0.0022 | 2.1 | 4600 | 0.0091 | -6.0570 | -25.0424 | 0.9916 | 18.9853 | -109.9890 | -88.0961 | -1.3980 | -1.4804 |
| 0.0011 | 2.15 | 4700 | 0.0100 | -6.3832 | -25.6097 | 0.9888 | 19.2265 | -111.1236 | -88.7484 | -1.3907 | -1.4732 |
| 0.0065 | 2.19 | 4800 | 0.0073 | -5.7898 | -25.1360 | 0.9916 | 19.3462 | -110.1763 | -87.5616 | -1.4006 | -1.4827 |
| 0.0022 | 2.24 | 4900 | 0.0091 | -6.1379 | -25.9334 | 0.9916 | 19.7955 | -111.7710 | -88.2578 | -1.3907 | -1.4732 |
| 0.0022 | 2.28 | 5000 | 0.0147 | -7.3728 | -27.6080 | 0.9888 | 20.2352 | -115.1203 | -90.7277 | -1.3738 | -1.4564 |
| 0.0033 | 2.33 | 5100 | 0.0120 | -6.9056 | -27.3057 | 0.9888 | 20.4002 | -114.5157 | -89.7931 | -1.3780 | -1.4604 |
| 0.0043 | 2.37 | 5200 | 0.0097 | -6.5949 | -27.6154 | 0.9888 | 21.0205 | -115.1350 | -89.1717 | -1.3772 | -1.4593 |
| 0.0022 | 2.42 | 5300 | 0.0152 | -7.5122 | -28.6578 | 0.9888 | 21.1456 | -117.2199 | -91.0065 | -1.3647 | -1.4465 |
| 0.0022 | 2.46 | 5400 | 0.0149 | -7.7072 | -29.4467 | 0.9888 | 21.7395 | -118.7977 | -91.3965 | -1.3515 | -1.4331 |
| 0.0001 | 2.51 | 5500 | 0.0137 | -7.6730 | -29.4473 | 0.9916 | 21.7743 | -118.7989 | -91.3281 | -1.3483 | -1.4293 |
| 0.0022 | 2.56 | 5600 | 0.0133 | -7.6989 | -29.6686 | 0.9916 | 21.9697 | -119.2415 | -91.3798 | -1.3485 | -1.4299 |
| 0.0011 | 2.6 | 5700 | 0.0095 | -6.8592 | -28.9672 | 0.9888 | 22.1080 | -117.8385 | -89.7003 | -1.3553 | -1.4366 |
| 0.0054 | 2.65 | 5800 | 0.0077 | -6.4136 | -28.4244 | 0.9916 | 22.0108 | -116.7531 | -88.8093 | -1.3637 | -1.4450 |
| 0.0033 | 2.69 | 5900 | 0.0115 | -7.6490 | -30.1521 | 0.9888 | 22.5031 | -120.2085 | -91.2800 | -1.3400 | -1.4208 |
| 0.0011 | 2.74 | 6000 | 0.0086 | -6.8537 | -29.1407 | 0.9888 | 22.2870 | -118.1857 | -89.6894 | -1.3510 | -1.4317 |
| 0.0011 | 2.78 | 6100 | 0.0095 | -7.1201 | -29.6324 | 0.9888 | 22.5123 | -119.1690 | -90.2221 | -1.3452 | -1.4257 |
| 0.0022 | 2.83 | 6200 | 0.0086 | -6.8942 | -29.1673 | 0.9916 | 22.2731 | -118.2387 | -89.7703 | -1.3531 | -1.4335 |
| 0.0013 | 2.88 | 6300 | 0.0086 | -6.8366 | -29.0334 | 0.9916 | 22.1968 | -117.9710 | -89.6551 | -1.3543 | -1.4349 |
| 0.0033 | 2.92 | 6400 | 0.0096 | -7.0073 | -29.2913 | 0.9916 | 22.2840 | -118.4869 | -89.9966 | -1.3494 | -1.4303 |
| 0.0011 | 2.97 | 6500 | 0.0092 | -6.9778 | -29.3366 | 0.9916 | 22.3588 | -118.5774 | -89.9376 | -1.3494 | -1.4297 |

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1
Model files

  • Format: Safetensors
  • Model size: 6.74B params
  • Tensor type: BF16
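
Given the framework versions and the BF16 safetensors checkpoint noted above, loading follows the standard transformers pattern; the repo id below is inferred from the card title and is an assumption:

```python
# Minimal loading sketch; the hub repo id is assumed from the card title.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PE_Llama_2_7b_sft_rlhf"  # replace with the full "<namespace>/..." repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 tensor type of the checkpoint
    device_map="auto",           # requires accelerate; shards across available GPUs
)
```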