
PE-13b-full

This model is a fine-tuned version of stabilityai/StableBeluga-13B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0094
  • Rewards/chosen: -1.2833
  • Rewards/rejected: -29.7294
  • Rewards/accuracies: 0.9916
  • Rewards/margins: 28.4460
  • Logps/rejected: -121.9200
  • Logps/chosen: -84.7524
  • Logits/rejected: -2.1605
  • Logits/chosen: -2.4403
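
The metric names above resemble those logged by preference-optimization trainers (for example DPO-style training), but the card does not state which trainer produced them, so that reading is an assumption. Under it, the reward margin should equal the chosen reward minus the rejected reward. A minimal sketch that checks this against the reported numbers:

```python
# Reported evaluation metrics, copied from the list above.
rewards_chosen = -1.2833
rewards_rejected = -29.7294
reported_margin = 28.4460

# Assumption: margin = chosen reward - rejected reward (DPO-style logging).
# The card itself does not define these metrics.
margin = rewards_chosen - rewards_rejected

# The reported values are rounded to four decimals, so allow a small tolerance.
difference = abs(margin - reported_margin)
print(f"recomputed margin: {margin:.4f}")
print(f"matches reported value within rounding: {difference < 1e-3}")
```

The recomputed and reported margins agree to within rounding, which is consistent with that interpretation but does not confirm the training method.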

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-07
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
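
The total batch sizes listed above follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and illustrates a linear schedule with warmup. The schedule shape (linear warmup to the peak rate, then linear decay to zero) and the `total_steps` value of 6000 are assumptions for illustration; the card gives only the scheduler type and warmup ratio.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 1             # per device
eval_batch_size = 2              # per device
num_devices = 8
gradient_accumulation_steps = 8
peak_lr = 3e-07
warmup_ratio = 0.1

# Effective batch sizes: gradient accumulation applies to training only.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def linear_schedule_lr(step: int, total_steps: int) -> float:
    """Assumed schedule: linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical total step count, chosen only to make the example concrete.
total_steps = 6000
for step in (0, 300, 600, 3000, 6000):
    print(step, linear_schedule_lr(step, total_steps))
```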

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5085 | 0.05 | 100 | 0.4978 | 0.1241 | -0.3334 | 0.9525 | 0.4575 | -63.1282 | -81.9376 | -2.0870 | -2.3586 |
| 0.1966 | 0.09 | 200 | 0.2003 | 0.5022 | -1.3704 | 0.9804 | 1.8726 | -65.2020 | -81.1812 | -2.0918 | -2.3650 |
| 0.0612 | 0.14 | 300 | 0.0656 | 0.8997 | -3.3315 | 0.9888 | 4.2312 | -69.1243 | -80.3863 | -2.0887 | -2.3741 |
| 0.029 | 0.18 | 400 | 0.0356 | 0.9536 | -5.0607 | 0.9944 | 6.0143 | -72.5827 | -80.2785 | -2.0905 | -2.3804 |
| 0.0187 | 0.23 | 500 | 0.0201 | 0.9079 | -7.5059 | 0.9888 | 8.4139 | -77.4731 | -80.3699 | -2.0974 | -2.3915 |
| 0.0112 | 0.27 | 600 | 0.0130 | 0.7188 | -10.4500 | 0.9916 | 11.1688 | -83.3612 | -80.7481 | -2.0987 | -2.3960 |
| 0.0066 | 0.32 | 700 | 0.0102 | 0.6639 | -13.1345 | 0.9916 | 13.7984 | -88.7303 | -80.8579 | -2.1111 | -2.4104 |
| 0.0088 | 0.37 | 800 | 0.0098 | 0.9128 | -13.1977 | 0.9888 | 14.1105 | -88.8568 | -80.3601 | -2.1031 | -2.4030 |
| 0.0054 | 0.41 | 900 | 0.0092 | 0.6109 | -15.6398 | 0.9888 | 16.2507 | -93.7409 | -80.9640 | -2.1158 | -2.4144 |
| 0.0044 | 0.46 | 1000 | 0.0094 | 0.9982 | -16.0071 | 0.9916 | 17.0053 | -94.4755 | -80.1893 | -2.0988 | -2.3946 |
| 0.0061 | 0.5 | 1100 | 0.0089 | 0.5504 | -18.0125 | 0.9916 | 18.5630 | -98.4864 | -81.0849 | -2.0991 | -2.3955 |
| 0.024 | 0.55 | 1200 | 0.0088 | 0.4877 | -16.6683 | 0.9916 | 17.1561 | -95.7980 | -81.2103 | -2.0748 | -2.3633 |
| 0.0039 | 0.59 | 1300 | 0.0087 | 0.3755 | -18.5093 | 0.9916 | 18.8848 | -99.4799 | -81.4347 | -2.0746 | -2.3623 |
| 0.0051 | 0.64 | 1400 | 0.0086 | 0.1176 | -20.5558 | 0.9916 | 20.6734 | -103.5730 | -81.9506 | -2.0819 | -2.3738 |
| 0.0023 | 0.68 | 1500 | 0.0089 | 0.1552 | -20.0740 | 0.9888 | 20.2292 | -102.6092 | -81.8754 | -2.0813 | -2.3667 |
| 0.0027 | 0.73 | 1600 | 0.0089 | -0.5025 | -20.7978 | 0.9888 | 20.2953 | -104.0569 | -83.1908 | -2.1179 | -2.4078 |
| 0.0031 | 0.78 | 1700 | 0.0085 | -0.6314 | -21.0492 | 0.9916 | 20.4178 | -104.5597 | -83.4485 | -2.0915 | -2.3773 |
| 0.0049 | 0.82 | 1800 | 0.0085 | -0.7786 | -21.3333 | 0.9916 | 20.5547 | -105.1278 | -83.7429 | -2.0670 | -2.3504 |
| 0.0023 | 0.87 | 1900 | 0.0084 | -0.7496 | -22.3377 | 0.9944 | 21.5880 | -107.1367 | -83.6850 | -2.0729 | -2.3547 |
| 0.0067 | 0.91 | 2000 | 0.0086 | -0.8126 | -22.8024 | 0.9916 | 21.9899 | -108.0662 | -83.8109 | -2.0651 | -2.3472 |
| 0.0041 | 0.96 | 2100 | 0.0082 | -0.7903 | -21.8379 | 0.9944 | 21.0476 | -106.1371 | -83.7663 | -2.0363 | -2.3137 |
| 0.0025 | 1.0 | 2200 | 0.0079 | -0.4489 | -21.4451 | 0.9916 | 20.9963 | -105.3516 | -83.0835 | -2.0303 | -2.3074 |
| 0.0023 | 1.05 | 2300 | 0.0082 | -1.1267 | -22.7620 | 0.9944 | 21.6353 | -107.9852 | -84.4391 | -2.0477 | -2.3260 |
| 0.0055 | 1.1 | 2400 | 0.0085 | -1.4969 | -24.0568 | 0.9888 | 22.5599 | -110.5749 | -85.1796 | -2.0616 | -2.3384 |
| 0.0139 | 1.14 | 2500 | 0.0077 | 0.4564 | -20.3860 | 0.9916 | 20.8424 | -103.2333 | -81.2730 | -2.0453 | -2.3206 |
| 0.0023 | 1.19 | 2600 | 0.0081 | 0.0858 | -21.9640 | 0.9916 | 22.0498 | -106.3893 | -82.0141 | -2.0528 | -2.3273 |
| 0.0046 | 1.23 | 2700 | 0.0083 | -0.2543 | -23.4016 | 0.9916 | 23.1473 | -109.2646 | -82.6943 | -2.0668 | -2.3457 |
| 0.0033 | 1.28 | 2800 | 0.0083 | -0.3317 | -23.7872 | 0.9916 | 23.4555 | -110.0356 | -82.8491 | -2.0884 | -2.3650 |
| 0.0023 | 1.32 | 2900 | 0.0084 | -0.2753 | -24.3682 | 0.9916 | 24.0929 | -111.1976 | -82.7362 | -2.1054 | -2.3879 |
| 0.0034 | 1.37 | 3000 | 0.0081 | 0.4328 | -23.3162 | 0.9916 | 23.7491 | -109.0938 | -81.3201 | -2.0817 | -2.3565 |
| 0.0033 | 1.42 | 3100 | 0.0082 | -0.0254 | -23.7390 | 0.9944 | 23.7136 | -109.9394 | -82.2366 | -2.0706 | -2.3447 |
| 0.0033 | 1.46 | 3200 | 0.0086 | -0.7680 | -24.0452 | 0.9916 | 23.2772 | -110.5517 | -83.7218 | -2.0760 | -2.3543 |
| 0.0032 | 1.51 | 3300 | 0.0086 | -0.0016 | -23.5161 | 0.9944 | 23.5145 | -109.4934 | -82.1889 | -2.0881 | -2.3655 |
| 0.0011 | 1.55 | 3400 | 0.0084 | 0.0195 | -24.2635 | 0.9944 | 24.2831 | -110.9884 | -82.1467 | -2.0878 | -2.3667 |
| 0.0002 | 1.6 | 3500 | 0.0087 | 0.0421 | -24.8306 | 0.9916 | 24.8728 | -112.1225 | -82.1015 | -2.0890 | -2.3698 |
| 0.0034 | 1.64 | 3600 | 0.0086 | -0.2729 | -25.8106 | 0.9916 | 25.5377 | -114.0825 | -82.7315 | -2.1030 | -2.3851 |
| 0.0027 | 1.69 | 3700 | 0.0086 | 0.0339 | -25.0221 | 0.9916 | 25.0560 | -112.5055 | -82.1179 | -2.1300 | -2.4147 |
| 0.0056 | 1.73 | 3800 | 0.0082 | 0.1800 | -23.6173 | 0.9916 | 23.7974 | -109.6960 | -81.8257 | -2.1140 | -2.3980 |
| 0.0026 | 1.78 | 3900 | 0.0083 | -0.0334 | -24.6060 | 0.9944 | 24.5725 | -111.6733 | -82.2526 | -2.1140 | -2.3965 |
| 0.0036 | 1.83 | 4000 | 0.0080 | -0.2511 | -23.0433 | 0.9916 | 22.7923 | -108.5479 | -82.6879 | -2.1348 | -2.4167 |
| 0.0044 | 1.87 | 4100 | 0.0084 | -0.4259 | -23.7811 | 0.9916 | 23.3551 | -110.0234 | -83.0376 | -2.1314 | -2.4160 |
| 0.0022 | 1.92 | 4200 | 0.0083 | -0.5710 | -23.2360 | 0.9944 | 22.6650 | -108.9332 | -83.3277 | -2.1369 | -2.4196 |
| 0.0044 | 1.96 | 4300 | 0.0085 | -0.6363 | -24.6474 | 0.9972 | 24.0111 | -111.7560 | -83.4583 | -2.1307 | -2.4109 |
| 0.0023 | 2.01 | 4400 | 0.0085 | -0.6133 | -24.9492 | 0.9916 | 24.3359 | -112.3597 | -83.4124 | -2.1322 | -2.4134 |
| 0.0033 | 2.05 | 4500 | 0.0085 | -0.7101 | -25.5054 | 0.9916 | 24.7953 | -113.4721 | -83.6059 | -2.1326 | -2.4142 |
| 0.0023 | 2.1 | 4600 | 0.0087 | -0.7855 | -26.0511 | 0.9916 | 25.2656 | -114.5634 | -83.7567 | -2.1333 | -2.4152 |
| 0.0011 | 2.15 | 4700 | 0.0088 | -0.9006 | -26.5845 | 0.9944 | 25.6839 | -115.6303 | -83.9870 | -2.1369 | -2.4198 |
| 0.0065 | 2.19 | 4800 | 0.0088 | -0.7570 | -26.8960 | 0.9916 | 26.1390 | -116.2533 | -83.6997 | -2.1393 | -2.4198 |
| 0.0022 | 2.24 | 4900 | 0.0091 | -0.9581 | -27.9431 | 0.9916 | 26.9850 | -118.3475 | -84.1019 | -2.1428 | -2.4245 |
| 0.0026 | 2.28 | 5000 | 0.0091 | -1.2522 | -28.8309 | 0.9944 | 27.5788 | -120.1232 | -84.6901 | -2.1479 | -2.4287 |
| 0.0033 | 2.33 | 5100 | 0.0089 | -0.8602 | -28.7323 | 0.9916 | 27.8721 | -119.9259 | -83.9062 | -2.1522 | -2.4328 |
| 0.0041 | 2.37 | 5200 | 0.0091 | -1.0405 | -29.2861 | 0.9916 | 28.2456 | -121.0335 | -84.2668 | -2.1536 | -2.4343 |
| 0.0023 | 2.42 | 5300 | 0.0093 | -1.1323 | -29.5240 | 0.9916 | 28.3917 | -121.5093 | -84.4504 | -2.1529 | -2.4336 |
| 0.0022 | 2.46 | 5400 | 0.0092 | -1.2202 | -29.2127 | 0.9916 | 27.9925 | -120.8866 | -84.6261 | -2.1595 | -2.4416 |
| 0.0 | 2.51 | 5500 | 0.0093 | -1.4371 | -29.7063 | 0.9916 | 28.2692 | -121.8739 | -85.0599 | -2.1609 | -2.4404 |
| 0.0022 | 2.56 | 5600 | 0.0095 | -1.4397 | -30.0202 | 0.9944 | 28.5804 | -122.5016 | -85.0652 | -2.1584 | -2.4383 |
| 0.0011 | 2.6 | 5700 | 0.0096 | -1.6125 | -30.0945 | 0.9916 | 28.4820 | -122.6504 | -85.4108 | -2.1601 | -2.4395 |
| 0.0053 | 2.65 | 5800 | 0.0095 | -1.5638 | -30.0025 | 0.9944 | 28.4387 | -122.4663 | -85.3133 | -2.1615 | -2.4398 |
| 0.003 | 2.69 | 5900 | 0.0095 | -1.5904 | -30.1980 | 0.9916 | 28.6076 | -122.8572 | -85.3666 | -2.1606 | -2.4406 |
| 0.0011 | 2.74 | 6000 | 0.0094 | -1.5286 | -30.0882 | 0.9944 | 28.5596 | -122.6377 | -85.2429 | -2.1615 | -2.4403 |
| 0.0008 | 2.78 | 6100 | 0.0095 | -1.4405 | -30.0174 | 0.9916 | 28.5769 | -122.4961 | -85.0667 | -2.1615 | -2.4400 |
| 0.0022 | 2.83 | 6200 | 0.0093 | -1.3508 | -29.9317 | 0.9916 | 28.5808 | -122.3246 | -84.8874 | -2.1599 | -2.4395 |
| 0.0019 | 2.88 | 6300 | 0.0093 | -1.2416 | -29.6525 | 0.9916 | 28.4109 | -121.7663 | -84.6690 | -2.1620 | -2.4415 |
| 0.0034 | 2.92 | 6400 | 0.0093 | -1.2995 | -29.7927 | 0.9916 | 28.4932 | -122.0468 | -84.7848 | -2.1616 | -2.4412 |
| 0.0014 | 2.97 | 6500 | 0.0092 | -1.2574 | -29.7200 | 0.9916 | 28.4626 | -121.9014 | -84.7006 | -2.1595 | -2.4408 |
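
In the table above, validation loss reaches its lowest logged value (0.0077) at step 2500 and then drifts slightly upward, while the reward margin keeps growing through the third epoch. A minimal sketch of how one might pick a checkpoint by validation loss, using a small subset of rows copied from the table (the selection criterion is an illustration, not something the card prescribes):

```python
# (step, validation_loss, rewards_margin) for a few rows copied from the table.
rows = [
    (100, 0.4978, 0.4575),
    (1000, 0.0094, 17.0053),
    (2200, 0.0079, 20.9963),
    (2500, 0.0077, 20.8424),
    (4400, 0.0085, 24.3359),
    (6500, 0.0092, 28.4626),
]

# Pick the row with the lowest validation loss.
best_step, best_loss, best_margin = min(rows, key=lambda row: row[1])
last_step, last_loss, last_margin = rows[-1]

print(f"lowest validation loss: {best_loss} at step {best_step}")
print(f"last logged step {last_step}: loss {last_loss}, margin {last_margin}")
```

A growing margin alongside a flat or rising validation loss can be a sign of over-optimization on the preference objective, so comparing both columns is a reasonable sanity check before choosing a checkpoint.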

Framework versions

  • Transformers 4.35.0
  • PyTorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1

Safetensors

  • Model size: 13B params
  • Tensor type: BF16