
PE-7b-full

This model is a fine-tuned version of stabilityai/StableBeluga-7B on an unknown dataset. It achieves the following results on the evaluation set (a hedged loading sketch follows the list):

  • Loss: 0.0066
  • Rewards/chosen: -0.4634
  • Rewards/rejected: -29.4677
  • Rewards/accuracies: 0.9888
  • Rewards/margins: 29.0043
  • Logps/rejected: -123.7968
  • Logps/chosen: -86.4647
  • Logits/rejected: -0.9397
  • Logits/chosen: -1.0258
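
The reward, margin, and log-probability metrics above are the ones logged by DPO-style preference optimization, which suggests (though the card does not state) that this checkpoint was preference-tuned rather than plainly supervised. As a minimal sketch, assuming a placeholder repository id and a prompt format borrowed from the StableBeluga base model, the checkpoint loads like any causal LM:

```python
# Minimal loading sketch. The repo id below is a placeholder, not the
# published path; the prompt format is assumed from the base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/PE-7b-full"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the weights are stored in BF16
    device_map="auto",
)

prompt = "### User:\nSummarize the benefits of preference tuning.\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```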

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 3e-07
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
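
The listed totals are internally consistent: 2 examples per device × 8 GPUs × 4 gradient-accumulation steps = 64 total train batch size, and 2 × 8 = 16 for eval. As a hedged reconstruction (the authors' actual launch script is not published, and `output_dir` is a placeholder), the list maps onto a Transformers `TrainingArguments` object roughly as follows:

```python
# Hedged reconstruction of the listed hyperparameters; not the authors' script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pe-7b-full",        # hypothetical output path
    learning_rate=3e-7,
    per_device_train_batch_size=2,  # x 8 GPUs x 4 accum steps = 64 total
    per_device_eval_batch_size=2,   # x 8 GPUs = 16 total
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                      # the published weights are BF16
)
```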

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 0.5515 | 0.05 | 100 | 0.5420 | 0.0859 | -0.2570 | 0.9106 | 0.3429 | -65.3753 | -85.3660 | -0.4062 | -0.3984 |
| 0.2448 | 0.09 | 200 | 0.2390 | 0.3339 | -1.2158 | 0.9888 | 1.5497 | -67.2930 | -84.8700 | -0.3981 | -0.3938 |
| 0.0948 | 0.14 | 300 | 0.0937 | 0.5868 | -2.7801 | 0.9888 | 3.3669 | -70.4216 | -84.3642 | -0.4035 | -0.4048 |
| 0.0466 | 0.18 | 400 | 0.0534 | 0.8294 | -3.8553 | 0.9888 | 4.6847 | -72.5720 | -83.8791 | -0.4093 | -0.4163 |
| 0.0261 | 0.23 | 500 | 0.0265 | 0.9450 | -6.0844 | 0.9888 | 7.0294 | -77.0302 | -83.6478 | -0.4364 | -0.4554 |
| 0.0129 | 0.27 | 600 | 0.0155 | 0.8955 | -9.0192 | 0.9888 | 9.9148 | -82.8998 | -83.7468 | -0.4445 | -0.4719 |
| 0.0069 | 0.32 | 700 | 0.0112 | 0.7377 | -12.1121 | 0.9888 | 12.8498 | -89.0856 | -84.0625 | -0.4617 | -0.4981 |
| 0.0044 | 0.37 | 800 | 0.0095 | 0.5497 | -14.1581 | 0.9888 | 14.7077 | -93.1776 | -84.4385 | -0.4990 | -0.5433 |
| 0.004 | 0.41 | 900 | 0.0095 | 0.5781 | -16.2335 | 0.9888 | 16.8116 | -97.3284 | -84.3816 | -0.5203 | -0.5699 |
| 0.0046 | 0.46 | 1000 | 0.0089 | 0.7788 | -16.4021 | 0.9944 | 17.1809 | -97.6655 | -83.9802 | -0.5542 | -0.6027 |
| 0.0127 | 0.5 | 1100 | 0.0080 | 0.7755 | -16.5854 | 0.9972 | 17.3608 | -98.0321 | -83.9869 | -0.5407 | -0.5882 |
| 0.0192 | 0.55 | 1200 | 0.0082 | 0.8670 | -16.5839 | 0.9916 | 17.4509 | -98.0292 | -83.8038 | -0.5752 | -0.6293 |
| 0.0043 | 0.59 | 1300 | 0.0083 | 0.7239 | -18.6420 | 0.9888 | 19.3659 | -102.1455 | -84.0900 | -0.5830 | -0.6429 |
| 0.0063 | 0.64 | 1400 | 0.0078 | 0.7659 | -18.6685 | 0.9944 | 19.4343 | -102.1983 | -84.0061 | -0.6007 | -0.6587 |
| 0.0022 | 0.68 | 1500 | 0.0079 | 0.5026 | -19.6619 | 0.9888 | 20.1645 | -104.1852 | -84.5328 | -0.5968 | -0.6541 |
| 0.005 | 0.73 | 1600 | 0.0069 | 0.7199 | -19.4554 | 0.9888 | 20.1754 | -103.7723 | -84.0980 | -0.6482 | -0.7077 |
| 0.0038 | 0.78 | 1700 | 0.0068 | 0.8857 | -19.6624 | 0.9888 | 20.5482 | -104.1863 | -83.7664 | -0.6247 | -0.6826 |
| 0.0073 | 0.82 | 1800 | 0.0069 | 1.0901 | -19.2226 | 0.9888 | 20.3128 | -103.3067 | -83.3576 | -0.5992 | -0.6539 |
| 0.003 | 0.87 | 1900 | 0.0070 | 0.8410 | -20.8141 | 0.9916 | 21.6551 | -106.4896 | -83.8559 | -0.6232 | -0.6807 |
| 0.01 | 0.91 | 2000 | 0.0059 | 1.1930 | -19.2176 | 0.9916 | 20.4106 | -103.2967 | -83.1518 | -0.6641 | -0.7237 |
| 0.0061 | 0.96 | 2100 | 0.0059 | 1.6526 | -18.5437 | 0.9944 | 20.1963 | -101.9488 | -82.2327 | -0.6358 | -0.6923 |
| 0.0024 | 1.0 | 2200 | 0.0058 | 1.1178 | -18.9331 | 0.9916 | 20.0508 | -102.7275 | -83.3023 | -0.6583 | -0.7134 |
| 0.0027 | 1.05 | 2300 | 0.0058 | 1.1025 | -19.5454 | 0.9916 | 20.6479 | -103.9522 | -83.3328 | -0.6872 | -0.7461 |
| 0.0057 | 1.1 | 2400 | 0.0061 | 0.8696 | -21.2149 | 0.9972 | 22.0845 | -107.2913 | -83.7987 | -0.7055 | -0.7693 |
| 0.0076 | 1.14 | 2500 | 0.0055 | 1.0200 | -20.0814 | 0.9916 | 21.1015 | -105.0243 | -83.4978 | -0.7059 | -0.7654 |
| 0.0025 | 1.19 | 2600 | 0.0058 | 1.1592 | -20.6357 | 0.9916 | 21.7949 | -106.1328 | -83.2194 | -0.6809 | -0.7385 |
| 0.0046 | 1.23 | 2700 | 0.0057 | 0.8385 | -20.2499 | 0.9916 | 21.0883 | -105.3611 | -83.8609 | -0.6824 | -0.7374 |
| 0.0039 | 1.28 | 2800 | 0.0058 | 0.6816 | -20.7605 | 0.9944 | 21.4421 | -106.3823 | -84.1746 | -0.7219 | -0.7845 |
| 0.0024 | 1.32 | 2900 | 0.0059 | 0.7668 | -22.0609 | 0.9916 | 22.8277 | -108.9833 | -84.0043 | -0.7625 | -0.8322 |
| 0.0033 | 1.37 | 3000 | 0.0055 | 1.3010 | -20.9961 | 0.9916 | 22.2971 | -106.8535 | -82.9359 | -0.7534 | -0.8209 |
| 0.0035 | 1.42 | 3100 | 0.0054 | 1.3118 | -20.5134 | 0.9916 | 21.8252 | -105.8883 | -82.9143 | -0.7521 | -0.8162 |
| 0.0033 | 1.46 | 3200 | 0.0055 | 1.3335 | -20.8912 | 0.9944 | 22.2247 | -106.6437 | -82.8708 | -0.7612 | -0.8293 |
| 0.0021 | 1.51 | 3300 | 0.0058 | 1.1173 | -22.2893 | 0.9916 | 23.4066 | -109.4400 | -83.3032 | -0.7843 | -0.8567 |
| 0.0012 | 1.55 | 3400 | 0.0064 | 0.9283 | -23.7370 | 0.9916 | 24.6653 | -112.3355 | -83.6813 | -0.7884 | -0.8611 |
| 0.0004 | 1.6 | 3500 | 0.0068 | 0.9145 | -24.8915 | 0.9916 | 25.8060 | -114.6444 | -83.7089 | -0.7965 | -0.8752 |
| 0.0035 | 1.64 | 3600 | 0.0063 | 0.9910 | -24.2224 | 0.9916 | 25.2134 | -113.3062 | -83.5558 | -0.8029 | -0.8784 |
| 0.0031 | 1.69 | 3700 | 0.0069 | 0.6681 | -25.5613 | 0.9888 | 26.2294 | -115.9840 | -84.2016 | -0.8334 | -0.9142 |
| 0.0063 | 1.73 | 3800 | 0.0064 | 0.9458 | -24.5955 | 0.9916 | 25.5413 | -114.0523 | -83.6462 | -0.8256 | -0.9017 |
| 0.0041 | 1.78 | 3900 | 0.0067 | 1.2097 | -24.8538 | 0.9916 | 26.0635 | -114.5689 | -83.1184 | -0.8070 | -0.8836 |
| 0.0034 | 1.83 | 4000 | 0.0062 | 1.4285 | -24.3976 | 0.9916 | 25.8261 | -113.6566 | -82.6809 | -0.8036 | -0.8783 |
| 0.0043 | 1.87 | 4100 | 0.0064 | 1.2740 | -24.3713 | 0.9916 | 25.6453 | -113.6040 | -82.9898 | -0.7990 | -0.8751 |
| 0.0008 | 1.92 | 4200 | 0.0062 | 0.8474 | -23.9420 | 0.9888 | 24.7894 | -112.7454 | -83.8431 | -0.7886 | -0.8626 |
| 0.0045 | 1.96 | 4300 | 0.0061 | 0.7728 | -24.1102 | 0.9888 | 24.8830 | -113.0818 | -83.9923 | -0.8065 | -0.8774 |
| 0.0025 | 2.01 | 4400 | 0.0056 | 0.6654 | -23.7611 | 0.9916 | 24.4265 | -112.3836 | -84.2071 | -0.8145 | -0.8857 |
| 0.0032 | 2.05 | 4500 | 0.0057 | 0.3483 | -24.4947 | 0.9944 | 24.8430 | -113.8508 | -84.8413 | -0.8185 | -0.8907 |
| 0.0022 | 2.1 | 4600 | 0.0058 | 0.2667 | -24.9497 | 0.9916 | 25.2165 | -114.7609 | -85.0044 | -0.8229 | -0.8971 |
| 0.0011 | 2.15 | 4700 | 0.0060 | -0.0008 | -25.6118 | 0.9888 | 25.6110 | -116.0850 | -85.5395 | -0.8374 | -0.9151 |
| 0.0066 | 2.19 | 4800 | 0.0061 | -0.0374 | -26.0644 | 0.9916 | 26.0271 | -116.9903 | -85.6126 | -0.8540 | -0.9306 |
| 0.0022 | 2.24 | 4900 | 0.0063 | 0.0818 | -26.4597 | 0.9888 | 26.5415 | -117.7807 | -85.3742 | -0.8703 | -0.9493 |
| 0.0028 | 2.28 | 5000 | 0.0077 | -0.7880 | -28.0697 | 0.9888 | 27.2817 | -121.0009 | -87.1139 | -0.8802 | -0.9634 |
| 0.0038 | 2.33 | 5100 | 0.0068 | -0.2700 | -27.7191 | 0.9888 | 27.4491 | -120.2996 | -86.0779 | -0.8897 | -0.9719 |
| 0.0044 | 2.37 | 5200 | 0.0071 | -0.4228 | -28.3790 | 0.9916 | 27.9562 | -121.6195 | -86.3835 | -0.8925 | -0.9753 |
| 0.0022 | 2.42 | 5300 | 0.0072 | -0.5950 | -29.1837 | 0.9888 | 28.5887 | -123.2288 | -86.7278 | -0.9123 | -0.9960 |
| 0.0022 | 2.46 | 5400 | 0.0073 | -0.7860 | -29.4389 | 0.9888 | 28.6529 | -123.7393 | -87.1098 | -0.9223 | -1.0087 |
| 0.0001 | 2.51 | 5500 | 0.0073 | -0.9058 | -29.7384 | 0.9888 | 28.8326 | -124.3382 | -87.3495 | -0.9299 | -1.0164 |
| 0.0022 | 2.56 | 5600 | 0.0070 | -0.8180 | -29.5150 | 0.9916 | 28.6970 | -123.8913 | -87.1738 | -0.9322 | -1.0181 |
| 0.0011 | 2.6 | 5700 | 0.0071 | -0.6171 | -29.4081 | 0.9888 | 28.7910 | -123.6776 | -86.7721 | -0.9332 | -1.0216 |
| 0.0054 | 2.65 | 5800 | 0.0071 | -0.6641 | -29.7689 | 0.9888 | 29.1048 | -124.3992 | -86.8661 | -0.9405 | -1.0273 |
| 0.0033 | 2.69 | 5900 | 0.0067 | -0.8082 | -29.8306 | 0.9916 | 29.0224 | -124.5226 | -87.1543 | -0.9436 | -1.0315 |
| 0.0011 | 2.74 | 6000 | 0.0068 | -0.7323 | -29.7235 | 0.9888 | 28.9912 | -124.3083 | -87.0024 | -0.9401 | -1.0278 |
| 0.0011 | 2.78 | 6100 | 0.0065 | -0.5717 | -29.4620 | 0.9916 | 28.8903 | -123.7853 | -86.6812 | -0.9389 | -1.0250 |
| 0.0023 | 2.83 | 6200 | 0.0066 | -0.5549 | -29.5194 | 0.9916 | 28.9645 | -123.9002 | -86.6477 | -0.9369 | -1.0244 |
| 0.0022 | 2.88 | 6300 | 0.0065 | -0.4476 | -29.3706 | 0.9916 | 28.9230 | -123.6025 | -86.4330 | -0.9362 | -1.0220 |
| 0.0035 | 2.92 | 6400 | 0.0066 | -0.4967 | -29.5205 | 0.9916 | 29.0238 | -123.9024 | -86.5313 | -0.9395 | -1.0256 |
| 0.0012 | 2.97 | 6500 | 0.0065 | -0.4666 | -29.4634 | 0.9916 | 28.9968 | -123.7882 | -86.4710 | -0.9381 | -1.0240 |
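
In the DPO convention these columns suggest, a completion's implicit reward is β times its policy-minus-reference log-probability difference, the margin is chosen minus rejected, and the loss is −log σ(margin); "Rewards/accuracies" is then the fraction of pairs with a positive margin. A minimal sketch of that bookkeeping, assuming the standard DPO objective (β = 0.1 is a guess, not a published value):

```python
# Sketch of how DPO-style metrics (rewards/chosen, rewards/margins, loss)
# relate, assuming the standard DPO objective; beta is an assumed value.
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: beta * (policy logp - reference logp) per completion.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    # DPO loss: -log sigmoid of the reward margin, averaged over the batch.
    loss = -F.logsigmoid(margins).mean()
    # rewards/accuracies: fraction of pairs where chosen outscores rejected.
    accuracy = (margins > 0).float().mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy

# Toy usage with fake summed log-probs for a batch of two preference pairs.
pol_c = torch.tensor([-86.5, -84.0]); pol_r = torch.tensor([-123.8, -120.0])
ref_c = torch.tensor([-82.0, -83.0]); ref_r = torch.tensor([-94.0, -95.0])
print(dpo_metrics(pol_c, pol_r, ref_c, ref_r))
```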

Framework versions

  • Transformers 4.35.0
  • Pytorch 2.1.1+cu121
  • Datasets 2.14.6
  • Tokenizers 0.14.1
Model size

6.74B parameters (safetensors, BF16)