ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter2

This model is a fine-tuned version of davidberenstein1957/ultra-feedback-dutch-cleaned-hq-spin-geitje-7b-ultra-sft_iter1. The training dataset was not recorded (the auto-generated card lists it as None). The model achieves the following results on the evaluation set:

  • Loss: 0.0162
  • Rewards/real: -8.1731
  • Rewards/generated: -31.3826
  • Rewards/accuracies: 0.9917
  • Rewards/margins: 23.2095
  • Logps/generated: -956.3063
  • Logps/real: -525.1735
  • Logits/generated: -1.5719
  • Logits/real: -1.7813
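
The reported margin agrees with the difference between the two reward values, which can be checked directly. The following is a minimal sketch in Python; the helper name `reward_margin` is hypothetical, and the relation "margin = real minus generated" is inferred from the reported numbers, not taken from the training code.

```python
# Final evaluation metrics copied from the list above.
rewards_real = -8.1731
rewards_generated = -31.3826
reported_margin = 23.2095


def reward_margin(real: float, generated: float) -> float:
    """Hypothetical helper: reward of the real response minus the generated one."""
    return real - generated


margin = reward_margin(rewards_real, rewards_generated)

# The inputs are rounded to four decimals, so compare with a small tolerance.
assert abs(margin - reported_margin) < 1e-3
print(f"margin = {margin:.4f}")
```

A large positive margin together with an accuracy of 0.9917 suggests that the model ranks the real response above the generated one on nearly every evaluation pair.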

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
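
The total batch sizes follow from the per-device values, and the schedule can be sketched from the scheduler type and warmup ratio. The block below is an illustration under stated assumptions: the function `linear_schedule_lr` is hypothetical, the total step count of 1,200 is inferred from the last logged step and is not stated in the card, and the exact rounding of the warmup length may differ in the actual trainer.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8  # per device
eval_batch_size = 8   # per device
num_devices = 4
gradient_accumulation_steps = 2
peak_lr = 1e-07
warmup_ratio = 0.1

# Gradient accumulation applies to training only, not to evaluation.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
assert total_train_batch_size == 64
assert total_eval_batch_size == 32


def linear_schedule_lr(step: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then linear decay to zero."""
    warmup_steps = round(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


# ASSUMPTION: roughly 1,200 optimizer steps in total (inferred, not stated).
assumed_total_steps = 1200
for step in (0, 60, 120, 600, 1200):
    print(step, f"{linear_schedule_lr(step, assumed_total_steps):.2e}")
```

Under these assumptions the learning rate peaks at 1e-07 around step 120 and then decays linearly to zero by the end of the second epoch.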

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6097 | 0.04 | 25 | 0.4147 | -0.6192 | -1.4312 | 0.9250 | 0.8120 | -656.7919 | -449.6341 | -2.0004 | -2.0773 |
| 0.2137 | 0.08 | 50 | 0.1745 | -2.0300 | -5.0060 | 0.9519 | 2.9761 | -692.5404 | -463.7422 | -1.9306 | -2.0237 |
| 0.1292 | 0.12 | 75 | 0.1012 | -2.8227 | -7.4967 | 0.9685 | 4.6740 | -717.4471 | -471.6697 | -1.8843 | -1.9887 |
| 0.0665 | 0.16 | 100 | 0.0676 | -3.2936 | -9.3177 | 0.9778 | 6.0240 | -735.6567 | -476.3786 | -1.8508 | -1.9628 |
| 0.0429 | 0.21 | 125 | 0.0477 | -3.7328 | -11.2722 | 0.9824 | 7.5395 | -755.2025 | -480.7701 | -1.8123 | -1.9332 |
| 0.0299 | 0.25 | 150 | 0.0369 | -4.2161 | -13.2599 | 0.9870 | 9.0437 | -775.0787 | -485.6039 | -1.7938 | -1.9226 |
| 0.0252 | 0.29 | 175 | 0.0320 | -4.7201 | -15.0489 | 0.9880 | 10.3288 | -792.9691 | -490.6432 | -1.7758 | -1.9116 |
| 0.0249 | 0.33 | 200 | 0.0301 | -5.0757 | -16.3570 | 0.9880 | 11.2813 | -806.0497 | -494.1995 | -1.7515 | -1.8923 |
| 0.0175 | 0.37 | 225 | 0.0273 | -5.4299 | -17.6751 | 0.9880 | 12.2451 | -819.2310 | -497.7419 | -1.7362 | -1.8821 |
| 0.0183 | 0.41 | 250 | 0.0254 | -5.4183 | -18.3899 | 0.9889 | 12.9715 | -826.3791 | -497.6259 | -1.7300 | -1.8793 |
| 0.0182 | 0.45 | 275 | 0.0245 | -6.0900 | -20.5760 | 0.9889 | 14.4860 | -848.2401 | -504.3426 | -1.6961 | -1.8564 |
| 0.0253 | 0.49 | 300 | 0.0224 | -5.9239 | -20.7184 | 0.9898 | 14.7944 | -849.6640 | -502.6819 | -1.6938 | -1.8573 |
| 0.0075 | 0.53 | 325 | 0.0234 | -7.0436 | -24.1126 | 0.9898 | 17.0691 | -883.6064 | -513.8781 | -1.6522 | -1.8252 |
| 0.0141 | 0.58 | 350 | 0.0212 | -5.5696 | -20.9714 | 0.9898 | 15.4017 | -852.1937 | -499.1387 | -1.7082 | -1.8693 |
| 0.0135 | 0.62 | 375 | 0.0182 | -5.2646 | -20.3901 | 0.9907 | 15.1254 | -846.3809 | -496.0890 | -1.7285 | -1.8897 |
| 0.014 | 0.66 | 400 | 0.0182 | -5.5057 | -21.1579 | 0.9907 | 15.6522 | -854.0594 | -498.4994 | -1.7137 | -1.8783 |
| 0.0122 | 0.7 | 425 | 0.0172 | -5.3398 | -20.7520 | 0.9907 | 15.4122 | -849.9997 | -496.8405 | -1.7231 | -1.8857 |
| 0.0144 | 0.74 | 450 | 0.0164 | -4.6606 | -19.3766 | 0.9917 | 14.7160 | -836.2463 | -490.0483 | -1.7465 | -1.9042 |
| 0.0103 | 0.78 | 475 | 0.0160 | -4.8739 | -20.1058 | 0.9907 | 15.2319 | -843.5385 | -492.1819 | -1.7445 | -1.9064 |
| 0.0147 | 0.82 | 500 | 0.0156 | -5.1220 | -20.9607 | 0.9917 | 15.8387 | -852.0875 | -494.6623 | -1.7434 | -1.9092 |
| 0.0154 | 0.86 | 525 | 0.0155 | -5.1481 | -21.3994 | 0.9917 | 16.2513 | -856.4740 | -494.9235 | -1.7357 | -1.9040 |
| 0.0158 | 0.91 | 550 | 0.0151 | -5.6088 | -22.9532 | 0.9917 | 17.3444 | -872.0123 | -499.5304 | -1.7139 | -1.8881 |
| 0.0053 | 0.95 | 575 | 0.0149 | -5.7209 | -23.5217 | 0.9917 | 17.8008 | -877.6972 | -500.6515 | -1.7113 | -1.8888 |
| 0.008 | 0.99 | 600 | 0.0147 | -5.7523 | -23.7474 | 0.9917 | 17.9952 | -879.9544 | -500.9651 | -1.7086 | -1.8878 |
| 0.0049 | 1.03 | 625 | 0.0154 | -6.1839 | -24.8883 | 0.9907 | 18.7044 | -891.3632 | -505.2818 | -1.6731 | -1.8585 |
| 0.0057 | 1.07 | 650 | 0.0155 | -6.4947 | -25.8924 | 0.9917 | 19.3977 | -901.4037 | -508.3892 | -1.6592 | -1.8484 |
| 0.0076 | 1.11 | 675 | 0.0158 | -6.8543 | -26.9217 | 0.9917 | 20.0674 | -911.6970 | -511.9859 | -1.6407 | -1.8339 |
| 0.004 | 1.15 | 700 | 0.0158 | -7.1325 | -27.7743 | 0.9917 | 20.6418 | -920.2236 | -514.7678 | -1.6269 | -1.8236 |
| 0.0168 | 1.19 | 725 | 0.0157 | -6.9019 | -26.2791 | 0.9917 | 19.3772 | -905.2711 | -512.4611 | -1.6566 | -1.8448 |
| 0.0022 | 1.23 | 750 | 0.0163 | -6.9586 | -26.5145 | 0.9917 | 19.5559 | -907.6251 | -513.0281 | -1.6533 | -1.8423 |
| 0.0039 | 1.28 | 775 | 0.0165 | -7.5386 | -28.2224 | 0.9917 | 20.6837 | -924.7038 | -518.8289 | -1.6369 | -1.8327 |
| 0.002 | 1.32 | 800 | 0.0165 | -7.6568 | -28.6441 | 0.9907 | 20.9872 | -928.9208 | -520.0109 | -1.6365 | -1.8344 |
| 0.002 | 1.36 | 825 | 0.0165 | -7.7989 | -29.2028 | 0.9917 | 21.4038 | -934.5078 | -521.4318 | -1.6348 | -1.8352 |
| 0.0019 | 1.4 | 850 | 0.0165 | -7.8978 | -29.5958 | 0.9917 | 21.6980 | -938.4382 | -522.4203 | -1.6166 | -1.8169 |
| 0.0041 | 1.44 | 875 | 0.0162 | -7.9696 | -29.7930 | 0.9917 | 21.8234 | -940.4100 | -523.1380 | -1.6165 | -1.8176 |
| 0.0023 | 1.48 | 900 | 0.0164 | -8.2086 | -30.6909 | 0.9917 | 22.4823 | -949.3892 | -525.5286 | -1.6045 | -1.8093 |
| 0.0038 | 1.52 | 925 | 0.0166 | -8.1217 | -30.6727 | 0.9917 | 22.5510 | -949.2076 | -524.6597 | -1.5919 | -1.7978 |
| 0.0096 | 1.56 | 950 | 0.0162 | -7.8257 | -30.1144 | 0.9917 | 22.2887 | -943.6237 | -521.6992 | -1.5909 | -1.7956 |
| 0.0057 | 1.6 | 975 | 0.0166 | -8.0335 | -30.6654 | 0.9917 | 22.6319 | -949.1342 | -523.7775 | -1.5854 | -1.7919 |
| 0.0046 | 1.65 | 1000 | 0.0165 | -8.1757 | -31.0139 | 0.9917 | 22.8382 | -952.6191 | -525.2000 | -1.5768 | -1.7852 |
| 0.0009 | 1.69 | 1025 | 0.0165 | -8.0553 | -30.7565 | 0.9917 | 22.7012 | -950.0453 | -523.9951 | -1.5757 | -1.7830 |
| 0.002 | 1.73 | 1050 | 0.0164 | -8.1838 | -31.3365 | 0.9917 | 23.1528 | -955.8453 | -525.2800 | -1.5692 | -1.7790 |
| 0.0069 | 1.77 | 1075 | 0.0163 | -8.1908 | -31.4118 | 0.9917 | 23.2210 | -956.5981 | -525.3508 | -1.5749 | -1.7850 |
| 0.0029 | 1.81 | 1100 | 0.0166 | -8.4138 | -32.0830 | 0.9917 | 23.6692 | -963.3098 | -527.5802 | -1.5624 | -1.7752 |
| 0.0047 | 1.85 | 1125 | 0.0166 | -8.4223 | -32.1526 | 0.9917 | 23.7304 | -964.0065 | -527.6652 | -1.5631 | -1.7759 |
| 0.0037 | 1.89 | 1150 | 0.0163 | -8.1563 | -31.3209 | 0.9917 | 23.1646 | -955.6895 | -525.0057 | -1.5739 | -1.7832 |
| 0.0026 | 1.93 | 1175 | 0.0163 | -8.2107 | -31.5009 | 0.9917 | 23.2901 | -957.4888 | -525.5498 | -1.5708 | -1.7807 |
| 0.0058 | 1.98 | 1200 | 0.0162 | -8.1731 | -31.3826 | 0.9917 | 23.2095 | -956.3063 | -525.1735 | -1.5719 | -1.7813 |
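
The log can be inspected programmatically to find where validation loss bottoms out and to confirm that each margin matches the reward difference. The sketch below uses a hand-copied sample of rows from the training results, not the full log; the tuple layout and variable names are illustrative.

```python
# A sample of rows copied from the training results above:
# (step, epoch, validation loss, rewards/real, rewards/generated, rewards/margins)
rows = [
    (25, 0.04, 0.4147, -0.6192, -1.4312, 0.8120),
    (300, 0.49, 0.0224, -5.9239, -20.7184, 14.7944),
    (575, 0.95, 0.0149, -5.7209, -23.5217, 17.8008),
    (600, 0.99, 0.0147, -5.7523, -23.7474, 17.9952),
    (625, 1.03, 0.0154, -6.1839, -24.8883, 18.7044),
    (900, 1.48, 0.0164, -8.2086, -30.6909, 22.4823),
    (1200, 1.98, 0.0162, -8.1731, -31.3826, 23.2095),
]

# Each logged margin should equal real minus generated, up to rounding.
for step, epoch, val_loss, real, generated, margin in rows:
    assert abs((real - generated) - margin) < 1e-3, f"mismatch at step {step}"

# Find the sampled checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r[2])
final = rows[-1]
print(f"lowest validation loss: {best[2]} at step {best[0]} (epoch {best[1]})")
print(f"final validation loss:  {final[2]} at step {final[0]} (epoch {final[1]})")
```

In the log, validation loss reaches its minimum of 0.0147 at step 600, near the end of the first epoch, and ends slightly higher at 0.0162, while the reward margin keeps growing through the second epoch.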

Framework versions

  • Transformers 4.37.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.14.6
  • Tokenizers 0.15.2

The weights are stored in Safetensors format. Model size: 7.24B params. Tensor type: BF16.