---
library_name: transformers
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: OpenELM-1_1B-DPO-full-3-5
  results: []
---

# OpenELM-1_1B-DPO-full-3-5

This model was trained with DPO on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.1637
- Rewards/chosen: -13.625
- Rewards/rejected: -17.0
- Rewards/accuracies: 0.7051
- Rewards/margins: 3.375
- Logps/rejected: -1984.0
- Logps/chosen: -1680.0
- Logits/rejected: 3.8594
- Logits/chosen: 1.9453

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding trainer setup follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
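These settings map onto TRL's `DPOTrainer` roughly as follows. This is a minimal sketch, not the actual training script: the base checkpoint, the preference dataset, and the DPO `beta` are assumptions, since none of them are recorded on this card.

```python
# Minimal sketch of a DPO run matching the hyperparameters above.
# Launched on 4 GPUs, e.g.: accelerate launch --num_processes 4 train_dpo.py
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed base checkpoint; OpenELM ships custom modeling code.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-1_1B", trust_remote_code=True)
# OpenELM builds on the Llama 2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Placeholder: the card only says "an unknown dataset".
# DPO expects "prompt"/"chosen"/"rejected" columns.
dataset = load_dataset("some-org/some-preference-dataset")

args = DPOConfig(
    output_dir="OpenELM-1_1B-DPO-full-3-5",
    beta=0.1,                        # TRL default; actual value not recorded
    learning_rate=5e-05,
    per_device_train_batch_size=8,   # 8 x 4 GPUs x 2 accum steps = 64 total
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # The Adam betas/epsilon listed above are the TrainingArguments defaults.
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # TRL clones the policy as the frozen reference model
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)
trainer.train()
```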
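For reading the results table below: the reward columns follow the standard DPO parameterization (this is how TRL logs them; the card itself does not define them). The implicit reward of a completion $y$ for a prompt $x$ is the scaled log-probability ratio between the policy and the frozen reference model, and the loss is the logistic loss on the chosen-vs-rejected margin:

$$
r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\bigl(r(x, y_{\mathrm{chosen}}) - r(x, y_{\mathrm{rejected}})\bigr)
$$

`Rewards/margins` is the mean of that margin over the evaluation set, and `Rewards/accuracies` is the fraction of pairs for which it is positive. Note that the validation loss rises steadily after the first epoch while the training loss approaches zero, which is consistent with overfitting to the preference data.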
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6232 | 0.1047 | 100 | 0.6285 | -0.6055 | -0.8242 | 0.6660 | 0.2178 | -368.0 | -374.0 | -8.25 | -8.5 |
| 0.5729 | 0.2093 | 200 | 0.5957 | -1.6328 | -2.1094 | 0.6992 | 0.4766 | -498.0 | -478.0 | -7.9688 | -8.4375 |
| 0.6122 | 0.3140 | 300 | 0.5751 | -1.6016 | -2.1094 | 0.7129 | 0.5 | -496.0 | -474.0 | -5.5938 | -6.1562 |
| 0.5905 | 0.4186 | 400 | 0.5994 | -1.6328 | -2.1875 | 0.6680 | 0.5508 | -504.0 | -478.0 | -5.5625 | -6.3438 |
| 0.5781 | 0.5233 | 500 | 0.5764 | -1.7188 | -2.2656 | 0.6816 | 0.5586 | -512.0 | -486.0 | -6.0625 | -6.8438 |
| 0.5356 | 0.6279 | 600 | 0.5831 | -3.8906 | -4.5625 | 0.6699 | 0.6797 | -744.0 | -704.0 | -3.3281 | -4.25 |
| 0.5756 | 0.7326 | 700 | 0.5859 | -3.4219 | -4.0312 | 0.7012 | 0.6133 | -692.0 | -656.0 | -8.8125 | -9.375 |
| 0.5528 | 0.8373 | 800 | 0.5732 | -2.8906 | -3.5 | 0.6836 | 0.6016 | -636.0 | -604.0 | -7.4375 | -8.3125 |
| 0.5753 | 0.9419 | 900 | 0.5693 | -3.0469 | -3.7344 | 0.7168 | 0.6797 | -660.0 | -620.0 | -7.0 | -7.9062 |
| 0.2632 | 1.0466 | 1000 | 0.5881 | -4.1875 | -5.2188 | 0.7148 | 1.0312 | -808.0 | -732.0 | -2.875 | -4.25 |
| 0.2283 | 1.1512 | 1100 | 0.6142 | -4.5312 | -5.5312 | 0.7129 | 0.9961 | -840.0 | -768.0 | -5.375 | -7.0625 |
| 0.2202 | 1.2559 | 1200 | 0.5943 | -4.0938 | -5.1875 | 0.7090 | 1.0781 | -804.0 | -724.0 | -1.875 | -3.375 |
| 0.2472 | 1.3605 | 1300 | 0.5995 | -4.4375 | -5.4062 | 0.7168 | 0.9844 | -828.0 | -760.0 | -2.2188 | -3.6875 |
| 0.2406 | 1.4652 | 1400 | 0.5971 | -5.2188 | -6.2188 | 0.7188 | 1.0156 | -908.0 | -836.0 | -3.875 | -5.2812 |
| 0.2059 | 1.5699 | 1500 | 0.6052 | -5.3438 | -6.5312 | 0.7148 | 1.1953 | -940.0 | -848.0 | -4.2188 | -5.7812 |
| 0.2305 | 1.6745 | 1600 | 0.6068 | -4.875 | -5.9062 | 0.7188 | 1.0391 | -876.0 | -800.0 | -5.1562 | -6.6875 |
| 0.2327 | 1.7792 | 1700 | 0.6141 | -5.9375 | -7.1562 | 0.7168 | 1.2188 | -1000.0 | -908.0 | -4.5 | -6.0625 |
| 0.2221 | 1.8838 | 1800 | 0.6072 | -6.4688 | -7.6562 | 0.7266 | 1.1875 | -1048.0 | -960.0 | -1.9844 | -3.625 |
| 0.2153 | 1.9885 | 1900 | 0.5949 | -6.5 | -7.6875 | 0.7266 | 1.1953 | -1056.0 | -964.0 | -3.3125 | -4.875 |
| 0.0215 | 2.0931 | 2000 | 0.7470 | -8.6875 | -10.5 | 0.7246 | 1.8125 | -1336.0 | -1184.0 | -0.1074 | -1.9609 |
| 0.0303 | 2.1978 | 2100 | 0.7469 | -8.3125 | -10.25 | 0.7031 | 1.9453 | -1312.0 | -1144.0 | -0.1299 | -2.0781 |
| 0.0322 | 2.3025 | 2200 | 0.7584 | -8.5625 | -10.4375 | 0.7109 | 1.8828 | -1328.0 | -1168.0 | -0.5156 | -2.6094 |
| 0.0253 | 2.4071 | 2300 | 0.8087 | -9.8125 | -11.9375 | 0.7129 | 2.125 | -1480.0 | -1296.0 | 1.2656 | -0.7539 |
| 0.0302 | 2.5118 | 2400 | 0.8033 | -9.0 | -11.0625 | 0.7246 | 2.0312 | -1392.0 | -1216.0 | 2.2812 | 0.4395 |
| 0.0218 | 2.6164 | 2500 | 0.8603 | -11.0 | -13.3125 | 0.7188 | 2.3125 | -1616.0 | -1408.0 | 2.2969 | 0.5195 |
| 0.027 | 2.7211 | 2600 | 0.8162 | -9.75 | -12.0 | 0.7402 | 2.2188 | -1488.0 | -1288.0 | 1.0703 | -0.9609 |
| 0.0274 | 2.8257 | 2700 | 0.8296 | -9.75 | -12.0 | 0.7188 | 2.2188 | -1480.0 | -1288.0 | 1.125 | -0.9102 |
| 0.0369 | 2.9304 | 2800 | 0.8085 | -9.5625 | -11.875 | 0.7227 | 2.3125 | -1472.0 | -1272.0 | 0.6289 | -1.4531 |
| 0.0154 | 3.0351 | 2900 | 0.8779 | -9.875 | -12.375 | 0.7266 | 2.5 | -1520.0 | -1296.0 | 0.9609 | -1.3125 |
| 0.007 | 3.1397 | 3000 | 0.9780 | -11.5 | -14.375 | 0.7207 | 2.875 | -1728.0 | -1464.0 | 2.7969 | 0.6836 |
| 0.0059 | 3.2444 | 3100 | 0.9793 | -11.125 | -14.0 | 0.7090 | 2.875 | -1688.0 | -1424.0 | 2.2188 | 0.0258 |
| 0.0102 | 3.3490 | 3200 | 0.9823 | -11.0625 | -13.875 | 0.7148 | 2.8281 | -1672.0 | -1424.0 | 2.7656 | 0.7539 |
| 0.0082 | 3.4537 | 3300 | 1.0423 | -12.1875 | -15.1875 | 0.7051 | 3.0 | -1800.0 | -1528.0 | 3.3281 | 1.4453 |
| 0.0109 | 3.5583 | 3400 | 1.0225 | -11.375 | -14.375 | 0.7168 | 2.9688 | -1720.0 | -1456.0 | 2.875 | 0.8672 |
| 0.0098 | 3.6630 | 3500 | 1.0070 | -11.4375 | -14.25 | 0.7109 | 2.8438 | -1712.0 | -1456.0 | 3.1875 | 1.1719 |
| 0.007 | 3.7677 | 3600 | 1.0390 | -11.9375 | -14.9375 | 0.7148 | 3.0 | -1776.0 | -1512.0 | 2.8594 | 0.8086 |
| 0.0057 | 3.8723 | 3700 | 1.0702 | -12.75 | -15.8125 | 0.7031 | 3.0625 | -1864.0 | -1584.0 | 3.4531 | 1.5 |
| 0.0054 | 3.9770 | 3800 | 1.0485 | -12.4375 | -15.4375 | 0.7031 | 3.0 | -1832.0 | -1560.0 | 3.4062 | 1.4688 |
| 0.0037 | 4.0816 | 3900 | 1.0905 | -12.8125 | -15.9375 | 0.7031 | 3.1406 | -1880.0 | -1600.0 | 3.5469 | 1.6172 |
| 0.0031 | 4.1863 | 4000 | 1.1163 | -13.0625 | -16.25 | 0.7012 | 3.2188 | -1912.0 | -1616.0 | 3.6094 | 1.6562 |
| 0.0037 | 4.2909 | 4100 | 1.1256 | -13.125 | -16.375 | 0.7090 | 3.2656 | -1920.0 | -1624.0 | 3.6094 | 1.6562 |
| 0.0089 | 4.3956 | 4200 | 1.1395 | -13.3125 | -16.625 | 0.7070 | 3.3125 | -1952.0 | -1648.0 | 3.75 | 1.8125 |
| 0.0042 | 4.5003 | 4300 | 1.1512 | -13.4375 | -16.75 | 0.7051 | 3.3438 | -1968.0 | -1664.0 | 3.7969 | 1.8672 |
| 0.0094 | 4.6049 | 4400 | 1.1580 | -13.5 | -16.875 | 0.7070 | 3.3594 | -1976.0 | -1664.0 | 3.8125 | 1.8828 |
| 0.006 | 4.7096 | 4500 | 1.1593 | -13.5625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1672.0 | 3.8438 | 1.9219 |
| 0.0029 | 4.8142 | 4600 | 1.1617 | -13.625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1680.0 | 3.8594 | 1.9375 |
| 0.0059 | 4.9189 | 4700 | 1.1637 | -13.625 | -17.0 | 0.7051 | 3.375 | -1984.0 | -1680.0 | 3.8594 | 1.9453 |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.19.1
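With the pinned versions above, the trained model can be loaded for inference along these lines. This is a hedged sketch: the Hub namespace is not given on this card, so the repository id below is a placeholder, and it assumes the tokenizer was saved alongside the model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repository id -- the card does not record the Hub namespace.
repo_id = "your-namespace/OpenELM-1_1B-DPO-full-3-5"

# OpenELM uses custom modeling code, hence trust_remote_code=True.
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
# Assumes the tokenizer was saved with the model; OpenELM otherwise
# reuses the Llama 2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "What is direct preference optimization?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```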