OpenELM-1_1B-DPO-full-max-reward-most-similar
This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 1.6639
- Rewards/chosen: -17.25
- Rewards/rejected: -19.375
- Rewards/accuracies: 0.6035
- Rewards/margins: 2.125
- Logps/rejected: -2224.0
- Logps/chosen: -2048.0
- Logits/rejected: 2.9531
- Logits/chosen: 1.3516
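The reported reward margin agrees with the difference between the chosen and rejected rewards listed above. A minimal check, assuming the margin is the mean chosen reward minus the mean rejected reward (the function name `reward_margin` is hypothetical and used only for illustration):

```python
def reward_margin(chosen_reward: float, rejected_reward: float) -> float:
    """Margin between chosen and rejected rewards (assumed definition)."""
    return chosen_reward - rejected_reward


# Final evaluation values copied from the list above.
final_margin = reward_margin(-17.25, -19.375)
print(final_margin)  # 2.125, matching Rewards/margins
```

Both rewards are negative, so the margin reflects only that the rejected responses are penalized more than the chosen ones, not that the chosen responses became more likely.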
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
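The total batch sizes follow from the per-device sizes, the device count, and gradient accumulation. The sketch below reproduces that arithmetic and approximates a cosine schedule with linear warmup. The helper names, the `ceil` rounding of warmup steps, and the total of 2865 steps (estimated from step 100 falling at epoch 0.1047) are assumptions, not values recorded in this card.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Samples contributing to one optimizer step (or one eval pass)."""
    return per_device * num_devices * grad_accum


def cosine_lr(step: int, total_steps: int, base_lr: float = 5e-05,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(8, 4, 2))  # 64, the total_train_batch_size
print(effective_batch_size(16, 4))    # 64, the total_eval_batch_size

TOTAL_STEPS = 2865  # assumption: about 955 steps per epoch over 3 epochs
for step in (0, 287, 1000, 2000, TOTAL_STEPS):
    print(step, cosine_lr(step, TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at 5e-05 after roughly the first 10% of steps and decays toward zero by the end of the third epoch.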
Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.5851 | 0.1047 | 100 | 0.6821 | -1.375 | -1.6094 | 0.6074 | 0.2354 | -450.0 | -456.0 | -11.75 | -12.0625 |
0.5386 | 0.2094 | 200 | 0.6998 | -3.1875 | -3.6094 | 0.5781 | 0.4160 | -648.0 | -636.0 | -5.6562 | -6.5312 |
0.5183 | 0.3141 | 300 | 0.7188 | -4.75 | -5.3125 | 0.6055 | 0.5820 | -820.0 | -792.0 | -7.0625 | -8.0625 |
0.4924 | 0.4188 | 400 | 0.8317 | -6.3438 | -7.0625 | 0.5918 | 0.7227 | -996.0 | -952.0 | -7.5938 | -9.5 |
0.5057 | 0.5236 | 500 | 0.7777 | -5.125 | -5.8125 | 0.5918 | 0.7070 | -872.0 | -828.0 | -9.75 | -11.0 |
0.5085 | 0.6283 | 600 | 0.7983 | -5.2812 | -6.0938 | 0.5918 | 0.7891 | -896.0 | -848.0 | -8.625 | -10.25 |
0.4655 | 0.7330 | 700 | 0.8072 | -3.9375 | -4.7812 | 0.625 | 0.8516 | -768.0 | -712.0 | -8.75 | -10.375 |
0.4638 | 0.8377 | 800 | 0.8442 | -7.3438 | -7.9688 | 0.5781 | 0.625 | -1088.0 | -1056.0 | -2.5469 | -3.9688 |
0.4265 | 0.9424 | 900 | 0.9620 | -8.0 | -8.9375 | 0.5918 | 0.9023 | -1184.0 | -1120.0 | -4.8125 | -6.4375 |
0.1656 | 1.0471 | 1000 | 0.9980 | -8.4375 | -9.625 | 0.6055 | 1.1953 | -1248.0 | -1160.0 | -1.5234 | -3.3438 |
0.1481 | 1.1518 | 1100 | 1.0423 | -9.625 | -10.8125 | 0.5918 | 1.1641 | -1368.0 | -1280.0 | -4.2812 | -6.0938 |
0.1547 | 1.2565 | 1200 | 1.0939 | -11.625 | -12.6875 | 0.5957 | 1.0859 | -1560.0 | -1480.0 | -3.1719 | -4.625 |
0.1577 | 1.3613 | 1300 | 1.0585 | -10.8125 | -12.0 | 0.5996 | 1.2266 | -1488.0 | -1400.0 | -0.75 | -2.3281 |
0.1773 | 1.4660 | 1400 | 1.0706 | -11.125 | -12.25 | 0.5938 | 1.1406 | -1512.0 | -1432.0 | -1.1328 | -2.7344 |
0.1675 | 1.5707 | 1500 | 1.0756 | -11.4375 | -12.75 | 0.6133 | 1.3125 | -1560.0 | -1464.0 | -0.7383 | -2.375 |
0.1329 | 1.6754 | 1600 | 1.0396 | -9.875 | -11.3125 | 0.6367 | 1.4531 | -1424.0 | -1304.0 | -1.7969 | -3.7969 |
0.1055 | 1.7801 | 1700 | 1.1083 | -11.5 | -12.9375 | 0.6113 | 1.4375 | -1584.0 | -1472.0 | -0.5742 | -2.2656 |
0.1226 | 1.8848 | 1800 | 1.0953 | -10.9375 | -12.3125 | 0.6094 | 1.3672 | -1520.0 | -1408.0 | 0.0625 | -1.5156 |
0.1211 | 1.9895 | 1900 | 1.0709 | -11.375 | -12.75 | 0.6133 | 1.4219 | -1568.0 | -1456.0 | 0.6758 | -0.9648 |
0.0277 | 2.0942 | 2000 | 1.4782 | -15.9375 | -17.75 | 0.6016 | 1.7891 | -2064.0 | -1912.0 | 2.0938 | 0.4316 |
0.0199 | 2.1990 | 2100 | 1.7630 | -18.625 | -20.75 | 0.5977 | 2.1094 | -2368.0 | -2192.0 | 3.0312 | 1.4688 |
0.0298 | 2.3037 | 2200 | 1.5056 | -16.0 | -17.875 | 0.6055 | 1.8203 | -2080.0 | -1920.0 | 2.6406 | 1.0312 |
0.0278 | 2.4084 | 2300 | 1.6823 | -17.625 | -19.625 | 0.5996 | 1.9453 | -2256.0 | -2080.0 | 3.375 | 1.8125 |
0.0401 | 2.5131 | 2400 | 1.6474 | -17.375 | -19.375 | 0.6055 | 2.0781 | -2224.0 | -2048.0 | 3.125 | 1.5469 |
0.025 | 2.6178 | 2500 | 1.6601 | -17.25 | -19.5 | 0.6055 | 2.1719 | -2240.0 | -2048.0 | 2.9219 | 1.3125 |
0.0251 | 2.7225 | 2600 | 1.6498 | -17.125 | -19.25 | 0.6035 | 2.125 | -2224.0 | -2032.0 | 2.9219 | 1.3203 |
0.0249 | 2.8272 | 2700 | 1.6541 | -17.25 | -19.25 | 0.6055 | 2.0781 | -2224.0 | -2040.0 | 2.9531 | 1.3516 |
0.0222 | 2.9319 | 2800 | 1.6639 | -17.25 | -19.375 | 0.6035 | 2.125 | -2224.0 | -2048.0 | 2.9531 | 1.3516 |
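In the table above, training loss falls from 0.5851 to 0.0222 while validation loss rises from 0.6821 to 1.6639 and reward accuracy stays near 0.6. A small sketch for comparing checkpoints on those columns follows; it uses a subset of rows copied from the table, and which criterion to prefer is a judgment call rather than something this card specifies.

```python
# (step, training_loss, validation_loss, reward_accuracy), copied from the table.
rows = [
    (100, 0.5851, 0.6821, 0.6074),
    (700, 0.4655, 0.8072, 0.625),
    (1000, 0.1656, 0.9980, 0.6055),
    (1600, 0.1329, 1.0396, 0.6367),
    (2000, 0.0277, 1.4782, 0.6016),
    (2800, 0.0222, 1.6639, 0.6035),
]

# Checkpoint with the lowest validation loss among the sampled rows.
best_by_loss = min(rows, key=lambda r: r[2])

# Checkpoint with the highest reward accuracy among the sampled rows.
best_by_accuracy = max(rows, key=lambda r: r[3])

# Gap between validation and training loss; a widening gap suggests overfitting.
gaps = [(step, val - train) for step, train, val, _ in rows]

print("lowest validation loss at step", best_by_loss[0])
print("highest reward accuracy at step", best_by_accuracy[0])
for step, gap in gaps:
    print(step, round(gap, 4))
```

On these rows the lowest validation loss is at step 100 and the highest reward accuracy at step 1600, so the final checkpoint is not the best one by either measure.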
Framework versions
- Transformers 4.44.2
- Pytorch 2.3.0
- Datasets 3.0.0
- Tokenizers 0.19.1