---
library_name: transformers
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: OpenELM-1_1B-DPO-full-max-6-reward
    results: []
---

OpenELM-1_1B-DPO-full-max-6-reward

This model was trained with DPO (via trl) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6897
  • Rewards/chosen: -15.0625
  • Rewards/rejected: -16.625
  • Rewards/accuracies: 0.5918
  • Rewards/margins: 1.6172
  • Logps/rejected: -1952.0
  • Logps/chosen: -1824.0
  • Logits/rejected: 2.4844
  • Logits/chosen: 0.6445
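In DPO, Rewards/chosen and Rewards/rejected are the beta-scaled log-probability ratios between the policy and the frozen reference model, and Rewards/margins is their difference. A minimal sketch of how these reported quantities and the DPO loss relate (the beta value is an assumption for illustration; it is not recorded in this card):

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO statistics; beta=0.1 is an assumed value."""
    # Implicit rewards: beta-scaled log-prob ratio of policy vs. reference.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    # DPO loss is -log(sigmoid(margin)); it shrinks as the margin grows.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return loss, reward_chosen, reward_rejected, margin
```

Note that both rewards can become strongly negative (as in the table below) while the margin still widens; the loss only depends on the margin, not on the absolute reward values.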

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
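The reported totals follow from the per-device settings, and the learning rate follows a linear warmup over the first 10% of steps and then a cosine decay to zero. A sketch (the closed-form schedule below is the standard cosine-with-warmup shape, assumed to match what the trainer used):

```python
import math

# Effective batch sizes, derived from the per-device settings above.
train_batch_size, eval_batch_size = 8, 16
num_devices, grad_accum_steps = 4, 2
total_train_batch_size = train_batch_size * num_devices * grad_accum_steps  # 64
total_eval_batch_size = eval_batch_size * num_devices                       # 64

def cosine_warmup_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```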

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.5942 | 0.1047 | 100 | 0.6780 | -1.3594 | -1.5625 | 0.6055 | 0.2051 | -446.0 | -454.0 | -11.125 | -11.4375 |
| 0.561 | 0.2094 | 200 | 0.6944 | -2.2188 | -2.5156 | 0.6055 | 0.2949 | -540.0 | -540.0 | -9.75 | -10.25 |
| 0.5506 | 0.3141 | 300 | 0.7181 | -4.4375 | -4.8438 | 0.6074 | 0.4004 | -772.0 | -764.0 | -7.5 | -8.3125 |
| 0.5233 | 0.4188 | 400 | 0.8159 | -4.0312 | -4.5 | 0.5840 | 0.4668 | -736.0 | -720.0 | -12.875 | -13.625 |
| 0.5164 | 0.5236 | 500 | 0.7724 | -3.1562 | -3.8281 | 0.6270 | 0.6602 | -672.0 | -636.0 | -6.6562 | -8.3125 |
| 0.5142 | 0.6283 | 600 | 0.7633 | -5.3438 | -5.7188 | 0.5645 | 0.3770 | -860.0 | -852.0 | -6.4375 | -7.4062 |
| 0.4846 | 0.7330 | 700 | 0.8138 | -5.6875 | -6.3438 | 0.5996 | 0.6641 | -924.0 | -888.0 | -7.25 | -8.625 |
| 0.475 | 0.8377 | 800 | 0.7354 | -5.5 | -6.0938 | 0.6133 | 0.6016 | -900.0 | -868.0 | -8.9375 | -10.0625 |
| 0.4734 | 0.9424 | 900 | 0.8621 | -7.5625 | -8.3125 | 0.6074 | 0.7305 | -1120.0 | -1072.0 | -5.75 | -7.3438 |
| 0.146 | 1.0471 | 1000 | 1.1525 | -10.6875 | -11.5 | 0.5684 | 0.8477 | -1440.0 | -1384.0 | -2.7031 | -4.4062 |
| 0.1356 | 1.1518 | 1100 | 0.9972 | -9.1875 | -10.0625 | 0.6094 | 0.9062 | -1296.0 | -1232.0 | -3.6875 | -5.5312 |
| 0.1237 | 1.2565 | 1200 | 0.9993 | -8.5 | -9.25 | 0.6016 | 0.7617 | -1216.0 | -1168.0 | -6.3438 | -8.125 |
| 0.1387 | 1.3613 | 1300 | 1.1685 | -10.875 | -11.9375 | 0.5977 | 1.0625 | -1488.0 | -1408.0 | -2.4219 | -4.5625 |
| 0.1298 | 1.4660 | 1400 | 1.0590 | -10.4375 | -11.5 | 0.5996 | 1.0312 | -1440.0 | -1368.0 | -2.9844 | -4.9375 |
| 0.1284 | 1.5707 | 1500 | 1.1094 | -11.125 | -12.25 | 0.6191 | 1.1172 | -1512.0 | -1432.0 | -1.9766 | -3.9844 |
| 0.1084 | 1.6754 | 1600 | 1.1327 | -10.8125 | -11.9375 | 0.6230 | 1.1328 | -1480.0 | -1400.0 | -2.2812 | -4.1875 |
| 0.1028 | 1.7801 | 1700 | 1.1308 | -11.0 | -12.25 | 0.6230 | 1.2344 | -1512.0 | -1424.0 | -1.7031 | -3.625 |
| 0.1227 | 1.8848 | 1800 | 1.0562 | -9.5 | -10.5 | 0.6113 | 0.9961 | -1336.0 | -1264.0 | -2.5156 | -4.375 |
| 0.1119 | 1.9895 | 1900 | 1.1633 | -11.3125 | -12.5625 | 0.5996 | 1.2344 | -1544.0 | -1448.0 | -1.2734 | -3.1094 |
| 0.0166 | 2.0942 | 2000 | 1.5515 | -14.25 | -15.75 | 0.5840 | 1.5234 | -1864.0 | -1744.0 | 1.9375 | 0.0549 |
| 0.0122 | 2.1990 | 2100 | 1.6694 | -14.8125 | -16.5 | 0.6094 | 1.6484 | -1936.0 | -1800.0 | 1.5703 | -0.4121 |
| 0.0158 | 2.3037 | 2200 | 1.7912 | -15.8125 | -17.5 | 0.6035 | 1.6875 | -2040.0 | -1904.0 | 2.5156 | 0.6445 |
| 0.0165 | 2.4084 | 2300 | 1.7817 | -15.8125 | -17.375 | 0.5938 | 1.5703 | -2024.0 | -1896.0 | 2.9219 | 1.1172 |
| 0.0169 | 2.5131 | 2400 | 1.7279 | -15.25 | -16.875 | 0.5977 | 1.5625 | -1976.0 | -1848.0 | 2.5156 | 0.6836 |
| 0.0189 | 2.6178 | 2500 | 1.7443 | -15.375 | -17.0 | 0.6016 | 1.6406 | -1992.0 | -1856.0 | 2.6719 | 0.8320 |
| 0.0134 | 2.7225 | 2600 | 1.6854 | -15.0 | -16.625 | 0.5898 | 1.6172 | -1952.0 | -1816.0 | 2.5469 | 0.7109 |
| 0.0161 | 2.8272 | 2700 | 1.6882 | -15.0625 | -16.625 | 0.5879 | 1.625 | -1960.0 | -1824.0 | 2.5312 | 0.6836 |
| 0.0165 | 2.9319 | 2800 | 1.6897 | -15.0625 | -16.625 | 0.5918 | 1.6172 | -1952.0 | -1824.0 | 2.4844 | 0.6445 |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.3.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1