metadata
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: pythia-1.4b-dpo-full
    results: []

pythia-1.4b-dpo-full

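This model was fine-tuned with Direct Preference Optimization (DPO) using TRL. The base checkpoint and the preference dataset were not recorded when this card was generated. It achieves the following results on the evaluation set: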

  • Loss: 0.6403
  • Rewards/chosen: 0.6094
  • Rewards/rejected: 0.4102
  • Rewards/accuracies: 0.5893
  • Rewards/margins: 0.2002
  • Logps/rejected: -2024.0
  • Logps/chosen: -2320.0
  • Logits/rejected: -0.6719
  • Logits/chosen: -0.6172
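As we understand the conventions of TRL's DPO trainer, the metrics above relate as follows. Each implicit reward is β times the difference between the policy's and the frozen reference model's summed log-probability for a response. The margin is the chosen reward minus the rejected reward, accuracy is the fraction of pairs whose chosen reward exceeds the rejected one, and the loss is −log σ(margin) averaged over pairs (the default "sigmoid" loss). The plain-Python sketch below reproduces these definitions. It assumes β = 0.1 (TRL's default) and the sigmoid loss, since this card states neither, and the function names are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Hypothetical helper: DPO loss and reward metrics for a batch of preference pairs.

    Each argument is a list of summed sequence log-probabilities, one per pair.
    beta=0.1 is an assumption (TRL's default); the card does not report beta.
    """
    n = len(policy_chosen)

    def mean(xs):
        return sum(xs) / n

    # Implicit rewards: beta * (policy log-prob - reference log-prob).
    rewards_chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rewards_rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(rewards_chosen, rewards_rejected)]
    # Sigmoid DPO loss per pair: -log sigmoid(margin).
    losses = [-log_sigmoid(m) for m in margins]
    return {
        "loss": mean(losses),
        "rewards/chosen": mean(rewards_chosen),
        "rewards/rejected": mean(rewards_rejected),
        "rewards/accuracies": mean([1.0 if m > 0 else 0.0 for m in margins]),
        "rewards/margins": mean(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }


# Toy batch of two pairs (made-up log-probabilities, not from this model).
metrics = dpo_metrics(
    policy_chosen=[-10.0, -20.0],
    policy_rejected=[-12.0, -18.0],
    ref_chosen=[-11.0, -20.0],
    ref_rejected=[-11.0, -19.0],
)
print(metrics)
```

This also gives a rough consistency check on the reported numbers. Because −log σ is convex, the mean loss can never fall below −log σ of the mean margin. For the final evaluation, that bound is −log σ(0.2002) ≈ 0.598, and the reported loss of 0.6403 sits above it, as expected under the sigmoid-loss assumption.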

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 5
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • total_train_batch_size: 30
  • total_eval_batch_size: 48
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
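The batch sizes and schedule settings above combine as follows. Each total is the per-device batch size multiplied by the number of devices: 5 × 6 = 30 for training and 8 × 6 = 48 for evaluation. This implies no gradient accumulation. The card does not list gradient_accumulation_steps, so a value of 1 is an assumption. The learning-rate function is a sketch of linear warmup followed by cosine decay to zero. It is written to mirror, as far as we know, Transformers' cosine scheduler with warmup steps = ceil(warmup_ratio × total steps). The card does not report the total number of optimizer steps, so the `total_steps=2000` used below is a hypothetical round number close to the last logged step. The helper names are also hypothetical.

```python
import math

LEARNING_RATE = 5e-7  # learning_rate from the card
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card


def total_batch_size(per_device: int, devices: int, grad_accum: int = 1) -> int:
    """Effective batch size; grad_accum=1 is an assumption (not listed in the card)."""
    return per_device * devices * grad_accum


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = LEARNING_RATE, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then cosine decay to 0."""
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(total_batch_size(5, 6), total_batch_size(8, 6))  # train and eval totals
for s in (0, 100, 200, 1100, 2000):
    print(s, cosine_lr(s, total_steps=2000))
```

With `total_steps=2000`, warmup lasts 200 steps. The rate reaches its 5e-7 peak at step 200, falls to half the peak at step 1100, and reaches zero at step 2000.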

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.684 | 0.05 | 100 | 0.6768 | 0.2314 | 0.1904 | 0.4494 | 0.0405 | -2048.0 | -2352.0 | -0.7227 | -0.6641 |
| 0.663 | 0.1 | 200 | 0.6566 | 0.5977 | 0.4883 | 0.4940 | 0.1108 | -2016.0 | -2320.0 | -0.7266 | -0.6680 |
| 0.6529 | 0.15 | 300 | 0.6513 | 0.625 | 0.4941 | 0.5149 | 0.1279 | -2016.0 | -2320.0 | -0.7188 | -0.6562 |
| 0.6371 | 0.2 | 400 | 0.6491 | 0.6562 | 0.5 | 0.5595 | 0.1523 | -2016.0 | -2304.0 | -0.7266 | -0.6680 |
| 0.6206 | 0.25 | 500 | 0.6466 | 0.5391 | 0.3945 | 0.5952 | 0.1445 | -2024.0 | -2320.0 | -0.7148 | -0.6562 |
| 0.686 | 0.29 | 600 | 0.6446 | 0.5781 | 0.4180 | 0.5714 | 0.1592 | -2024.0 | -2320.0 | -0.7188 | -0.6602 |
| 0.6459 | 0.34 | 700 | 0.6449 | 0.5508 | 0.3633 | 0.6012 | 0.1885 | -2032.0 | -2320.0 | -0.6875 | -0.6289 |
| 0.6458 | 0.39 | 800 | 0.6421 | 0.5586 | 0.3867 | 0.5774 | 0.1709 | -2024.0 | -2320.0 | -0.6953 | -0.6406 |
| 0.6451 | 0.44 | 900 | 0.6398 | 0.7109 | 0.5039 | 0.5685 | 0.2070 | -2016.0 | -2304.0 | -0.6719 | -0.6133 |
| 0.6213 | 0.49 | 1000 | 0.6407 | 0.7734 | 0.5742 | 0.5714 | 0.2012 | -2008.0 | -2304.0 | -0.6602 | -0.6016 |
| 0.6313 | 0.54 | 1100 | 0.6387 | 0.5391 | 0.3555 | 0.5893 | 0.1807 | -2032.0 | -2320.0 | -0.6680 | -0.6094 |
| 0.6298 | 0.59 | 1200 | 0.6380 | 0.6953 | 0.4922 | 0.6042 | 0.2031 | -2016.0 | -2304.0 | -0.6523 | -0.5977 |
| 0.6461 | 0.64 | 1300 | 0.6396 | 0.5586 | 0.3613 | 0.5863 | 0.1963 | -2032.0 | -2320.0 | -0.6914 | -0.6367 |
| 0.6258 | 0.69 | 1400 | 0.6360 | 0.6914 | 0.4727 | 0.5923 | 0.2207 | -2016.0 | -2304.0 | -0.6758 | -0.6172 |
| 0.6347 | 0.74 | 1500 | 0.6375 | 0.625 | 0.4141 | 0.5893 | 0.2100 | -2024.0 | -2320.0 | -0.6641 | -0.6094 |
| 0.6185 | 0.79 | 1600 | 0.6382 | 0.5977 | 0.3926 | 0.6042 | 0.2051 | -2032.0 | -2320.0 | -0.6797 | -0.625 |
| 0.6408 | 0.83 | 1700 | 0.6374 | 0.5977 | 0.3926 | 0.5952 | 0.2041 | -2024.0 | -2320.0 | -0.6719 | -0.6172 |
| 0.662 | 0.88 | 1800 | 0.6355 | 0.6094 | 0.3984 | 0.6012 | 0.2119 | -2024.0 | -2320.0 | -0.6836 | -0.6289 |
| 0.6385 | 0.93 | 1900 | 0.6379 | 0.6055 | 0.3926 | 0.625 | 0.2129 | -2024.0 | -2320.0 | -0.6758 | -0.6211 |
| 0.6154 | 0.98 | 2000 | 0.6381 | 0.6094 | 0.4043 | 0.6012 | 0.2041 | -2024.0 | -2320.0 | -0.6758 | -0.6211 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1
  • Datasets 2.14.6
  • Tokenizers 0.15.2