---
license: apache-2.0
base_model: martimfasantos/tinyllama-1.1b-chat-sft-full
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
model-index:
  - name: tinyllama-1.1b-chat-dpo-full
    results: []
---

tinyllama-1.1b-chat-dpo-full

This model is a version of martimfasantos/tinyllama-1.1b-chat-sft-full further trained with Direct Preference Optimization (DPO) on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5860
  • Rewards/chosen: -1.1602
  • Rewards/rejected: -1.6135
  • Rewards/accuracies: 0.6890
  • Rewards/margins: 0.4533
  • Logps/rejected: -458.4552
  • Logps/chosen: -452.2377
  • Logits/rejected: -2.3877
  • Logits/chosen: -2.4300
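
For context on the reward columns reported above and in the training table below: in a TRL-style DPO setup, the "rewards" are the implicit DPO rewards, i.e. beta times the difference between the policy and reference log-probabilities of a response, margins are chosen minus rejected rewards, and accuracies are the fraction of pairs where the chosen response scores higher. The sketch below is illustrative only; the beta used for this run is not reported in this card, and the 0.1 value is a placeholder.

```python
# Illustrative sketch of how Rewards/* metrics are typically derived in a
# TRL-style DPO setup. NOT the training code for this run; `beta` is an
# assumed placeholder and the log-prob tensors stand in for model outputs.
import torch
import torch.nn.functional as F

beta = 0.1  # assumption: the actual beta for this run is not reported


def dpo_reward_stats(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps):
    # Implicit DPO reward: beta * (policy log-prob - reference log-prob)
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margins = chosen_rewards - rejected_rewards
    # Rewards/accuracies: share of pairs where chosen beats rejected
    accuracies = (chosen_rewards > rejected_rewards).float().mean()
    # DPO loss: -log(sigmoid(reward margin))
    loss = -F.logsigmoid(margins).mean()
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracies, margins.mean()
```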

Model description

More information needed

Intended uses & limitations

More information needed
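
As an illustration of intended use (chat-style generation), a minimal inference sketch is shown below. It assumes the checkpoint is available on the Hub as martimfasantos/tinyllama-1.1b-chat-dpo-full and that the tokenizer ships a chat template; adjust the model id, dtype, and generation settings as needed.

```python
# Minimal inference sketch (assumed Hub id; load from a local path if needed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/tinyllama-1.1b-chat-dpo-full"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "Explain DPO in one paragraph."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```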

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
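
For reference, these values map onto the standard transformers TrainingArguments fields roughly as sketched below. This is a sketch only: the run itself was launched via the alignment-handbook/TRL DPO recipe, and DPO-specific options (e.g. beta, maximum prompt/sequence lengths) are not reported in this card.

```python
# Rough mapping of the hyperparameters above onto transformers.TrainingArguments.
# Sketch only; output_dir is an assumed name, not taken from this card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="tinyllama-1.1b-chat-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,  # 4 per device x 4 steps matches the reported total of 16
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```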

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.693         | 0.0262 | 100  | 0.6929          | -0.0014        | -0.0019          | 0.5320             | 0.0006          | -297.2994      | -336.3557    | -3.1228         | -3.1361       |
| 0.6887        | 0.0523 | 200  | 0.6892          | -0.0302        | -0.0383          | 0.6160             | 0.0081          | -300.9348      | -339.2341    | -3.1215         | -3.1346       |
| 0.6789        | 0.0785 | 300  | 0.6794          | -0.0789        | -0.1087          | 0.6360             | 0.0299          | -307.9798      | -344.1051    | -3.1094         | -3.1216       |
| 0.6624        | 0.1047 | 400  | 0.6635          | -0.1807        | -0.2518          | 0.6390             | 0.0711          | -322.2854      | -354.2890    | -3.0664         | -3.0771       |
| 0.6373        | 0.1309 | 500  | 0.6503          | -0.2988        | -0.4120          | 0.6425             | 0.1133          | -338.3080      | -366.0959    | -2.9693         | -2.9839       |
| 0.6423        | 0.1570 | 600  | 0.6457          | -0.3891        | -0.5345          | 0.6375             | 0.1454          | -350.5518      | -375.1291    | -2.9372         | -2.9538       |
| 0.6266        | 0.1832 | 700  | 0.6420          | -0.7030        | -0.9081          | 0.6365             | 0.2051          | -387.9123      | -406.5211    | -2.9095         | -2.9229       |
| 0.5942        | 0.2094 | 800  | 0.6367          | -0.4969        | -0.6764          | 0.6475             | 0.1795          | -364.7484      | -385.9118    | -2.9255         | -2.9397       |
| 0.6171        | 0.2355 | 900  | 0.6330          | -0.5389        | -0.7443          | 0.6545             | 0.2054          | -371.5351      | -390.1065    | -2.8815         | -2.8992       |
| 0.6156        | 0.2617 | 1000 | 0.6271          | -0.9278        | -1.1788          | 0.6460             | 0.2510          | -414.9855      | -428.9975    | -2.8469         | -2.8665       |
| 0.6636        | 0.2879 | 1100 | 0.6234          | -0.7984        | -1.0304          | 0.6515             | 0.2320          | -400.1489      | -416.0618    | -2.8144         | -2.8347       |
| 0.6832        | 0.3141 | 1200 | 0.6152          | -1.0303        | -1.3170          | 0.6570             | 0.2866          | -428.8004      | -439.2536    | -2.7994         | -2.8212       |
| 0.5967        | 0.3402 | 1300 | 0.6131          | -1.2342        | -1.5321          | 0.6655             | 0.2979          | -450.3198      | -459.6400    | -2.7494         | -2.7756       |
| 0.596         | 0.3664 | 1400 | 0.6064          | -0.8587        | -1.1697          | 0.6820             | 0.3110          | -414.0766      | -422.0903    | -2.8084         | -2.8289       |
| 0.592         | 0.3926 | 1500 | 0.6027          | -0.9689        | -1.3189          | 0.6715             | 0.3499          | -428.9929      | -433.1132    | -2.7455         | -2.7703       |
| 0.6353        | 0.4187 | 1600 | 0.6051          | -0.9640        | -1.3223          | 0.6745             | 0.3582          | -429.3314      | -432.6226    | -2.6972         | -2.7245       |
| 0.6603        | 0.4449 | 1700 | 0.6016          | -0.9893        | -1.3221          | 0.6765             | 0.3328          | -429.3145      | -435.1521    | -2.7021         | -2.7305       |
| 0.5551        | 0.4711 | 1800 | 0.6023          | -1.0035        | -1.3765          | 0.6790             | 0.3731          | -434.7590      | -436.5641    | -2.6159         | -2.6492       |
| 0.5877        | 0.4973 | 1900 | 0.5975          | -0.8137        | -1.1853          | 0.6835             | 0.3716          | -415.6308      | -417.5872    | -2.6621         | -2.6941       |
| 0.5827        | 0.5234 | 2000 | 0.5935          | -0.8724        | -1.2562          | 0.6810             | 0.3838          | -422.7221      | -423.4575    | -2.6043         | -2.6396       |
| 0.6017        | 0.5496 | 2100 | 0.5911          | -1.0065        | -1.3971          | 0.6905             | 0.3907          | -436.8172      | -436.8658    | -2.6105         | -2.6436       |
| 0.5539        | 0.5758 | 2200 | 0.5920          | -0.9060        | -1.2945          | 0.6885             | 0.3884          | -426.5499      | -426.8195    | -2.5724         | -2.6076       |
| 0.5795        | 0.6019 | 2300 | 0.5914          | -1.1164        | -1.5398          | 0.6865             | 0.4234          | -451.0841      | -447.8605    | -2.5399         | -2.5757       |
| 0.5657        | 0.6281 | 2400 | 0.5904          | -1.0347        | -1.4494          | 0.6860             | 0.4147          | -442.0414      | -439.6861    | -2.5121         | -2.5487       |
| 0.5306        | 0.6543 | 2500 | 0.5918          | -1.0464        | -1.4840          | 0.6825             | 0.4376          | -445.5005      | -440.8591    | -2.4692         | -2.5102       |
| 0.5762        | 0.6805 | 2600 | 0.5927          | -1.0687        | -1.5141          | 0.6780             | 0.4455          | -448.5193      | -443.0862    | -2.4291         | -2.4735       |
| 0.6016        | 0.7066 | 2700 | 0.5936          | -1.0767        | -1.5080          | 0.6800             | 0.4313          | -447.9063      | -443.8889    | -2.4329         | -2.4747       |
| 0.6068        | 0.7328 | 2800 | 0.5897          | -1.1905        | -1.6433          | 0.6820             | 0.4527          | -461.4312      | -455.2722    | -2.4294         | -2.4708       |
| 0.5821        | 0.7590 | 2900 | 0.5870          | -1.1245        | -1.5598          | 0.6845             | 0.4353          | -453.0833      | -448.6697    | -2.4470         | -2.4862       |
| 0.5393        | 0.7851 | 3000 | 0.5873          | -1.2223        | -1.6710          | 0.6870             | 0.4486          | -464.2020      | -458.4521    | -2.4161         | -2.4565       |
| 0.577         | 0.8113 | 3100 | 0.5886          | -1.1359        | -1.5757          | 0.6845             | 0.4399          | -454.6796      | -449.8056    | -2.4137         | -2.4538       |
| 0.5731        | 0.8375 | 3200 | 0.5864          | -1.1928        | -1.6493          | 0.6900             | 0.4564          | -462.0313      | -455.5009    | -2.3988         | -2.4401       |
| 0.586         | 0.8636 | 3300 | 0.5865          | -1.1740        | -1.6231          | 0.6895             | 0.4492          | -459.4178      | -453.6159    | -2.3969         | -2.4384       |
| 0.5629        | 0.8898 | 3400 | 0.5860          | -1.1573        | -1.6086          | 0.6890             | 0.4513          | -457.9694      | -451.9486    | -2.3882         | -2.4306       |
| 0.6059        | 0.9160 | 3500 | 0.5858          | -1.1672        | -1.6213          | 0.6890             | 0.4541          | -459.2307      | -452.9388    | -2.3897         | -2.4320       |
| 0.5703        | 0.9422 | 3600 | 0.5860          | -1.1607        | -1.6138          | 0.6870             | 0.4532          | -458.4890      | -452.2865    | -2.3897         | -2.4320       |
| 0.5533        | 0.9683 | 3700 | 0.5858          | -1.1623        | -1.6161          | 0.6880             | 0.4538          | -458.7165      | -452.4510    | -2.3882         | -2.4304       |
| 0.5988        | 0.9945 | 3800 | 0.5862          | -1.1608        | -1.6138          | 0.6885             | 0.4530          | -458.4823      | -452.2973    | -2.3882         | -2.4306       |

Framework versions

  • Transformers 4.41.1
  • Pytorch 2.1.2
  • Datasets 2.19.1
  • Tokenizers 0.19.1