---
library_name: transformers
license: llama3.2
base_model: tanliboy/llama-3.2-3b-sft-2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: llama-3.2-3b-dpo-2
    results: []
---

llama-3.2-3b-dpo-2

This model is a DPO fine-tuned version of tanliboy/llama-3.2-3b-sft-2 (the preference dataset is not specified in this card). It achieves the following results on the evaluation set:

  • Loss: 0.5808
  • Rewards/chosen: 1.8125
  • Rewards/rejected: -4.0822
  • Rewards/accuracies: 0.7880
  • Rewards/margins: 5.8947
  • Logps/rejected: -387.3112
  • Logps/chosen: -337.8669
  • Logits/rejected: 0.2355
  • Logits/chosen: 0.1785
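
For context on how these metrics relate: in DPO, the implicit reward of a completion is beta times the difference between the policy and reference log-probabilities, the margin is the chosen reward minus the rejected reward, and the reported loss is the negative log-sigmoid of that margin. The sketch below is illustrative only (plain PyTorch, with beta=0.1 as an assumed value, since the card does not state the beta used in training):

```python
import torch
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Illustrative DPO quantities; beta=0.1 is an assumption, not the card's value."""
    # Implicit rewards: scaled log-prob ratios of policy vs. reference model.
    rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # "Rewards/margins" reported above is the mean of this difference.
    margins = rewards_chosen - rewards_rejected
    # Sigmoid DPO loss, corresponding to the "Loss" metric.
    loss = -F.logsigmoid(margins).mean()
    # "Rewards/accuracies" is the fraction of pairs where chosen beats rejected.
    accuracy = (margins > 0).float().mean()
    return loss, rewards_chosen.mean(), rewards_rejected.mean(), margins.mean(), accuracy
```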

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative TRL configuration sketch follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 128
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 3
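
A minimal sketch of how these settings map onto TRL's DPOTrainer, assuming a TRL release contemporary with the framework versions listed below. The preference dataset name, the DPO beta, and the bf16 precision setting are assumptions (none are documented in this card); the remaining values mirror the hyperparameters above.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "tanliboy/llama-3.2-3b-sft-2"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Hypothetical preference dataset; the actual training data is not documented here.
dataset = load_dataset("your/preference-dataset", split="train")

config = DPOConfig(
    output_dir="llama-3.2-3b-dpo-2",
    learning_rate=5e-7,
    per_device_train_batch_size=4,   # x 8 GPUs x 4 accumulation steps = 128 total
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    seed=42,
    beta=0.1,                        # assumed; the card does not report beta
    bf16=True,                       # assumed precision
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```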

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.7596        | 0.1741 | 100  | 0.7588          | 0.1349         | -1.4398          | 0.6994             | 1.5747          | -360.8871      | -354.6434    | 0.6135          | 0.5482        |
| 0.6725        | 0.3483 | 200  | 0.6680          | 0.6247         | -2.7323          | 0.7278             | 3.3569          | -373.8118      | -349.7451    | 0.5335          | 0.4718        |
| 0.6452        | 0.5224 | 300  | 0.6514          | 0.1770         | -3.8036          | 0.75               | 3.9807          | -384.5256      | -354.2216    | 0.5477          | 0.4866        |
| 0.6259        | 0.6966 | 400  | 0.6328          | 0.9885         | -3.5382          | 0.7722             | 4.5267          | -381.8713      | -346.1070    | 0.4531          | 0.3927        |
| 0.5709        | 0.8707 | 500  | 0.6219          | 0.9150         | -4.0091          | 0.7816             | 4.9242          | -386.5804      | -346.8415    | 0.4148          | 0.3563        |
| 0.5835        | 1.0448 | 600  | 0.6094          | 1.5034         | -3.6390          | 0.7722             | 5.1423          | -382.8790      | -340.9584    | 0.3504          | 0.2933        |
| 0.5571        | 1.2190 | 700  | 0.5992          | 1.5696         | -3.7206          | 0.7690             | 5.2901          | -383.6949      | -340.2962    | 0.3217          | 0.2649        |
| 0.5532        | 1.3931 | 800  | 0.5954          | 1.7147         | -3.7261          | 0.7785             | 5.4408          | -383.7506      | -338.8453    | 0.2961          | 0.2383        |
| 0.5168        | 1.5673 | 900  | 0.5930          | 1.9934         | -3.3982          | 0.7753             | 5.3916          | -380.4709      | -336.0577    | 0.2838          | 0.2266        |
| 0.5232        | 1.7414 | 1000 | 0.5884          | 1.7308         | -4.0024          | 0.7816             | 5.7332          | -386.5127      | -338.6839    | 0.2787          | 0.2220        |
| 0.5574        | 1.9155 | 1100 | 0.5849          | 1.8420         | -3.9351          | 0.7911             | 5.7771          | -385.8401      | -337.5714    | 0.2706          | 0.2134        |
| 0.5077        | 2.0897 | 1200 | 0.5842          | 1.6188         | -4.2472          | 0.7880             | 5.8659          | -388.9607      | -339.8043    | 0.2657          | 0.2083        |
| 0.4952        | 2.2638 | 1300 | 0.5837          | 1.9316         | -3.8913          | 0.7816             | 5.8229          | -385.4018      | -336.6759    | 0.2694          | 0.2115        |
| 0.5236        | 2.4380 | 1400 | 0.5812          | 1.8289         | -4.0636          | 0.7880             | 5.8925          | -387.1253      | -337.7025    | 0.2465          | 0.1895        |
| 0.5001        | 2.6121 | 1500 | 0.5814          | 1.7432         | -4.1735          | 0.7848             | 5.9167          | -388.2242      | -338.5596    | 0.2395          | 0.1826        |
| 0.5246        | 2.7862 | 1600 | 0.5809          | 1.8622         | -4.0120          | 0.7880             | 5.8742          | -386.6093      | -337.3701    | 0.2395          | 0.1825        |
| 0.5042        | 2.9604 | 1700 | 0.5808          | 1.8125         | -4.0822          | 0.7880             | 5.8947          | -387.3112      | -337.8669    | 0.2355          | 0.1785        |

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
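
Usage is not documented in the original card, but the sketch below shows one way to load the model for chat-style inference with the pinned Transformers version, assuming the checkpoint is published as tanliboy/llama-3.2-3b-dpo-2. The prompt and generation settings are placeholders; the chat template is read from the tokenizer config.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tanliboy/llama-3.2-3b-dpo-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Placeholder prompt; formatting follows the tokenizer's chat template.
messages = [{"role": "user", "content": "Summarize what DPO fine-tuning does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```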