---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_orpo_entropy
    results: []
---

# qwen_orpo_entropy

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set:

- Loss: 0.5245
- Rewards/chosen: -9.9757
- Rewards/rejected: -11.1054
- Rewards/accuracies: 0.7240
- Rewards/margins: 1.1296
- Logps/rejected: -11.1054
- Logps/chosen: -9.9757
- Logits/rejected: 0.9577
- Logits/chosen: 0.9163
- Semantic Entropy: 0.0013

## Model description

More information needed

## Intended uses & limitations

More information needed
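
No usage guidance is given in the card. As a minimal inference sketch, assuming the checkpoint is published under the hub id `yakazimir/qwen_orpo_entropy` (inferred from the card author and model name, so substitute the actual path if it differs) and exposes the standard causal-LM interface of its Qwen1.5 base:

```python
# Minimal inference sketch; the repo id is an assumption, not confirmed by the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_orpo_entropy"  # hypothetical hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# The base model is a Qwen1.5 chat-style SFT checkpoint, so a chat template is
# expected to be present in the tokenizer config.
messages = [{"role": "user", "content": "Explain preference optimization in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```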

## Training and evaluation data

More information needed
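
The card only names the preference dataset. A quick way to inspect it is the standard 🤗 `datasets` API; split and column names are not documented here, so the sketch below simply prints whatever the dataset exposes:

```python
# Sketch: inspect the preference dataset referenced by this card.
from datasets import load_dataset

ds = load_dataset("yakazimir/ultrafeedback_binarized")
print(ds)  # available splits and their sizes

first_split = next(iter(ds))
print(ds[first_split][0].keys())  # e.g. prompt / chosen / rejected fields, if present
```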

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
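
These values map onto standard 🤗 `transformers.TrainingArguments` fields. The sketch below expresses them that way as a reference point only; the actual run used an alignment-handbook/TRL preference-optimization trainer whose loss-specific options (e.g. the SimPO/ORPO beta and the semantic-entropy term) are not listed in this card and are therefore omitted:

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Trainer- and loss-specific settings of the original run are not documented
# here and are intentionally left out.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_orpo_entropy",
    learning_rate=1e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=4,    # eval_batch_size
    gradient_accumulation_steps=16,  # 2 per device x 16 accumulation -> total batch 32
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)
```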

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Semantic Entropy |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:----------------:|
| 1.0119        | 0.2141 | 400  | 1.0132          | -1.7930        | -2.0280          | 0.5660             | 0.2350          | -2.0280        | -1.7930      | 0.3600          | 0.2738        | 0.6132           |
| 0.5924        | 0.4282 | 800  | 0.5851          | -6.7409        | -7.2987          | 0.6810             | 0.5578          | -7.2987        | -6.7409      | 0.5386          | 0.4884        | 0.0131           |
| 0.5951        | 0.6422 | 1200 | 0.5522          | -7.9883        | -8.6813          | 0.7062             | 0.6931          | -8.6813        | -7.9883      | 0.7969          | 0.7507        | 0.0050           |
| 0.4796        | 0.8563 | 1600 | 0.5406          | -8.4790        | -9.1974          | 0.7047             | 0.7184          | -9.1974        | -8.4790      | 0.9158          | 0.8517        | 0.0035           |
| 0.5834        | 1.0704 | 2000 | 0.5344          | -8.7256        | -9.5131          | 0.7159             | 0.7875          | -9.5131        | -8.7256      | 0.8620          | 0.7784        | 0.0027           |
| 0.5261        | 1.2845 | 2400 | 0.5313          | -8.7103        | -9.6511          | 0.7136             | 0.9408          | -9.6511        | -8.7103      | 0.8723          | 0.8012        | 0.0029           |
| 0.4879        | 1.4986 | 2800 | 0.5264          | -8.6267        | -9.5330          | 0.7218             | 0.9063          | -9.5330        | -8.6267      | 0.7496          | 0.6896        | 0.0033           |
| 0.5524        | 1.7127 | 3200 | 0.5207          | -8.8757        | -9.8346          | 0.7166             | 0.9589          | -9.8346        | -8.8757      | 0.9052          | 0.8485        | 0.0030           |
| 0.5311        | 1.9267 | 3600 | 0.5170          | -9.0983        | -10.0747         | 0.7233             | 0.9765          | -10.0747       | -9.0983      | 0.8342          | 0.7884        | 0.0024           |
| 0.3953        | 2.1408 | 4000 | 0.5261          | -9.8407        | -10.9409         | 0.7196             | 1.1002          | -10.9409       | -9.8407      | 0.9782          | 0.9286        | 0.0015           |
| 0.428         | 2.3549 | 4400 | 0.5250          | -9.9515        | -11.0890         | 0.7211             | 1.1375          | -11.0890       | -9.9515      | 0.9721          | 0.9215        | 0.0013           |
| 0.4394        | 2.5690 | 4800 | 0.5238          | -9.8173        | -10.9421         | 0.7255             | 1.1248          | -10.9421       | -9.8173      | 0.8956          | 0.8550        | 0.0014           |
| 0.4221        | 2.7831 | 5200 | 0.5239          | -9.9581        | -11.0861         | 0.7248             | 1.1280          | -11.0861       | -9.9581      | 0.9048          | 0.8672        | 0.0013           |
| 0.4023        | 2.9972 | 5600 | 0.5245          | -9.9757        | -11.1054         | 0.7240             | 1.1296          | -11.1054       | -9.9757      | 0.9577          | 0.9163        | 0.0013           |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1