---
library_name: transformers
license: other
base_model: trl-lib/qwen1.5-0.5b-sft
tags:
  - alignment-handbook
  - trl
  - simpo
  - generated_from_trainer
datasets:
  - yakazimir/ultrafeedback_binarized
model-index:
  - name: qwen_fUNL_entropy
    results: []
---

# qwen_fUNL_entropy

This model is a fine-tuned version of [trl-lib/qwen1.5-0.5b-sft](https://huggingface.co/trl-lib/qwen1.5-0.5b-sft) on the [yakazimir/ultrafeedback_binarized](https://huggingface.co/datasets/yakazimir/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set (a note on how these metrics relate follows the list):

- Loss: 0.0000
- Rewards/chosen: -42.7794
- Rewards/rejected: -43.9149
- Rewards/accuracies: 0.5668
- Rewards/margins: 1.1356
- Logps/rejected: -43.9149
- Logps/chosen: -42.7794
- Logits/rejected: 7.2567
- Logits/chosen: 7.5393
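
The `simpo` tag suggests these metrics follow the SimPO formulation (Meng et al., 2024), where a response's implicit reward is its length-normalized log probability under the policy, which would explain why the Rewards and Logps columns coincide here; Rewards/margins is simply Rewards/chosen minus Rewards/rejected (-42.7794 - (-43.9149) ≈ 1.1356). The model name (`fUNL_entropy`) hints at a custom loss variant, so treat the following only as a sketch of the standard SimPO objective, with reward scale β and target margin γ:

```latex
% Standard SimPO objective (Meng et al., 2024): a Bradley--Terry loss over
% length-normalized log probabilities with a target reward margin \gamma.
\mathcal{L}_{\mathrm{SimPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\!\left[
      \log \sigma\!\left(
        \frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x)
        - \frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x)
        - \gamma
      \right)
    \right]
```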

## Model description

More information needed

## Intended uses & limitations

More information needed
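
Until this section is filled in, the sketch below shows one way to query the model with the standard `transformers` generation API. It is a minimal example, not an official recipe: the repo id `yakazimir/qwen_fUNL_entropy` is inferred from the card title, and it assumes the tokenizer inherits the Qwen1.5 chat template from the SFT base model.

```python
# Minimal inference sketch. Assumptions: the checkpoint is hosted at
# "yakazimir/qwen_fUNL_entropy" (inferred from the card title) and the
# tokenizer ships a chat template inherited from the Qwen1.5 SFT base.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "yakazimir/qwen_fUNL_entropy"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Explain preference optimization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Strip the prompt tokens so only the newly generated completion is printed.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```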

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a hedged `TrainingArguments` sketch follows the list:

- learning_rate: 1e-06
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
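
For reproduction, the list above maps onto `transformers.TrainingArguments` roughly as sketched below. This is a hedged reconstruction, not the actual alignment-handbook/TRL recipe config: `output_dir` is an assumed name, and the Adam betas/epsilon listed above are already the `TrainingArguments` defaults.

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
# The real run used the alignment-handbook / TRL SimPO recipe, whose config
# class may differ; output_dir is an assumption.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen_fUNL_entropy",     # assumption: named after the model
    learning_rate=1e-6,
    per_device_train_batch_size=2,      # "train_batch_size" above
    per_device_eval_batch_size=4,       # "eval_batch_size" above
    gradient_accumulation_steps=16,     # 2 x 16 = 32 total train batch size
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer settings listed above.
)
```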

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0           | 0.2141 | 400  | 0.0001          | -29.4740       | -31.2238         | 0.5690             | 1.7498          | -31.2238       | -29.4740     | 4.5258          | 4.5083        |
| 0.0           | 0.4282 | 800  | 0.0000          | -37.0465       | -38.5277         | 0.5579             | 1.4811          | -38.5277       | -37.0465     | 6.2307          | 6.3643        |
| 0.001         | 0.6422 | 1200 | 0.0000          | -38.7942       | -40.1267         | 0.5668             | 1.3324          | -40.1267       | -38.7942     | 6.5149          | 6.7000        |
| 0.0           | 0.8563 | 1600 | 0.0000          | -38.5913       | -40.0107         | 0.5668             | 1.4194          | -40.0107       | -38.5913     | 6.5708          | 6.7471        |
| 0.0           | 1.0704 | 2000 | 0.0000          | -40.7799       | -42.0174         | 0.5675             | 1.2374          | -42.0174       | -40.7799     | 7.0075          | 7.2451        |
| 0.0           | 1.2845 | 2400 | 0.0000          | -40.9809       | -42.2090         | 0.5645             | 1.2280          | -42.2090       | -40.9809     | 6.9425          | 7.1883        |
| 0.0           | 1.4986 | 2800 | 0.0000          | -41.7185       | -42.9016         | 0.5631             | 1.1831          | -42.9016       | -41.7185     | 7.2071          | 7.4629        |
| 0.0           | 1.7127 | 3200 | 0.0000          | -41.7373       | -42.9487         | 0.5675             | 1.2115          | -42.9487       | -41.7373     | 7.0907          | 7.3464        |
| 0.0           | 1.9267 | 3600 | 0.0000          | -42.3165       | -43.4863         | 0.5668             | 1.1698          | -43.4863       | -42.3165     | 7.2080          | 7.4815        |
| 0.0           | 2.1408 | 4000 | 0.0000          | -43.0385       | -44.1473         | 0.5697             | 1.1088          | -44.1473       | -43.0385     | 7.2552          | 7.5548        |
| 0.0           | 2.3549 | 4400 | 0.0000          | -42.9448       | -44.0525         | 0.5705             | 1.1077          | -44.0525       | -42.9448     | 7.2918          | 7.5836        |
| 0.0           | 2.5690 | 4800 | 0.0000          | -43.0768       | -44.1767         | 0.5675             | 1.0999          | -44.1767       | -43.0768     | 7.3794          | 7.6690        |
| 0.0           | 2.7831 | 5200 | 0.0000          | -43.1227       | -44.2291         | 0.5690             | 1.1064          | -44.2291       | -43.1227     | 7.2960          | 7.5933        |
| 0.0           | 2.9972 | 5600 | 0.0000          | -42.7794       | -43.9149         | 0.5668             | 1.1356          | -43.9149       | -42.7794     | 7.2567          | 7.5393        |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.19.1