license: apache-2.0
base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
tags:
  - alignment-handbook
  - ndcg
  - trl
  - expo
  - generated_from_trainer
datasets:
  - hZzy/train_pairwise
model-index:
  - name: qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.5-5e6
    results: []


qwen2.5-0.5b-expo-L2EXPO-EXPERIMENT-0.5-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set:

  • Loss: 2.2556
  • Logps: -80.0690
  • Logits: -0.6172
  • Objective: 2.2419
  • Dpo Loss: 1.3282
  • Regularize: 2.2419
  • Ranking Simple: 0.5134
  • Ranking Idealized: 0.5248
  • Ranking Idealized Expo: 0.5093

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5
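The two "total" batch sizes listed above follow from the per-device values. As a quick check, here is a minimal sketch of that arithmetic, assuming the usual convention that the effective training batch is per-device batch size × number of devices × gradient accumulation steps, and that evaluation does not accumulate gradients. The variable names simply mirror the list and are not taken from the training script.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 4             # per-device training batch size
eval_batch_size = 4              # per-device evaluation batch size
num_devices = 6
gradient_accumulation_steps = 12

# Effective training batch: samples contributing to one optimizer step
# (assumed convention: per-device batch x devices x accumulation steps).
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# Evaluation has no gradient accumulation, so only the device count multiplies in.
total_eval_batch_size = eval_batch_size * num_devices

print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
```

Both results match the reported total_train_batch_size and total_eval_batch_size.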

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | Dpo Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.8081 | 0.2834 | 50 | 0.6652 | -91.9364 | -1.3308 | 0.6722 | 0.7266 | 0.6722 | 0.5124 | 0.5248 | 0.5093 |
| 1.4482 | 0.5668 | 100 | 1.4160 | -83.3251 | -1.0880 | 1.3662 | 0.9745 | 1.3662 | 0.5093 | 0.5248 | 0.5093 |
| 1.5063 | 0.8503 | 150 | 1.8403 | -79.4245 | -0.9764 | 1.8307 | 1.1388 | 1.8307 | 0.5155 | 0.5248 | 0.5093 |
| 1.3427 | 1.1337 | 200 | 1.9411 | -78.0898 | -0.8446 | 1.9042 | 1.1943 | 1.9042 | 0.5124 | 0.5248 | 0.5093 |
| 1.2385 | 1.4171 | 250 | 2.1004 | -81.0783 | -0.8252 | 2.0780 | 1.2812 | 2.0780 | 0.5072 | 0.5248 | 0.5093 |
| 1.1013 | 1.7005 | 300 | 2.1954 | -78.5161 | -0.6190 | 2.2003 | 1.3091 | 2.2003 | 0.5124 | 0.5248 | 0.5093 |
| 0.9795 | 1.9839 | 350 | 2.2001 | -78.2914 | -0.6908 | 2.1850 | 1.2866 | 2.1850 | 0.5093 | 0.5248 | 0.5093 |
| 0.8853 | 2.2674 | 400 | 2.2679 | -78.5732 | -0.6216 | 2.2619 | 1.3223 | 2.2619 | 0.5134 | 0.5248 | 0.5093 |
| 0.7605 | 2.5508 | 450 | 2.2655 | -78.2840 | -0.6826 | 2.2744 | 1.3572 | 2.2744 | 0.5145 | 0.5248 | 0.5093 |
| 0.6709 | 2.8342 | 500 | 2.2688 | -79.7185 | -0.6486 | 2.2578 | 1.3375 | 2.2578 | 0.5186 | 0.5248 | 0.5093 |
| 0.5302 | 3.1176 | 550 | 2.2598 | -80.1419 | -0.6267 | 2.2430 | 1.3210 | 2.2430 | 0.5196 | 0.5248 | 0.5093 |
| 0.4552 | 3.4010 | 600 | 2.2547 | -79.9582 | -0.6007 | 2.2379 | 1.3298 | 2.2379 | 0.5124 | 0.5248 | 0.5093 |
| 0.3981 | 3.6845 | 650 | 2.2549 | -80.1880 | -0.5995 | 2.2397 | 1.3238 | 2.2397 | 0.5155 | 0.5248 | 0.5093 |
| 0.3178 | 3.9679 | 700 | 2.2616 | -80.4560 | -0.6215 | 2.2539 | 1.3332 | 2.2539 | 0.5134 | 0.5248 | 0.5093 |
| 0.2213 | 4.2513 | 750 | 2.2620 | -80.1501 | -0.6154 | 2.2499 | 1.3297 | 2.2499 | 0.5134 | 0.5248 | 0.5093 |
| 0.2032 | 4.5347 | 800 | 2.2583 | -80.1241 | -0.6175 | 2.2455 | 1.3295 | 2.2455 | 0.5134 | 0.5248 | 0.5093 |
| 0.1935 | 4.8181 | 850 | 2.2561 | -80.0661 | -0.6169 | 2.2424 | 1.3284 | 2.2424 | 0.5134 | 0.5248 | 0.5093 |
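The Epoch and Step columns above, combined with the cosine scheduler and warmup ratio of 0.1 from the hyperparameters, allow a rough reconstruction of the learning-rate curve. The sketch below is an approximation and is not taken from the training code. It assumes linear warmup followed by a half-cosine decay to zero. The total step count is an estimate extrapolated from the first logged row (step 50 at epoch 0.2834), so the true value may differ by a few steps. `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 5e-6        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
NUM_EPOCHS = 5        # num_epochs

# Estimate optimizer steps per epoch from the first logged row
# (step 50 reached at epoch 0.2834). This is an extrapolation, not a logged value.
steps_per_epoch = 50 / 0.2834
total_steps = round(NUM_EPOCHS * steps_per_epoch)
warmup_steps = math.ceil(WARMUP_RATIO * total_steps)


def lr_at(step: int) -> float:
    """Approximate learning rate at an optimizer step (assumed schedule shape)."""
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print("estimated total steps:", total_steps)
print("estimated warmup steps:", warmup_steps)
for step in (50, 100, 450, 850):
    print(step, lr_at(step))
```

Under these assumptions, the first evaluation at step 50 falls inside the warmup phase, and the last logged evaluation at step 850 falls near the end of the decay, where the learning rate is close to zero.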

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1