qwen2.5-0.5b-expo-DPO-EXPERIMENT-10-5e6

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise dataset. It achieves the following results on the evaluation set at the final logged checkpoint (step 350):

  • Loss: 15.2566
  • Logps: -80.3981
  • Logits: -1.0046
  • Objective: 15.1445
  • DPO Loss: 15.1445
  • Regularize: 15.1445
  • Ranking Simple: 0.5134
  • Ranking Idealized: 0.5093
  • Ranking Idealized Expo: 0.5093
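
The card does not define these metrics. For orientation only, the sketch below computes the standard per-pair DPO loss (Rafailov et al., 2023) from summed log-probabilities under the policy and a frozen reference model. The β used for this run is not stated (β = 0.1 below is an assumption), and the standard loss equals log 2 ≈ 0.693 when the policy matches the reference, so the values reported above may come from a modified "expo" objective or scaling that this card does not describe.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response.
    beta=0.1 is an assumed default, NOT a value taken from this card.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref, chosen
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref, rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(round(dpo_loss(-80.0, -85.0, -80.0, -85.0), 4))  # 0.6931
```

The loss falls below log 2 when the policy raises the chosen response's log-ratio relative to the rejected one, and rises above it otherwise.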

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 288
  • total_eval_batch_size: 24
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2
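
The totals in the list follow from the per-device settings: 4 × 6 devices × 12 accumulation steps = 288 training examples per optimizer step, and 4 × 6 = 24 for evaluation (no accumulation). A minimal sketch of that arithmetic and of a linear-warmup cosine schedule is shown below. It assumes the warmup ratio is converted to steps by rounding up, and it takes the total number of optimizer steps as an input because the card does not report it.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 4
PER_DEVICE_EVAL_BATCH = 4
NUM_DEVICES = 6
GRAD_ACCUM_STEPS = 12
PEAK_LR = 5e-6
WARMUP_RATIO = 0.1

# Gradient accumulation multiplies the training batch only.
total_train_batch = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
total_eval_batch = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
print(total_train_batch, total_eval_batch)  # 288 24


def cosine_with_warmup_lr(step: int, total_steps: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then a half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)  # assumed rounding
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The training log reports step 350 at epoch ≈ 1.98, so a total on the order of 350 optimizer steps is plausible; for example, `cosine_with_warmup_lr(36, 352)` returns the peak rate 5e-06 under that hypothetical total.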

Training results

| Training Loss | Epoch | Step | Validation Loss | Logps | Logits | Objective | DPO Loss | Regularize | Ranking Simple | Ranking Idealized | Ranking Idealized Expo |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9.5723 | 0.2834 | 50 | 9.2586 | -89.6862 | -1.4979 | 9.6501 | 9.6501 | 9.6501 | 0.5134 | 0.5093 | 0.5093 |
| 9.8364 | 0.5668 | 100 | 15.5453 | -79.4201 | -1.3475 | 15.5409 | 15.5409 | 15.5409 | 0.5176 | 0.5093 | 0.5093 |
| 8.8451 | 0.8503 | 150 | 16.6626 | -82.1459 | -1.1122 | 16.5626 | 16.5626 | 16.5626 | 0.5145 | 0.5093 | 0.5093 |
| 3.8083 | 1.1337 | 200 | 16.0519 | -81.6751 | -1.0874 | 16.3240 | 16.3240 | 16.3240 | 0.5186 | 0.5093 | 0.5093 |
| 3.6019 | 1.4171 | 250 | 15.8144 | -81.5609 | -0.9933 | 15.7679 | 15.7679 | 15.7679 | 0.5176 | 0.5093 | 0.5093 |
| 2.1682 | 1.7005 | 300 | 15.3824 | -80.3329 | -1.0036 | 15.2004 | 15.2004 | 15.2004 | 0.5114 | 0.5093 | 0.5093 |
| 2.703 | 1.9839 | 350 | 15.2566 | -80.3981 | -1.0046 | 15.1445 | 15.1445 | 15.1445 | 0.5134 | 0.5093 | 0.5093 |
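
The evaluation results at the top of the card match the last logged step (350), which is not the step with the lowest validation loss. A minimal sketch for checking this directly from the logged values (copied by hand from the table above):

```python
# (step, epoch, training loss, validation loss) copied from the table above.
LOG = [
    (50, 0.2834, 9.5723, 9.2586),
    (100, 0.5668, 9.8364, 15.5453),
    (150, 0.8503, 8.8451, 16.6626),
    (200, 1.1337, 3.8083, 16.0519),
    (250, 1.4171, 3.6019, 15.8144),
    (300, 1.7005, 2.1682, 15.3824),
    (350, 1.9839, 2.703, 15.2566),
]

best = min(LOG, key=lambda row: row[3])  # lowest validation loss
final = LOG[-1]                          # last logged checkpoint
print(f"lowest validation loss: step {best[0]} ({best[3]})")  # step 50 (9.2586)
print(f"final checkpoint: step {final[0]} ({final[3]})")      # step 350 (15.2566)
```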

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1
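
When reproducing this environment, a small hypothetical helper can compare installed package versions against the list above using only the standard library's `importlib.metadata`. The exact string comparison is a simplifying assumption: a PyTorch build installed from a different wheel index may report 2.3.0 without the `+cu121` local tag.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card; "torch" is the distribution name for PyTorch.
EXPECTED = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for pkg, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{pkg}: card lists {wanted}, found {found}")
```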

Model weights

  • Format: Safetensors
  • Model size: 494M params
  • Tensor type: F32