qwen2.5-0.5b-expo-DPO-L2EXPO-noES2-0.1

This model is a fine-tuned version of hZzy/qwen2.5-0.5b-sft-news-IFT on the hZzy/train_pairwise_weighted dataset. It achieves the following results on the evaluation set (a short note on the Dpo Loss metric follows the list):

  • Loss: 0.7416
  • Logps: -92.3352
  • Logits: -1.2949
  • Objective: 0.7355
  • Dpo Loss: 0.6749
  • Ranking Simple: 0.5502
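The card does not define its metrics, but "Dpo Loss" is presumably the standard DPO objective of Rafailov et al. (2023), shown below with σ the logistic function, β the KL-penalty coefficient (not stated in this card), and (x, y_w, y_l) a prompt with its chosen and rejected responses:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

For orientation: a policy identical to the reference scores $-\log\sigma(0) = \ln 2 \approx 0.693$ on this loss, so the reported 0.6749 is only modestly below that baseline, consistent with the Ranking Simple (preference-ranking accuracy) of 0.5502.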

Model description

More information needed
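Pending a fuller description, the model name and metrics indicate a DPO-style preference fine-tune of the SFT base above. As a minimal sketch, the checkpoint should load as an ordinary causal LM with the Transformers version listed under "Framework versions" (the prompt and generation settings here are illustrative assumptions, not documented usage):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "hZzy/qwen2.5-0.5b-expo-DPO-L2EXPO-noES2-0.1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Illustrative prompt only; the card does not document an intended prompt format.
prompt = "Summarize today's top news story in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```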

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged configuration sketch follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 3
  • gradient_accumulation_steps: 12
  • total_train_batch_size: 144
  • total_eval_batch_size: 12
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
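The total train batch size follows from the line items above: 4 per device × 3 GPUs × 12 gradient-accumulation steps = 144. The card does not name the training framework beyond the versions listed below, so as a hedged sketch only, here is how these values would map onto transformers.TrainingArguments; the output_dir is an assumption, and the DPO/L2EXPO objective itself is not captured by this mapping:

```python
from transformers import TrainingArguments

# Mirror of the hyperparameters listed in the card; only output_dir is invented.
args = TrainingArguments(
    output_dir="qwen2.5-0.5b-expo-DPO-L2EXPO-noES2-0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=4,   # "train_batch_size" above is per device
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=12,  # 4 x 3 devices x 12 = 144 effective
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # 10% of steps ramp up before cosine decay
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```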

Training results

| Training Loss | Epoch  | Step | Validation Loss | Logps    | Logits  | Objective | Dpo Loss | Ranking Simple |
|:-------------:|:------:|:----:|:---------------:|:--------:|:-------:|:---------:|:--------:|:--------------:|
| 0.6826        | 0.1417 | 50   | 0.7254          | -92.3092 | -1.5368 | 0.7272    | 0.6828   | 0.5280         |
| 0.6518        | 0.2834 | 100  | 0.7251          | -99.4396 | -1.6149 | 0.7209    | 0.6727   | 0.5383         |
| 0.5964        | 0.4251 | 150  | 0.7330          | -90.2166 | -1.4409 | 0.7247    | 0.6714   | 0.5409         |
| 0.5794        | 0.5668 | 200  | 0.7543          | -90.8808 | -1.5459 | 0.7437    | 0.6858   | 0.5378         |
| 0.5802        | 0.7085 | 250  | 0.7559          | -86.8752 | -1.5326 | 0.7459    | 0.6874   | 0.5404         |
| 0.5473        | 0.8503 | 300  | 0.7457          | -92.3480 | -1.5370 | 0.7389    | 0.6780   | 0.5487         |
| 0.5104        | 0.9920 | 350  | 0.7516          | -88.1940 | -1.3364 | 0.7372    | 0.6766   | 0.5430         |
| 0.4425        | 1.1337 | 400  | 0.7568          | -88.7595 | -1.2226 | 0.7489    | 0.6866   | 0.5440         |
| 0.4544        | 1.2754 | 450  | 0.7455          | -90.0551 | -1.3089 | 0.7365    | 0.6750   | 0.5481         |
| 0.4624        | 1.4171 | 500  | 0.7470          | -89.6256 | -1.2445 | 0.7387    | 0.6782   | 0.5533         |
| 0.4391        | 1.5588 | 550  | 0.7385          | -91.9954 | -1.1983 | 0.7304    | 0.6695   | 0.5487         |
| 0.4285        | 1.7005 | 600  | 0.7408          | -91.4037 | -1.1181 | 0.7317    | 0.6726   | 0.5502         |
| 0.4553        | 1.8422 | 650  | 0.7426          | -90.4160 | -1.2725 | 0.7335    | 0.6740   | 0.5559         |
| 0.4307        | 1.9839 | 700  | 0.7404          | -91.7855 | -1.2351 | 0.7342    | 0.6735   | 0.5585         |
| 0.3755        | 2.1256 | 750  | 0.7430          | -93.2394 | -1.3013 | 0.7369    | 0.6762   | 0.5487         |
| 0.3794        | 2.2674 | 800  | 0.7400          | -93.3133 | -1.2647 | 0.7335    | 0.6726   | 0.5543         |
| 0.373         | 2.4091 | 850  | 0.7410          | -92.9388 | -1.2593 | 0.7354    | 0.6747   | 0.5523         |
| 0.388         | 2.5508 | 900  | 0.7418          | -92.8924 | -1.2939 | 0.7363    | 0.6757   | 0.5502         |
| 0.3866        | 2.6925 | 950  | 0.7418          | -92.3290 | -1.2937 | 0.7358    | 0.6752   | 0.5507         |
| 0.3828        | 2.8342 | 1000 | 0.7417          | -92.3260 | -1.2946 | 0.7356    | 0.6749   | 0.5502         |
| 0.3743        | 2.9759 | 1050 | 0.7416          | -92.3352 | -1.2949 | 0.7355    | 0.6749   | 0.5502         |

Framework versions

  • Transformers 4.42.0
  • Pytorch 2.3.0+cu121
  • Datasets 3.2.0
  • Tokenizers 0.19.1