	---
	license: apache-2.0
	base_model: hZzy/qwen2.5-0.5b-sft-news-IFT
	tags:
	- alignment-handbook
	- ndcg
	- trl
	- expo
	- generated_from_trainer
	datasets:
	- hZzy/train_pairwise
	model-index:
	- name: qwen2.5-0.5b-expo-DPO-ES-TRY
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/zhiyuzha-university-of-florida/huggingface/runs/gcfd4lf7)
	# qwen2.5-0.5b-expo-DPO-ES-TRY

	This model is a fine-tuned version of [hZzy/qwen2.5-0.5b-sft-news-IFT](https://huggingface.co/hZzy/qwen2.5-0.5b-sft-news-IFT) on the hZzy/train_pairwise dataset.
	It achieves the following results on the evaluation set (the values match the step-300 row, epoch 0.4251, of the training results table):
	- Loss: 0.6811
	- Logps: -89.5089
	- Logits: -2.2697
	- Objective: 0.6619
	- Dpo Loss: 0.6619
	- Regularize: 0.6619
	- Ranking Simple: 0.5735
	- Ranking Idealized: 0.6046
	- Ranking Idealized Expo: 0.5280
	- Dpo Wo Beta: -2.3796
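
For orientation, the sketch below computes the standard DPO loss for a single preference pair. It assumes the usual sigmoid form of the loss; this card does not document the exact objective used by the trainer, and the function name `dpo_loss` and the value `beta=0.1` are illustrative assumptions, not settings from this run. Under that form, a policy identical to its reference scores ln 2 ≈ 0.6931, which gives a reference point for the reported Dpo Loss of 0.6619.

```python
import math


def dpo_loss(policy_margin, reference_margin, beta=0.1):
    """Standard sigmoid DPO loss for one (chosen, rejected) pair.

    policy_margin    = log p_policy(chosen)    - log p_policy(rejected)
    reference_margin = log p_reference(chosen) - log p_reference(rejected)
    beta is a hypothetical value; the card does not state the one used.
    """
    z = beta * (policy_margin - reference_margin)
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# A policy that ranks the pair exactly like the reference gives ln 2.
baseline = dpo_loss(policy_margin=0.0, reference_margin=0.0)
print(f"loss at zero margin: {baseline:.4f}")
```

Values below ln 2 indicate that, on average, the policy prefers the chosen response more strongly than the reference does, under the assumption that the logged metric follows this standard form.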

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-06
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 6
	- gradient_accumulation_steps: 6
	- total_train_batch_size: 72
	- total_eval_batch_size: 12
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 3
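
The batch-size and scheduler entries above are related by simple arithmetic, sketched below. The constants are copied from the list; the helper names and the `total_steps=1000` used in the demo are hypothetical, since the card does not state the planned number of optimizer steps. The schedule follows the common linear-warmup-then-cosine-decay shape, which is an assumption about how `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1` behaved in this run.

```python
import math

# Values copied from the hyperparameter list
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 6
GRAD_ACCUM_STEPS = 6
PEAK_LR = 5e-06
WARMUP_RATIO = 0.1


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_sizes())
# total_steps=1000 is a placeholder, not a value from this run
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr(step, total_steps=1000))
```

The first function reproduces the listed `total_train_batch_size` of 72 and `total_eval_batch_size` of 12.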

	### Training results

	| Training Loss | Epoch  | Step | Dpo Loss | Dpo Wo Beta | Logits  | Logps     | Validation Loss | Objective | Ranking Idealized | Ranking Idealized Expo | Ranking Simple | Regularize |
	|:-------------:|:------:|:----:|:--------:|:-----------:|:-------:|:---------:|:---------------:|:---------:|:-----------------:|:----------------------:|:--------------:|:----------:|
	| 0.6857        | 0.0709 | 50   | 0.6927   | -1.2807     | -1.9606 | -88.9841  | 0.6914          | 0.6927    | 0.6046            | 0.5280                 | 0.5362         | 0.6927     |
	| 0.6524        | 0.1417 | 100  | 0.7010   | -1.8911     | -2.0579 | -98.6358  | 0.6922          | 0.7010    | 0.6046            | 0.5280                 | 0.5269         | 0.7010     |
	| 0.6123        | 0.2126 | 150  | 0.7015   | -2.1166     | -1.9033 | -102.8927 | 0.6967          | 0.7015    | 0.6046            | 0.5280                 | 0.5280         | 0.7015     |
	| 0.5779        | 0.2834 | 200  | 0.6816   | -2.1417     | -2.0716 | -106.4944 | 0.6794          | 0.6816    | 0.6046            | 0.5280                 | 0.5507         | 0.6816     |
	| 0.5709        | 0.3543 | 250  | 0.6817   | -2.2676     | -2.2470 | -87.7326  | 0.6883          | 0.6817    | 0.6046            | 0.5280                 | 0.5424         | 0.6817     |
	| 0.5563        | 0.4251 | 300  | 0.6619   | -2.3796     | -2.2697 | -89.5089  | 0.6811          | 0.6619    | 0.6046            | 0.5280                 | 0.5735         | 0.6619     |
	| 0.5321        | 0.4960 | 350  | 0.6773   | -2.6295     | -2.3683 | -99.0927  | 0.6926          | 0.6773    | 0.6046            | 0.5280                 | 0.5735         | 0.6773     |
	| 0.4963        | 0.5668 | 400  | 0.6836   | -2.6913     | -2.2508 | -106.7073 | 0.6914          | 0.6836    | 0.6046            | 0.5280                 | 0.5673         | 0.6836     |
	| 0.4745        | 0.6377 | 450  | 0.6815   | -2.6738     | -2.2347 | -105.8669 | 0.6938          | 0.6815    | 0.6046            | 0.5280                 | 0.5631         | 0.6815     |
	| 0.4867        | 0.7085 | 500  | 0.6995   | -2.7257     | -2.2182 | -105.1848 | 0.7040          | 0.6995    | 0.6046            | 0.5280                 | 0.5507         | 0.6995     |
	| 0.4582        | 0.7794 | 550  | 0.7027   | -3.1023     | -2.3855 | -102.6643 | 0.6995          | 0.7027    | 0.6046            | 0.5280                 | 0.5683         | 0.7027     |
	| 0.4339        | 0.8503 | 600  | 0.7050   | -3.2166     | -2.4456 | -103.5456 | 0.6965          | 0.7050    | 0.6046            | 0.5280                 | 0.5735         | 0.7050     |
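
The Epoch and Step columns allow a rough estimate of the run's scale, sketched below. It assumes that each logged step is one optimizer step consuming `total_train_batch_size` examples; the variable and function names are illustrative. Because the Epoch values are rounded to four decimals, the results are approximate, and the implied dataset size is an inference rather than a figure stated in this card.

```python
# (step, epoch) pairs copied from the first and last rows of the table
LOGGED_POINTS = [(50, 0.0709), (600, 0.8503)]
TOTAL_TRAIN_BATCH_SIZE = 72  # from the hyperparameter list
NUM_EPOCHS = 3               # from the hyperparameter list
LAST_LOGGED_STEP = 600


def estimate_steps_per_epoch(points):
    """Average step/epoch ratio over the logged points."""
    ratios = [step / epoch for step, epoch in points]
    return sum(ratios) / len(ratios)


steps_per_epoch = estimate_steps_per_epoch(LOGGED_POINTS)
approx_examples_per_epoch = steps_per_epoch * TOTAL_TRAIN_BATCH_SIZE
approx_planned_steps = steps_per_epoch * NUM_EPOCHS
fraction_logged = LAST_LOGGED_STEP / approx_planned_steps

print(f"steps per epoch (approx.):      {steps_per_epoch:.1f}")
print(f"examples per epoch (approx.):   {approx_examples_per_epoch:.0f}")
print(f"planned steps for 3 epochs:     {approx_planned_steps:.0f}")
print(f"fraction of plan in the table:  {fraction_logged:.1%}")
```

Under these assumptions, the table covers well under one of the three configured epochs.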


	### Framework versions

	- Transformers 4.42.0
	- Pytorch 2.3.0+cu121
	- Datasets 2.19.1
	- Tokenizers 0.19.1
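
To compare a local environment against the versions listed above, a small standard-library check such as the following may help. The pip distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks listed, and `compare_versions` is an illustrative helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed pip distribution name
EXPECTED = {
    "transformers": "4.42.0",
    "torch": "2.3.0+cu121",
    "datasets": "2.19.1",
    "tokenizers": "0.19.1",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


for package, (wanted, installed, matches) in compare_versions().items():
    status = "ok" if matches else "differs"
    print(f"{package}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading; the check only reports differences from the training environment.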