---
license: mit
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: microsoft/Phi-3-mini-4k-instruct
model-index:
- name: dpo_with_se
results: []
---
# dpo_with_se
This model is a fine-tuned version of [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), trained with DPO (Direct Preference Optimization) via a PEFT adapter; the preference dataset used is not recorded in this card.
It achieves the following results on the evaluation set (the reward metrics are explained below the list):
- Loss: 0.6194
- Rewards/chosen: -0.6699
- Rewards/rejected: -1.1107
- Rewards/accuracies: 0.6458
- Rewards/margins: 0.4407
- Logps/rejected: -422.9081
- Logps/chosen: -458.9963
- Logits/rejected: 0.0509
- Logits/chosen: 0.1892
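
The `Rewards/*` columns follow TRL's DPO convention: each implicit reward is the β-scaled log-probability ratio between the policy and the frozen reference model (the β used here is not recorded in this card). The loss being minimized is:

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)
$$

where \\(y_w\\) and \\(y_l\\) are the chosen and rejected completions. `Rewards/margins` is the mean difference between the chosen and rejected rewards, and `Rewards/accuracies` is the fraction of pairs for which the chosen reward is higher.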
## Model description
This repository contains a PEFT adapter for [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct), trained with TRL's DPO trainer (per the `peft`, `trl`, and `dpo` tags above). The adapter is loaded on top of the base model at inference time.
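
A minimal loading sketch, assuming the adapter weights live in this repository (the repo id below is an assumption; substitute the actual one):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "microsoft/Phi-3-mini-4k-instruct"
adapter_id = "ernestoBocini/Phi3-mini-DPO-Tuned"  # assumed repo id

# Load the base model, then attach the DPO-trained PEFT adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Phi-3-mini is an instruct model, so format the prompt with its chat template.
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Explain DPO in one sentence."}],
    add_generation_prompt=True,
    return_tensors="pt",
)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```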
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a training-script sketch follows the list):
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- lr_scheduler_warmup_steps: 100
- num_epochs: 2
- mixed_precision_training: Native AMP
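
A hedged reconstruction of the training setup matching the hyperparameters above, using TRL's `DPOTrainer`. The dataset name, `beta`, and the LoRA settings are assumptions (none are recorded in this card), and keyword names vary across TRL versions (newer releases take `processing_class` instead of `tokenizer`):

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "microsoft/Phi-3-mini-4k-instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Placeholder: the actual preference dataset is not documented in this card.
dataset = load_dataset("your/preference-dataset")

# Values mirror the hyperparameter list above; beta is an assumption.
training_args = DPOConfig(
    output_dir="dpo_with_se",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=2,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    seed=42,
    bf16=True,   # or fp16=True; the card only records "Native AMP"
    beta=0.1,    # assumed; not recorded in this card
)

# Assumed LoRA configuration; the actual PEFT settings are not recorded.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    peft_config=peft_config,  # ref model is created implicitly when using PEFT
)
trainer.train()
```

Launched under `accelerate launch` on 4 GPUs, a per-device batch size of 8 with gradient accumulation of 2 reproduces the total train batch size of 64 listed above.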
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.7121 | 0.0622 | 50 | 0.7078 | 1.9859 | 1.9118 | 0.5694 | 0.0741 | -392.6837 | -432.4385 | 0.1883 | 0.3317 |
| 0.672 | 0.1244 | 100 | 0.6718 | 0.4213 | 0.2008 | 0.5972 | 0.2204 | -409.7933 | -448.0844 | 0.1330 | 0.2722 |
| 0.6803 | 0.1866 | 150 | 0.6633 | 1.2004 | 0.9074 | 0.6215 | 0.2930 | -402.7275 | -440.2932 | 0.2565 | 0.3917 |
| 0.6816 | 0.2488 | 200 | 0.6535 | -0.2285 | -0.4811 | 0.5938 | 0.2526 | -416.6123 | -454.5817 | 0.1335 | 0.2706 |
| 0.6719 | 0.3109 | 250 | 0.6768 | -0.0803 | -0.2830 | 0.6007 | 0.2027 | -414.6320 | -453.1003 | 0.1071 | 0.2455 |
| 0.642 | 0.3731 | 300 | 0.6402 | 0.3405 | 0.0226 | 0.6146 | 0.3179 | -411.5756 | -448.8922 | 0.0864 | 0.2271 |
| 0.6675 | 0.4353 | 350 | 0.6472 | 0.7586 | 0.4677 | 0.6007 | 0.2909 | -407.1244 | -444.7109 | 0.1382 | 0.2779 |
| 0.6581 | 0.4975 | 400 | 0.6502 | -0.0310 | -0.3059 | 0.6181 | 0.2749 | -414.8607 | -452.6067 | 0.0326 | 0.1770 |
| 0.6155 | 0.5597 | 450 | 0.6416 | 0.0254 | -0.2895 | 0.6250 | 0.3149 | -414.6964 | -452.0428 | 0.1102 | 0.2490 |
| 0.6438 | 0.6219 | 500 | 0.6383 | -0.2805 | -0.6002 | 0.6250 | 0.3197 | -417.8031 | -455.1015 | 0.0799 | 0.2196 |
| 0.6069 | 0.6841 | 550 | 0.6360 | -0.6526 | -0.9456 | 0.6007 | 0.2930 | -421.2573 | -458.8233 | 0.1079 | 0.2462 |
| 0.6227 | 0.7463 | 600 | 0.6349 | -0.0705 | -0.3659 | 0.6215 | 0.2954 | -415.4609 | -453.0020 | 0.0381 | 0.1807 |
| 0.6473 | 0.8085 | 650 | 0.6331 | -0.3187 | -0.6771 | 0.6528 | 0.3584 | -418.5728 | -455.4844 | 0.1406 | 0.2776 |
| 0.6259 | 0.8706 | 700 | 0.6295 | -0.4256 | -0.7399 | 0.6111 | 0.3143 | -419.2006 | -456.5528 | 0.0986 | 0.2391 |
| 0.6572 | 0.9328 | 750 | 0.6389 | -0.5969 | -0.8936 | 0.6007 | 0.2967 | -420.7374 | -458.2657 | 0.0726 | 0.2120 |
| 0.63 | 0.9950 | 800 | 0.6310 | -0.2243 | -0.5516 | 0.6285 | 0.3274 | -417.3179 | -454.5398 | 0.1026 | 0.2406 |
| 0.4431 | 1.0572 | 850 | 0.6238 | -0.3325 | -0.7169 | 0.6632 | 0.3844 | -418.9702 | -455.6217 | 0.0604 | 0.1992 |
| 0.47 | 1.1194 | 900 | 0.6286 | -0.6589 | -1.1143 | 0.6597 | 0.4554 | -422.9441 | -458.8861 | -0.0269 | 0.1154 |
| 0.4436 | 1.1816 | 950 | 0.6252 | -0.6243 | -1.0270 | 0.6354 | 0.4027 | -422.0717 | -458.5404 | 0.0062 | 0.1465 |
| 0.4483 | 1.2438 | 1000 | 0.6238 | -0.6325 | -1.0514 | 0.6319 | 0.4189 | -422.3156 | -458.6222 | 0.0434 | 0.1813 |
| 0.4568 | 1.3060 | 1050 | 0.6297 | -0.9557 | -1.3457 | 0.6285 | 0.3900 | -425.2583 | -461.8539 | 0.1563 | 0.2901 |
| 0.4555 | 1.3682 | 1100 | 0.6311 | -0.5825 | -1.0012 | 0.6319 | 0.4188 | -421.8140 | -458.1216 | 0.0905 | 0.2271 |
| 0.4744 | 1.4303 | 1150 | 0.6248 | -0.5365 | -0.9374 | 0.6424 | 0.4008 | -421.1751 | -457.6623 | 0.0472 | 0.1861 |
| 0.4245 | 1.4925 | 1200 | 0.6255 | -0.6457 | -1.0579 | 0.6424 | 0.4122 | -422.3806 | -458.7540 | -0.0423 | 0.0997 |
| 0.4767 | 1.5547 | 1250 | 0.6294 | -0.7333 | -1.1519 | 0.6319 | 0.4185 | -423.3202 | -459.6304 | 0.1300 | 0.2652 |
| 0.4714 | 1.6169 | 1300 | 0.6253 | -0.8128 | -1.2388 | 0.6493 | 0.4261 | -424.1896 | -460.4245 | 0.0397 | 0.1788 |
| 0.4336 | 1.6791 | 1350 | 0.6229 | -0.7654 | -1.2064 | 0.6424 | 0.4410 | -423.8654 | -459.9506 | 0.1234 | 0.2587 |
| 0.4791 | 1.7413 | 1400 | 0.6216 | -0.7578 | -1.2069 | 0.6389 | 0.4492 | -423.8710 | -459.8747 | 0.0547 | 0.1931 |
| 0.439 | 1.8035 | 1450 | 0.6204 | -0.7469 | -1.1972 | 0.6493 | 0.4502 | -423.7731 | -459.7664 | 0.0661 | 0.2040 |
| 0.4419 | 1.8657 | 1500 | 0.6194 | -0.6699 | -1.1107 | 0.6458 | 0.4407 | -422.9081 | -458.9963 | 0.0509 | 0.1892 |
| 0.4593 | 1.9279 | 1550 | 0.6214 | -0.6895 | -1.1228 | 0.6528 | 0.4333 | -423.0291 | -459.1917 | 0.0628 | 0.2005 |
| 0.4444 | 1.9900 | 1600 | 0.6229 | -0.6827 | -1.1246 | 0.6667 | 0.4419 | -423.0472 | -459.1237 | 0.0863 | 0.2226 |
### Framework versions
- PEFT 0.11.2.dev0
- Transformers 4.41.2
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1