---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
results: []
---
# llama_DPO_model_e2
This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with Direct Preference Optimization (DPO) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1205
- Rewards/chosen: 0.4005
- Rewards/rejected: -1.7841
- Rewards/accuracies: 1.0
- Rewards/margins: 2.1847
- Logps/rejected: -202.6509
- Logps/chosen: -156.6288
- Logits/rejected: -1.0515
- Logits/chosen: -0.8581
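For context on how the metrics above relate: in DPO, the per-example loss is the negative log-sigmoid of the reward margin (the `Rewards/margins` column, i.e. the chosen reward minus the rejected reward, already scaled by β). A minimal standard-library sketch, numerically stable for large margins of either sign:

```python
import math

def dpo_loss(margin: float) -> float:
    """Per-example DPO loss: -log(sigmoid(margin)).

    `margin` is beta * (chosen log-ratio - rejected log-ratio),
    i.e. the quantity reported as Rewards/margins.
    """
    if margin >= 0:
        # -log(sigmoid(x)) = log(1 + exp(-x)) for x >= 0
        return math.log1p(math.exp(-margin))
    # Equivalent rearrangement that avoids overflow for x < 0
    return -margin + math.log1p(math.exp(margin))

# At the final eval margin of ~2.1847, the loss at the mean margin is
# about 0.107; the reported eval loss (0.1205) is the mean of
# per-example losses, which is slightly larger because the loss is convex.
```

Note the reported eval loss cannot be reproduced exactly from the mean margin alone; it depends on the full distribution of per-example margins.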
## Model description
Details were not provided by the author. Per the card metadata (`library_name: peft`, tags `trl`, `dpo`, `generated_from_trainer`), this is a PEFT adapter on top of Llama-2-7b, trained with DPO, presumably via the TRL library's `DPOTrainer`.
## Intended uses & limitations
More information needed. As a derivative of Llama-2-7b, the model is distributed under the Llama 2 license and inherits the base model's limitations.
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
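The `total_train_batch_size` above follows from the per-device batch size and gradient accumulation (a single device is assumed here, consistent with the totals). The step/epoch columns in the results table below also let one roughly infer the dataset size, a quick check:

```python
# Effective train batch size = per-device batch size x accumulation steps
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

# The results table reaches epoch ~0.99 at step 250, so one epoch is
# ~250 optimizer steps, implying roughly 250 * 8 = 2000 preference pairs.
steps_per_epoch = 250
approx_dataset_size = steps_per_epoch * total_train_batch_size  # ~2000
```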
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6753 | 0.1 | 25 | 0.6561 | 0.0241 | -0.0529 | 0.8800 | 0.0770 | -185.3385 | -160.3932 | -1.0518 | -0.8547 |
| 0.596 | 0.2 | 50 | 0.5763 | 0.0663 | -0.1863 | 0.9933 | 0.2525 | -186.6722 | -159.9714 | -1.0527 | -0.8563 |
| 0.5265 | 0.3 | 75 | 0.4888 | 0.1230 | -0.3480 | 1.0 | 0.4710 | -188.2895 | -159.4043 | -1.0529 | -0.8557 |
| 0.4405 | 0.4 | 100 | 0.4115 | 0.1711 | -0.5248 | 1.0 | 0.6959 | -190.0574 | -158.9227 | -1.0521 | -0.8557 |
| 0.3832 | 0.5 | 125 | 0.3418 | 0.2187 | -0.7108 | 1.0 | 0.9295 | -191.9176 | -158.4473 | -1.0530 | -0.8571 |
| 0.3071 | 0.6 | 150 | 0.2809 | 0.2614 | -0.9143 | 1.0 | 1.1757 | -193.9524 | -158.0195 | -1.0526 | -0.8568 |
| 0.2635 | 0.7 | 175 | 0.2300 | 0.3051 | -1.1158 | 1.0 | 1.4209 | -195.9679 | -157.5830 | -1.0531 | -0.8575 |
| 0.2056 | 0.79 | 200 | 0.1912 | 0.3381 | -1.3041 | 1.0 | 1.6422 | -197.8509 | -157.2532 | -1.0529 | -0.8577 |
| 0.1735 | 0.89 | 225 | 0.1617 | 0.3637 | -1.4760 | 1.0 | 1.8397 | -199.5699 | -156.9968 | -1.0524 | -0.8580 |
| 0.1492 | 0.99 | 250 | 0.1416 | 0.3797 | -1.6179 | 1.0 | 1.9976 | -200.9889 | -156.8374 | -1.0521 | -0.8575 |
| 0.144 | 1.09 | 275 | 0.1304 | 0.3918 | -1.6997 | 1.0 | 2.0915 | -201.8062 | -156.7157 | -1.0517 | -0.8590 |
| 0.1203 | 1.19 | 300 | 0.1255 | 0.3955 | -1.7398 | 1.0 | 2.1353 | -202.2080 | -156.6790 | -1.0514 | -0.8580 |
| 0.117 | 1.29 | 325 | 0.1229 | 0.3961 | -1.7635 | 1.0 | 2.1596 | -202.4451 | -156.6730 | -1.0514 | -0.8572 |
| 0.1286 | 1.39 | 350 | 0.1209 | 0.4018 | -1.7766 | 1.0 | 2.1784 | -202.5752 | -156.6156 | -1.0517 | -0.8587 |
| 0.126 | 1.49 | 375 | 0.1199 | 0.4025 | -1.7866 | 1.0 | 2.1891 | -202.6759 | -156.6091 | -1.0517 | -0.8587 |
| 0.1154 | 1.59 | 400 | 0.1202 | 0.4013 | -1.7865 | 1.0 | 2.1877 | -202.6743 | -156.6213 | -1.0514 | -0.8580 |
| 0.1141 | 1.69 | 425 | 0.1200 | 0.3990 | -1.7907 | 1.0 | 2.1897 | -202.7168 | -156.6437 | -1.0518 | -0.8578 |
| 0.1284 | 1.79 | 450 | 0.1196 | 0.4012 | -1.7899 | 1.0 | 2.1910 | -202.7081 | -156.6221 | -1.0518 | -0.8582 |
| 0.1225 | 1.89 | 475 | 0.1205 | 0.3984 | -1.7858 | 1.0 | 2.1842 | -202.6674 | -156.6495 | -1.0517 | -0.8592 |
| 0.1224 | 1.99 | 500 | 0.1205 | 0.4005 | -1.7841 | 1.0 | 2.1847 | -202.6509 | -156.6288 | -1.0515 | -0.8581 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2