---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
results: []
---
# llama_DPO_model_e2
This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO on a preference dataset that is not recorded in this card.
It achieves the following results on the evaluation set:
- Loss: 0.1045
- Rewards/chosen: 0.4197
- Rewards/rejected: -1.9316
- Rewards/accuracies: 1.0
- Rewards/margins: 2.3513
- Logps/rejected: -204.1257
- Logps/chosen: -156.4368
- Logits/rejected: -1.0515
- Logits/chosen: -0.8584
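The reward metrics above follow the DPO convention: a sequence's implicit reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and `Rewards/margins` is simply chosen minus rejected. A minimal sketch of those relationships (β and the helper names are illustrative, not taken from this card):

```python
import math

def dpo_reward(policy_logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """Implicit DPO reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)

def dpo_loss(reward_chosen: float, reward_rejected: float) -> float:
    """DPO objective for one preference pair: -log sigmoid(margin)."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Sanity check against the final evaluation row of this card:
# Rewards/margins should equal Rewards/chosen - Rewards/rejected.
margin = 0.4197 - (-1.9316)
print(round(margin, 4))  # 2.3513
```

Note that `-log sigmoid(2.3513) ≈ 0.091` is in the same ballpark as the reported eval loss of 0.1045; the two differ because the loss is averaged per example before the margins are averaged.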
## Model description
This repository contains a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with Direct Preference Optimization (DPO) via the TRL library. Only the adapter weights are stored here; the base model must be loaded separately.
## Intended uses & limitations
The adapter inherits the Llama 2 license and the limitations of its base model. Because the preference dataset is not recorded, the behaviors the model was aligned toward cannot be verified from this card; evaluate on your own data before deployment.
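Since this repo stores only PEFT adapter weights, using the model means loading the base Llama-2 checkpoint and attaching the adapter on top. A hypothetical sketch, assuming the repo id `thorirhrafn/llama_DPO_model_e2` (inferred from the card title, not verified); `AutoPeftModelForCausalLM` is part of the PEFT API as of the version listed below:

```python
def load_dpo_adapter(adapter_id: str = "thorirhrafn/llama_DPO_model_e2"):
    """Load the base Llama-2 model with this DPO-trained PEFT adapter.

    Imports are deferred so defining the function stays lightweight;
    actually calling it downloads the ~13 GB base-model weights and
    requires access to the gated meta-llama repository.
    """
    from peft import AutoPeftModelForCausalLM  # resolves the base model from the adapter config
    from transformers import AutoTokenizer

    model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    return model, tokenizer
```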
## Training and evaluation data
The preference dataset used for training and evaluation is not recorded in this card.
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7.5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
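The per-device batch size of 1 combined with 8 gradient-accumulation steps gives the effective batch size of 8 listed above, and the linear scheduler decays the learning rate from 7.5e-7 toward 0 over the run. A small sketch of that arithmetic (single GPU and no warmup are assumptions; the 500 total optimizer steps are read from the results table below):

```python
train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1  # assumed single-GPU run

# Effective batch size: one optimizer step accumulates 8 micro-batches.
total_train_batch_size = train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 8

def linear_lr(step: int, total_steps: int, base_lr: float = 7.5e-7) -> float:
    """Linear scheduler without warmup: decay from base_lr to 0."""
    return base_lr * max(0.0, 1.0 - step / total_steps)

# Midway through the 500-step run the learning rate has halved:
print(linear_lr(250, 500))  # 3.75e-07
```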
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6732 | 0.1 | 25 | 0.6518 | 0.0274 | -0.0584 | 0.8867 | 0.0858 | -185.3935 | -160.3602 | -1.0521 | -0.8541 |
| 0.588 | 0.2 | 50 | 0.5616 | 0.0780 | -0.2093 | 0.9933 | 0.2873 | -186.9026 | -159.8541 | -1.0523 | -0.8550 |
| 0.5077 | 0.3 | 75 | 0.4690 | 0.1360 | -0.3896 | 1.0 | 0.5256 | -188.7056 | -159.2737 | -1.0525 | -0.8564 |
| 0.4179 | 0.4 | 100 | 0.3872 | 0.1873 | -0.5861 | 1.0 | 0.7734 | -190.6710 | -158.7608 | -1.0532 | -0.8563 |
| 0.3614 | 0.5 | 125 | 0.3170 | 0.2381 | -0.7895 | 1.0 | 1.0276 | -192.7043 | -158.2528 | -1.0533 | -0.8568 |
| 0.2812 | 0.6 | 150 | 0.2544 | 0.2856 | -1.0121 | 1.0 | 1.2977 | -194.9309 | -157.7783 | -1.0527 | -0.8569 |
| 0.2378 | 0.7 | 175 | 0.2066 | 0.3262 | -1.2240 | 1.0 | 1.5502 | -197.0494 | -157.3717 | -1.0520 | -0.8573 |
| 0.1866 | 0.79 | 200 | 0.1704 | 0.3591 | -1.4222 | 1.0 | 1.7812 | -199.0312 | -157.0431 | -1.0526 | -0.8577 |
| 0.1555 | 0.89 | 225 | 0.1429 | 0.3829 | -1.6050 | 1.0 | 1.9879 | -200.8594 | -156.8051 | -1.0523 | -0.8580 |
| 0.1312 | 0.99 | 250 | 0.1239 | 0.4002 | -1.7534 | 1.0 | 2.1536 | -202.3439 | -156.6322 | -1.0515 | -0.8572 |
| 0.1276 | 1.09 | 275 | 0.1147 | 0.4086 | -1.8325 | 1.0 | 2.2410 | -203.1341 | -156.5480 | -1.0518 | -0.8578 |
| 0.1038 | 1.19 | 300 | 0.1094 | 0.4144 | -1.8779 | 1.0 | 2.2923 | -203.5883 | -156.4901 | -1.0511 | -0.8574 |
| 0.101 | 1.29 | 325 | 0.1072 | 0.4191 | -1.9023 | 1.0 | 2.3214 | -203.8326 | -156.4429 | -1.0512 | -0.8569 |
| 0.1128 | 1.39 | 350 | 0.1056 | 0.4189 | -1.9206 | 1.0 | 2.3394 | -204.0154 | -156.4454 | -1.0511 | -0.8576 |
| 0.11 | 1.49 | 375 | 0.1047 | 0.4220 | -1.9262 | 1.0 | 2.3482 | -204.0712 | -156.4135 | -1.0509 | -0.8570 |
| 0.1001 | 1.59 | 400 | 0.1048 | 0.4224 | -1.9281 | 1.0 | 2.3505 | -204.0909 | -156.4098 | -1.0514 | -0.8574 |
| 0.0978 | 1.69 | 425 | 0.1042 | 0.4246 | -1.9292 | 1.0 | 2.3538 | -204.1014 | -156.3875 | -1.0512 | -0.8573 |
| 0.1111 | 1.79 | 450 | 0.1041 | 0.4244 | -1.9292 | 1.0 | 2.3536 | -204.1017 | -156.3903 | -1.0514 | -0.8587 |
| 0.1064 | 1.89 | 475 | 0.1044 | 0.4199 | -1.9317 | 1.0 | 2.3516 | -204.1266 | -156.4352 | -1.0514 | -0.8577 |
| 0.107 | 1.99 | 500 | 0.1045 | 0.4197 | -1.9316 | 1.0 | 2.3513 | -204.1257 | -156.4368 | -1.0515 | -0.8584 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2