llama_DPO_model_e2 / README.md

thorirhrafn

End of training

e2ce8fb verified 2 months ago


	---
	license: llama2
	library_name: peft
	tags:
	- trl
	- dpo
	- generated_from_trainer
	base_model: meta-llama/Llama-2-7b-hf
	model-index:
	- name: llama_DPO_model_e2
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# llama_DPO_model_e2

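	This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), fine-tuned with Direct Preference Optimization (DPO) using TRL. The training dataset is not recorded in this card.
	At the final evaluation (step 500), it achieves the following results on the evaluation set: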
	- Loss: 0.0896
	- Rewards/chosen: 0.4401
	- Rewards/rejected: -2.0930
	- Rewards/accuracies: 1.0
	- Rewards/margins: 2.5330
	- Logps/rejected: -205.7391
	- Logps/chosen: -156.2334
	- Logits/rejected: -1.0514
	- Logits/chosen: -0.8587
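The reward metrics above are related to each other, and a quick check makes that relationship explicit. The sketch below assumes TRL's default sigmoid DPO loss without label smoothing (the card states neither the loss type nor `beta`), in which case the per-pair loss is `-log(sigmoid(margin))` and the margin is the chosen reward minus the rejected reward. The helper name `dpo_sigmoid_loss` is illustrative, not part of any library.

```python
import math

# Final evaluation metrics copied from the list above.
rewards_chosen = 0.4401
rewards_rejected = -2.0930
reported_margin = 2.5330
reported_loss = 0.0896


def dpo_sigmoid_loss(margin: float) -> float:
    """Per-pair DPO loss -log(sigmoid(margin)); ASSUMES the sigmoid loss variant."""
    return math.log1p(math.exp(-margin))


# The margin is the chosen reward minus the rejected reward.
margin = rewards_chosen - rewards_rejected

# Loss a single pair would incur if its margin equaled the mean margin.
loss_at_mean_margin = dpo_sigmoid_loss(margin)

print(f"recomputed margin:   {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss {reported_loss:.4f})")
```

Under that assumption, the reported mean loss should be at least the loss evaluated at the mean margin, because `-log(sigmoid(x))` is convex. The small difference between the recomputed and reported margins comes from rounding in the logged values.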

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 8e-07
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 2
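The batch-size figures above fit together arithmetically, and combined with the last logged step they give a rough idea of the dataset size. The following is a minimal sketch using only the numbers in this card. The `linear_lr` helper assumes no warmup (none is listed) and takes the total step count as an argument, since the exact total is not stated.

```python
# Values copied from the hyperparameter list above.
learning_rate = 8e-07
train_batch_size = 1
gradient_accumulation_steps = 8
total_train_batch_size = 8

# Last logged row of the training results table: step 500 at epoch 1.99.
last_step = 500
last_epoch = 1.99

# Effective batch = per-device batch * accumulation steps * device count,
# so the reported total implies the device count.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

# Preference pairs processed by the last logged step, and the per-epoch size
# that implies (approximate, because the logged epoch value is rounded).
pairs_seen = last_step * total_train_batch_size
approx_pairs_per_epoch = round(pairs_seen / last_epoch)


def linear_lr(step: int, total_steps: int, base_lr: float = learning_rate) -> float:
    """Linear decay from base_lr to 0; ASSUMES no warmup phase."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


print(f"devices implied by the batch sizes: {num_devices}")
print(f"approximate training pairs per epoch: {approx_pairs_per_epoch}")
```

The per-epoch estimate is only as precise as the two-decimal epoch values in the log, so it should be read as an order-of-magnitude figure.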

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.6699 | 0.1 | 25 | 0.6428 | 0.0307 | -0.0744 | 0.9033 | 0.1051 | -185.5532 | -160.3267 | -1.0520 | -0.8550 |
	| 0.5702 | 0.2 | 50 | 0.5471 | 0.0866 | -0.2359 | 0.9933 | 0.3225 | -187.1690 | -159.7680 | -1.0514 | -0.8544 |
	| 0.488 | 0.3 | 75 | 0.4456 | 0.1502 | -0.4424 | 1.0 | 0.5926 | -189.2334 | -159.1314 | -1.0527 | -0.8555 |
	| 0.3957 | 0.4 | 100 | 0.3600 | 0.2054 | -0.6615 | 1.0 | 0.8669 | -191.4245 | -158.5795 | -1.0530 | -0.8577 |
	| 0.3338 | 0.5 | 125 | 0.2865 | 0.2569 | -0.8933 | 1.0 | 1.1502 | -193.7425 | -158.0646 | -1.0524 | -0.8564 |
	| 0.253 | 0.6 | 150 | 0.2257 | 0.3043 | -1.1373 | 1.0 | 1.4416 | -196.1830 | -157.5914 | -1.0523 | -0.8570 |
	| 0.2134 | 0.7 | 175 | 0.1819 | 0.3496 | -1.3537 | 1.0 | 1.7033 | -198.3466 | -157.1379 | -1.0530 | -0.8584 |
	| 0.1613 | 0.79 | 200 | 0.1473 | 0.3842 | -1.5693 | 1.0 | 1.9535 | -200.5027 | -156.7917 | -1.0525 | -0.8591 |
	| 0.1358 | 0.89 | 225 | 0.1231 | 0.4031 | -1.7582 | 1.0 | 2.1614 | -202.3919 | -156.6024 | -1.0523 | -0.8593 |
	| 0.115 | 0.99 | 250 | 0.1076 | 0.4205 | -1.8980 | 1.0 | 2.3185 | -203.7897 | -156.4292 | -1.0521 | -0.8590 |
	| 0.1111 | 1.09 | 275 | 0.0989 | 0.4291 | -1.9856 | 1.0 | 2.4148 | -204.6660 | -156.3426 | -1.0515 | -0.8591 |
	| 0.0902 | 1.19 | 300 | 0.0949 | 0.4280 | -2.0337 | 1.0 | 2.4617 | -205.1465 | -156.3540 | -1.0507 | -0.8576 |
	| 0.0867 | 1.29 | 325 | 0.0920 | 0.4325 | -2.0705 | 1.0 | 2.5030 | -205.5146 | -156.3087 | -1.0510 | -0.8576 |
	| 0.0973 | 1.39 | 350 | 0.0905 | 0.4357 | -2.0839 | 1.0 | 2.5196 | -205.6485 | -156.2766 | -1.0506 | -0.8576 |
	| 0.0942 | 1.49 | 375 | 0.0897 | 0.4422 | -2.0838 | 1.0 | 2.5260 | -205.6476 | -156.2122 | -1.0515 | -0.8578 |
	| 0.0858 | 1.59 | 400 | 0.0897 | 0.4392 | -2.0903 | 1.0 | 2.5295 | -205.7121 | -156.2415 | -1.0515 | -0.8587 |
	| 0.083 | 1.69 | 425 | 0.0893 | 0.4401 | -2.0972 | 1.0 | 2.5373 | -205.7811 | -156.2327 | -1.0511 | -0.8584 |
	| 0.0964 | 1.79 | 450 | 0.0897 | 0.4368 | -2.0947 | 1.0 | 2.5315 | -205.7564 | -156.2662 | -1.0511 | -0.8577 |
	| 0.0931 | 1.89 | 475 | 0.0890 | 0.4406 | -2.0970 | 1.0 | 2.5376 | -205.7794 | -156.2282 | -1.0512 | -0.8585 |
	| 0.0915 | 1.99 | 500 | 0.0896 | 0.4401 | -2.0930 | 1.0 | 2.5330 | -205.7391 | -156.2334 | -1.0514 | -0.8587 |
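The table shows validation loss flattening out during the second epoch. As an illustrative sketch, the snippet below copies the step and validation-loss columns for the second-epoch rows and reports the lowest-loss checkpoint and how much the loss moves from step 375 onward. The variable names are hypothetical, and the step 375 cutoff is an arbitrary choice for illustration.

```python
# (step, validation loss) pairs copied from the second-epoch rows of the table above.
eval_log = [
    (275, 0.0989), (300, 0.0949), (325, 0.0920), (350, 0.0905), (375, 0.0897),
    (400, 0.0897), (425, 0.0893), (450, 0.0897), (475, 0.0890), (500, 0.0896),
]

# Evaluation step with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda row: row[1])

# Final logged evaluation, which is what the summary at the top of the card reports.
final_step, final_loss = eval_log[-1]

# Range of validation loss from step 375 onward (cutoff chosen for illustration).
late_losses = [loss for step, loss in eval_log if step >= 375]
late_spread = max(late_losses) - min(late_losses)

print(f"lowest validation loss: {best_loss:.4f} at step {best_step}")
print(f"final validation loss:  {final_loss:.4f} at step {final_step}")
print(f"spread from step 375 on: {late_spread:.4f}")
```

The lowest logged validation loss and the final one differ only in the fourth decimal place, so the choice between those checkpoints is unlikely to matter much on this evaluation set.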


	### Framework versions

	- PEFT 0.8.2
	- Transformers 4.38.1
	- Pytorch 2.2.0+cu118
	- Datasets 2.17.1
	- Tokenizers 0.15.2
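Because the card lists `library_name: peft`, the published weights are presumably an adapter that must be attached to the base model at load time. The following is a minimal sketch of how that could look with the `peft` and `transformers` APIs. The adapter repository id is an assumption inferred from the repository and author names, not something the card states, and access to the base model is subject to the Llama 2 license.

```python
BASE_MODEL_ID = "meta-llama/Llama-2-7b-hf"     # base_model from the card metadata
ADAPTER_ID = "thorirhrafn/llama_DPO_model_e2"  # ASSUMED repo id, not stated in the card


def load_dpo_adapter(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model and attach the PEFT adapter weights on top of it."""
    # Imported lazily so that defining this helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer is taken from the base model; the card does not say
    # whether the adapter repository ships its own tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    return model, tokenizer
```

Calling `load_dpo_adapter()` downloads the full 7B base model, so it is best run on a machine with enough memory. Installing the framework versions listed above is the safest way to match the training environment.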