---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---
# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an unspecified preference dataset.
It achieves the following results on the evaluation set (the reward metrics are defined just after the list):
- Loss: 0.1526
- Rewards/chosen: 0.3611
- Rewards/rejected: -1.5450
- Rewards/accuracies: 1.0
- Rewards/margins: 1.9061
- Logps/rejected: -200.2592
- Logps/chosen: -157.0226
- Logits/rejected: -1.0513
- Logits/chosen: -0.8571
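
For reference, these reward metrics follow the standard DPO formulation used by TRL: the implicit reward of a response is the scaled log-ratio of the trained policy against the frozen reference model, and the loss is the negative log-sigmoid of the chosen-minus-rejected margin. The scaling factor β (the DPO beta) is not recorded in this card.

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
\mathcal{L}_{\mathrm{DPO}} = -\log \sigma\big(r_\theta(x, y_{\mathrm{chosen}}) - r_\theta(x, y_{\mathrm{rejected}})\big)
$$

`Rewards/chosen` and `Rewards/rejected` are the mean implicit rewards over the evaluation set, `Rewards/margins` is their mean difference, and `Rewards/accuracies` is the fraction of pairs for which the chosen response receives the higher implicit reward.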
## Model description

This is a LoRA adapter (PEFT) for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with Direct Preference Optimization (DPO) via the TRL library. It must be loaded on top of the base model; a usage sketch follows.
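
A minimal usage sketch, assuming the adapter weights are hosted on the Hugging Face Hub: the repo id `your-username/llama_DPO_model_e2` is a placeholder for wherever these weights actually live, and the prompt and generation settings are illustrative only.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"
ADAPTER = "your-username/llama_DPO_model_e2"  # placeholder repo id

# Load the frozen base model, then attach the DPO-trained LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()

prompt = "Explain LoRA in one sentence."  # example prompt, not from the card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

For deployment, `model.merge_and_unload()` folds the LoRA weights into the base model so it can be served without the `peft` dependency.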
## Intended uses & limitations |
|
|
|
More information needed |
|
|
|
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a reproduction sketch follows the list:
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
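
The card does not record the trl version, the preference dataset, the LoRA configuration, or the DPO beta, so the following is only a reproduction sketch under assumed values; it targets the trl 0.7.x `DPOTrainer` API that was contemporary with Transformers 4.38. The inline dataset is a stand-in for the real (unrecorded) preference data.

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 ships without a pad token

# Stand-in preference data; the real dataset is not documented in this card.
train_dataset = Dataset.from_dict({
    "prompt": ["..."],
    "chosen": ["..."],
    "rejected": ["..."],
})

# Assumed LoRA settings; the actual adapter config is not documented here.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size 1 * 8 = 8
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    logging_steps=25,               # matches the 25-step cadence of the results table
    remove_unused_columns=False,    # DPOTrainer consumes the raw prompt/chosen/rejected columns
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # with peft_config set, trl uses the adapter-disabled base model as reference
    beta=0.1,        # assumed: trl's default; the beta actually used is not recorded
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

To reproduce the 25-step evaluation cadence in the table below, an eval split would also be passed via `eval_dataset`, with `evaluation_strategy="steps"` and `eval_steps=25` in the arguments.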
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6819 | 0.1 | 25 | 0.6708 | 0.0151 | -0.0312 | 0.7567 | 0.0463 | -185.1220 | -160.4831 | -1.0517 | -0.8540 |
| 0.6351 | 0.2 | 50 | 0.6228 | 0.0428 | -0.1054 | 0.9600 | 0.1482 | -185.8636 | -160.2060 | -1.0524 | -0.8552 |
| 0.5874 | 0.3 | 75 | 0.5655 | 0.0762 | -0.2019 | 0.9967 | 0.2781 | -186.8286 | -159.8719 | -1.0525 | -0.8548 |
| 0.5179 | 0.4 | 100 | 0.5030 | 0.1133 | -0.3207 | 1.0 | 0.4340 | -188.0166 | -159.5010 | -1.0521 | -0.8545 |
| 0.479 | 0.5 | 125 | 0.4468 | 0.1501 | -0.4388 | 1.0 | 0.5889 | -189.1974 | -159.1327 | -1.0524 | -0.8554 |
| 0.406 | 0.6 | 150 | 0.3904 | 0.1842 | -0.5778 | 1.0 | 0.7620 | -190.5874 | -158.7915 | -1.0525 | -0.8576 |
| 0.3731 | 0.7 | 175 | 0.3377 | 0.2223 | -0.7247 | 1.0 | 0.9470 | -192.0564 | -158.4104 | -1.0521 | -0.8559 |
| 0.3075 | 0.79 | 200 | 0.2918 | 0.2537 | -0.8769 | 1.0 | 1.1305 | -193.5782 | -158.0974 | -1.0525 | -0.8583 |
| 0.2621 | 0.89 | 225 | 0.2517 | 0.2822 | -1.0278 | 1.0 | 1.3100 | -195.0876 | -157.8119 | -1.0525 | -0.8573 |
| 0.2285 | 0.99 | 250 | 0.2180 | 0.3118 | -1.1738 | 1.0 | 1.4855 | -196.5471 | -157.5160 | -1.0517 | -0.8568 |
| 0.2162 | 1.09 | 275 | 0.1948 | 0.3279 | -1.2897 | 1.0 | 1.6176 | -197.7066 | -157.3551 | -1.0513 | -0.8567 |
| 0.1752 | 1.19 | 300 | 0.1810 | 0.3383 | -1.3661 | 1.0 | 1.7044 | -198.4706 | -157.2514 | -1.0511 | -0.8576 |
| 0.1672 | 1.29 | 325 | 0.1714 | 0.3456 | -1.4242 | 1.0 | 1.7698 | -199.0516 | -157.1775 | -1.0509 | -0.8568 |
| 0.1722 | 1.39 | 350 | 0.1646 | 0.3535 | -1.4653 | 1.0 | 1.8187 | -199.4624 | -157.0993 | -1.0510 | -0.8568 |
| 0.1649 | 1.49 | 375 | 0.1596 | 0.3586 | -1.4919 | 1.0 | 1.8505 | -199.7286 | -157.0477 | -1.0512 | -0.8569 |
| 0.1534 | 1.59 | 400 | 0.1580 | 0.3603 | -1.5059 | 1.0 | 1.8663 | -199.8687 | -157.0304 | -1.0507 | -0.8571 |
| 0.1492 | 1.69 | 425 | 0.1561 | 0.3589 | -1.5194 | 1.0 | 1.8783 | -200.0034 | -157.0448 | -1.0514 | -0.8578 |
| 0.1625 | 1.79 | 450 | 0.1564 | 0.3586 | -1.5205 | 1.0 | 1.8791 | -200.0150 | -157.0482 | -1.0509 | -0.8570 |
| 0.1561 | 1.89 | 475 | 0.1535 | 0.3613 | -1.5366 | 1.0 | 1.8979 | -200.1756 | -157.0212 | -1.0510 | -0.8576 |
| 0.1565 | 1.99 | 500 | 0.1529 | 0.3643 | -1.5393 | 1.0 | 1.9036 | -200.2028 | -156.9913 | -1.0513 | -0.8567 |
| 0.1476 | 2.09 | 525 | 0.1530 | 0.3640 | -1.5392 | 1.0 | 1.9032 | -200.2021 | -156.9944 | -1.0511 | -0.8569 |
| 0.1457 | 2.19 | 550 | 0.1530 | 0.3605 | -1.5406 | 1.0 | 1.9011 | -200.2155 | -157.0287 | -1.0507 | -0.8577 |
| 0.1376 | 2.29 | 575 | 0.1529 | 0.3585 | -1.5466 | 1.0 | 1.9051 | -200.2757 | -157.0492 | -1.0508 | -0.8579 |
| 0.1574 | 2.38 | 600 | 0.1527 | 0.3634 | -1.5448 | 1.0 | 1.9082 | -200.2574 | -156.9998 | -1.0508 | -0.8566 |
| 0.1662 | 2.48 | 625 | 0.1518 | 0.3645 | -1.5465 | 1.0 | 1.9109 | -200.2742 | -156.9890 | -1.0509 | -0.8572 |
| 0.1535 | 2.58 | 650 | 0.1523 | 0.3628 | -1.5458 | 1.0 | 1.9086 | -200.2675 | -157.0059 | -1.0510 | -0.8571 |
| 0.1488 | 2.68 | 675 | 0.1518 | 0.3658 | -1.5446 | 1.0 | 1.9104 | -200.2561 | -156.9763 | -1.0510 | -0.8572 |
| 0.1564 | 2.78 | 700 | 0.1526 | 0.3618 | -1.5452 | 1.0 | 1.9071 | -200.2618 | -157.0154 | -1.0512 | -0.8568 |
| 0.1367 | 2.88 | 725 | 0.1526 | 0.3643 | -1.5426 | 1.0 | 1.9069 | -200.2352 | -156.9905 | -1.0513 | -0.8570 |
| 0.1543 | 2.98 | 750 | 0.1526 | 0.3611 | -1.5450 | 1.0 | 1.9061 | -200.2592 | -157.0226 | -1.0513 | -0.8571 |
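
Reward accuracy on the evaluation set saturates at 1.0 within the first 100 steps, and the validation loss plateaus around 0.152 over the final epoch, so most of the preference margin is learned during the first two epochs.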
### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- PyTorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2