---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_SFT_e1_DPO_e2
results: []
---
# llama_SFT_e1_DPO_e2
This model is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), trained with DPO on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1258
- Rewards/chosen: 0.3605
- Rewards/rejected: -1.7770
- Rewards/accuracies: 1.0
- Rewards/margins: 2.1375
- Logps/rejected: -203.4181
- Logps/chosen: -156.2596
- Logits/rejected: -1.0532
- Logits/chosen: -0.8665
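
For context, the reward columns are the implicit DPO rewards that TRL logs, i.e. the β-scaled log-probability ratio between the policy and the reference model (the β used in this run is not reported in this card):

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
\qquad
\text{margins} = r_\theta(x, y_\text{chosen}) - r_\theta(x, y_\text{rejected}) = 0.3605 - (-1.7770) = 2.1375
$$

Rewards/accuracies is the fraction of evaluation pairs in which the chosen completion receives the higher implicit reward, so 1.0 means the policy ranks chosen above rejected on every pair.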
## Model description
More information needed
## Intended uses & limitations
More information needed
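
In the meantime, here is a minimal inference sketch; it assumes this repository hosts the PEFT adapter under the id `thorirhrafn/llama_SFT_e1_DPO_e2` (inferred from the card name) and that you have access to the gated Llama-2 base weights:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"
ADAPTER = "thorirhrafn/llama_SFT_e1_DPO_e2"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")
model = PeftModel.from_pretrained(base_model, ADAPTER)  # load the DPO-tuned adapter

prompt = "Explain what direct preference optimization does in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```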
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 7e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
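
The `trl` and `dpo` tags suggest this run used TRL's `DPOTrainer`. A hedged reconstruction sketch follows: the `TrainingArguments` mirror the list above, while `beta`, the LoRA settings, the sequence lengths, and the toy preference pairs are assumptions, none of which are reported in this card:

```python
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

BASE = "meta-llama/Llama-2-7b-hf"  # the name suggests an SFT checkpoint of this was the real start point

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE, device_map="auto")

# Assumed LoRA settings: rank/alpha/targets are not reported in this card.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05)

# Toy stand-in for the (unnamed) preference dataset; DPOTrainer expects
# "prompt", "chosen", and "rejected" columns.
pairs = Dataset.from_dict({
    "prompt": ["Q: What is the capital of France?\nA:"],
    "chosen": [" Paris."],
    "rejected": [" London."],
})

training_args = TrainingArguments(
    output_dir="llama_SFT_e1_DPO_e2",
    learning_rate=7e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size of 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="steps",
    eval_steps=25,                   # matches the 25-step eval cadence in the table below
    logging_steps=25,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,       # with a PEFT model, TRL scores the reference with adapters disabled
    args=training_args,
    beta=0.1,             # assumed; the beta actually used is not reported
    train_dataset=pairs,  # replace with the real preference dataset
    eval_dataset=pairs,
    tokenizer=tokenizer,
    peft_config=peft_config,
    max_length=512,       # assumed sequence limits
    max_prompt_length=256,
)
trainer.train()
```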
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6825 | 0.1 | 25 | 0.6596 | 0.0243 | -0.0451 | 0.8667 | 0.0694 | -186.0986 | -159.6209 | -1.0534 | -0.8570 |
| 0.6018 | 0.2 | 50 | 0.5820 | 0.0671 | -0.1728 | 0.9800 | 0.2399 | -187.3757 | -159.1936 | -1.0531 | -0.8568 |
| 0.5333 | 0.3 | 75 | 0.5021 | 0.1133 | -0.3236 | 1.0 | 0.4369 | -188.8834 | -158.7311 | -1.0544 | -0.8586 |
| 0.4522 | 0.4 | 100 | 0.4213 | 0.1615 | -0.5029 | 1.0 | 0.6644 | -190.6768 | -158.2497 | -1.0547 | -0.8596 |
| 0.3962 | 0.5 | 125 | 0.3555 | 0.1988 | -0.6844 | 1.0 | 0.8832 | -192.4913 | -157.8759 | -1.0548 | -0.8608 |
| 0.3164 | 0.6 | 150 | 0.2920 | 0.2416 | -0.8872 | 1.0 | 1.1288 | -194.5195 | -157.4483 | -1.0550 | -0.8660 |
| 0.2673 | 0.7 | 175 | 0.2400 | 0.2789 | -1.0936 | 1.0 | 1.3725 | -196.5838 | -157.0758 | -1.0540 | -0.8656 |
| 0.217 | 0.79 | 200 | 0.2008 | 0.3028 | -1.2873 | 1.0 | 1.5900 | -198.5201 | -156.8367 | -1.0540 | -0.8668 |
| 0.1822 | 0.89 | 225 | 0.1694 | 0.3294 | -1.4600 | 1.0 | 1.7894 | -200.2475 | -156.5703 | -1.0541 | -0.8674 |
| 0.1578 | 0.99 | 250 | 0.1483 | 0.3436 | -1.6056 | 1.0 | 1.9492 | -201.7036 | -156.4280 | -1.0538 | -0.8668 |
| 0.1509 | 1.09 | 275 | 0.1364 | 0.3512 | -1.6903 | 1.0 | 2.0414 | -202.5503 | -156.3527 | -1.0534 | -0.8666 |
| 0.1273 | 1.19 | 300 | 0.1322 | 0.3561 | -1.7242 | 1.0 | 2.0804 | -202.8900 | -156.3031 | -1.0532 | -0.8657 |
| 0.1208 | 1.29 | 325 | 0.1284 | 0.3561 | -1.7546 | 1.0 | 2.1106 | -203.1934 | -156.3038 | -1.0534 | -0.8668 |
| 0.1325 | 1.39 | 350 | 0.1270 | 0.3598 | -1.7654 | 1.0 | 2.1252 | -203.3020 | -156.2663 | -1.0532 | -0.8665 |
| 0.1287 | 1.49 | 375 | 0.1263 | 0.3618 | -1.7718 | 1.0 | 2.1336 | -203.3654 | -156.2462 | -1.0534 | -0.8666 |
| 0.1203 | 1.59 | 400 | 0.1252 | 0.3624 | -1.7783 | 1.0 | 2.1407 | -203.4305 | -156.2402 | -1.0532 | -0.8666 |
| 0.1188 | 1.69 | 425 | 0.1254 | 0.3610 | -1.7767 | 1.0 | 2.1377 | -203.4145 | -156.2542 | -1.0530 | -0.8664 |
| 0.1331 | 1.79 | 450 | 0.1253 | 0.3640 | -1.7760 | 1.0 | 2.1400 | -203.4073 | -156.2242 | -1.0531 | -0.8662 |
| 0.1301 | 1.89 | 475 | 0.1252 | 0.3641 | -1.7772 | 1.0 | 2.1413 | -203.4194 | -156.2230 | -1.0531 | -0.8667 |
| 0.1289 | 1.99 | 500 | 0.1258 | 0.3605 | -1.7770 | 1.0 | 2.1375 | -203.4181 | -156.2596 | -1.0532 | -0.8665 |
### Framework versions
- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2