---
license: llama2
library_name: peft
tags:
- trl
- dpo
- generated_from_trainer
base_model: meta-llama/Llama-2-7b-hf
model-index:
- name: llama_DPO_model_e2
  results: []
---
# llama_DPO_model_e2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on an undocumented preference dataset.
It achieves the following results on the evaluation set (see the metric sketch after this list):
- Loss: 0.0937
- Rewards/chosen: 0.4389
- Rewards/rejected: -2.0384
- Rewards/accuracies: 1.0
- Rewards/margins: 2.4774
- Logps/rejected: -205.1940
- Logps/chosen: -156.2447
- Logits/rejected: -1.0509
- Logits/chosen: -0.8587
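These are the standard metric names logged by TRL's `DPOTrainer`. As a rough guide to reading them, here is a minimal sketch of how they derive from the sigmoid DPO loss; the function name and `beta=0.1` (TRL's default) are assumptions, since this card does not document `beta`:

```python
import torch.nn.functional as F

def dpo_metrics(policy_chosen_logps, policy_rejected_logps,
                ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Map summed per-sequence log-probabilities to the logged DPO metrics."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)        # rewards/chosen
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)  # rewards/rejected
    margins = chosen_rewards - rejected_rewards                             # rewards/margins
    loss = -F.logsigmoid(margins).mean()                                    # loss
    accuracy = (chosen_rewards > rejected_rewards).float().mean()           # rewards/accuracies
    return loss, chosen_rewards.mean(), rejected_rewards.mean(), accuracy
```

The Logps/* values are the policy's summed log-probabilities on the chosen and rejected completions, and Logits/* are the mean final-layer logits over those completions.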
## Model description

This is a PEFT adapter for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf), preference-tuned with Direct Preference Optimization (DPO) via the TRL library. The adapter weights must be loaded on top of the base model, as sketched below.
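A minimal loading sketch; the adapter path below is a placeholder, so substitute the actual Hub repo id or local directory where this adapter is stored:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the DPO-tuned adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "path/to/llama_DPO_model_e2")  # placeholder
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

prompt = "Explain Direct Preference Optimization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```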
## Intended uses & limitations

More information needed. Note that, as an adapter on Llama-2-7b, use is subject to the Llama 2 community license, and the model inherits the limitations of the base model.
## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged TRL sketch reproducing this configuration follows the list):
- learning_rate: 8e-07
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 2
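A minimal sketch of this configuration with TRL's `DPOTrainer`, assuming a TRL release contemporary with the framework versions listed below (newer TRL versions use `DPOConfig` and `processing_class` instead). The dataset files, LoRA settings, and `beta` are assumptions not documented in this card; only the hyperparameters above are taken from the run:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data with "prompt", "chosen", "rejected" columns.
data = load_dataset("json", data_files={"train": "train.jsonl", "eval": "eval.jsonl"})

peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)  # assumed values

args = TrainingArguments(
    output_dir="llama_DPO_model_e2",
    learning_rate=8e-7,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,  # effective train batch size: 1 x 8 = 8
    num_train_epochs=2,
    lr_scheduler_type="linear",
    seed=42,
)

trainer = DPOTrainer(
    model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["eval"],
    tokenizer=tokenizer,
    peft_config=peft_config,  # with a PEFT config, TRL derives the reference model implicitly
)
trainer.train()
```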
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.673 | 0.1 | 25 | 0.6445 | 0.0273 | -0.0740 | 0.9000 | 0.1013 | -185.5491 | -160.3607 | -1.0521 | -0.8545 |
| 0.5737 | 0.2 | 50 | 0.5485 | 0.0856 | -0.2335 | 0.9933 | 0.3190 | -187.1442 | -159.7781 | -1.0526 | -0.8551 |
| 0.4843 | 0.3 | 75 | 0.4496 | 0.1470 | -0.4343 | 1.0 | 0.5814 | -189.1528 | -159.1637 | -1.0527 | -0.8571 |
| 0.4006 | 0.4 | 100 | 0.3655 | 0.2043 | -0.6419 | 1.0 | 0.8462 | -191.2286 | -158.5909 | -1.0521 | -0.8556 |
| 0.3417 | 0.5 | 125 | 0.2945 | 0.2551 | -0.8630 | 1.0 | 1.1180 | -193.4393 | -158.0833 | -1.0522 | -0.8562 |
| 0.2601 | 0.6 | 150 | 0.2353 | 0.3032 | -1.0903 | 1.0 | 1.3935 | -195.7128 | -157.6020 | -1.0520 | -0.8597 |
| 0.2197 | 0.7 | 175 | 0.1891 | 0.3442 | -1.3124 | 1.0 | 1.6565 | -197.9333 | -157.1923 | -1.0522 | -0.8579 |
| 0.1675 | 0.79 | 200 | 0.1532 | 0.3815 | -1.5253 | 1.0 | 1.9067 | -200.0621 | -156.8192 | -1.0526 | -0.8582 |
| 0.1417 | 0.89 | 225 | 0.1289 | 0.4011 | -1.7082 | 1.0 | 2.1094 | -201.8920 | -156.6225 | -1.0525 | -0.8585 |
| 0.1203 | 0.99 | 250 | 0.1117 | 0.4214 | -1.8534 | 1.0 | 2.2748 | -203.3437 | -156.4196 | -1.0517 | -0.8603 |
| 0.1156 | 1.09 | 275 | 0.1034 | 0.4296 | -1.9336 | 1.0 | 2.3633 | -204.1459 | -156.3377 | -1.0517 | -0.8590 |
| 0.0942 | 1.19 | 300 | 0.0990 | 0.4310 | -1.9823 | 1.0 | 2.4133 | -204.6330 | -156.3240 | -1.0514 | -0.8577 |
| 0.0903 | 1.29 | 325 | 0.0957 | 0.4380 | -2.0137 | 1.0 | 2.4517 | -204.9467 | -156.2539 | -1.0511 | -0.8593 |
| 0.1023 | 1.39 | 350 | 0.0946 | 0.4384 | -2.0296 | 1.0 | 2.4680 | -205.1059 | -156.2503 | -1.0519 | -0.8587 |
| 0.0984 | 1.49 | 375 | 0.0945 | 0.4352 | -2.0350 | 1.0 | 2.4702 | -205.1597 | -156.2819 | -1.0510 | -0.8580 |
| 0.0899 | 1.59 | 400 | 0.0939 | 0.4360 | -2.0393 | 1.0 | 2.4752 | -205.2024 | -156.2742 | -1.0513 | -0.8594 |
| 0.0883 | 1.69 | 425 | 0.0939 | 0.4374 | -2.0378 | 1.0 | 2.4752 | -205.1877 | -156.2598 | -1.0514 | -0.8590 |
| 0.1011 | 1.79 | 450 | 0.0939 | 0.4368 | -2.0412 | 1.0 | 2.4781 | -205.2217 | -156.2654 | -1.0513 | -0.8583 |
| 0.0962 | 1.89 | 475 | 0.0935 | 0.4403 | -2.0395 | 1.0 | 2.4798 | -205.2041 | -156.2308 | -1.0510 | -0.8574 |
| 0.0971 | 1.99 | 500 | 0.0937 | 0.4389 | -2.0384 | 1.0 | 2.4774 | -205.1940 | -156.2447 | -1.0509 | -0.8587 |
### Framework versions

- PEFT 0.8.2
- Transformers 4.38.1
- Pytorch 2.2.0+cu118
- Datasets 2.17.1
- Tokenizers 0.15.2
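To reproduce the training environment, the stack can be checked against these pins; a small sanity-check sketch, assuming the packages are installed:

```python
# Compare the local runtime against the versions this adapter was trained with.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.8.2",
    "transformers": "4.38.1",
    "torch": "2.2.0+cu118",
    "datasets": "2.17.1",
    "tokenizers": "0.15.2",
}
actual = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    status = "ok" if actual[name] == want else f"mismatch (got {actual[name]})"
    print(f"{name}=={want}: {status}")
```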