

---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-qlora
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# zephyr-7b-dpo-qlora

This model is a fine-tuned version of [ale-bay/zephyr-7b-sft-qlora](https://huggingface.co/ale-bay/zephyr-7b-sft-qlora), trained with Direct Preference Optimization (DPO) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4975
- Rewards/chosen: -2.4549
- Rewards/rejected: -3.4757
- Rewards/accuracies: 0.7490
- Rewards/margins: 1.0207
- Logps/rejected: -595.2866
- Logps/chosen: -517.1966
- Logits/rejected: -1.3432
- Logits/chosen: -1.4358
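
The reward metrics above follow the convention of TRL's `DPOTrainer` as we understand it: a response's implicit reward is β times the difference between its log-probability under the trained policy and under the frozen reference model, `Rewards/margins` is the chosen reward minus the rejected reward, `Rewards/accuracies` is the fraction of pairs whose chosen reward is higher, and the loss is the sigmoid DPO loss `-log sigmoid(margin)`. The sketch below restates those definitions in plain Python under stated assumptions: β is not listed in this card, so `beta=0.1` is only a placeholder, and `dpo_metrics` is a hypothetical helper, not a TRL function.

```python
import math
from statistics import mean


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Sigmoid DPO loss and reward metrics from per-pair summed log-probabilities.

    Hypothetical helper; beta=0.1 is a placeholder because the card does not
    list the beta used in training.
    """
    # Implicit reward: beta * (log pi_policy(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        # -log(sigmoid(m)) == softplus(-m)
        "loss": mean(softplus(-m) for m in margins),
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/accuracies": mean(1.0 if m > 0 else 0.0 for m in margins),
        "rewards/margins": mean(margins),
        "logps/chosen": mean(policy_chosen),
        "logps/rejected": mean(policy_rejected),
    }


# Two toy preference pairs (made-up log-probabilities, not from this model).
metrics = dpo_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-14.0, -11.0],
    ref_chosen=[-11.0, -12.0],
    ref_rejected=[-12.0, -12.0],
)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because every metric is a mean over pairs, the reported margin equals the reported chosen reward minus the rejected reward up to rounding: -2.4549 - (-3.4757) = 1.0208, against the listed 1.0207.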

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
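
The derived totals follow from the per-device settings: 4 examples per device × 2 GPUs × 4 accumulation steps = 32 examples per optimizer step for training, and 8 × 2 = 16 for evaluation. The sketch below spells out that arithmetic and the shape of a cosine schedule with a 10% linear warmup, assuming it behaves like the `cosine` scheduler in transformers (warmup steps = ceil(warmup_ratio × total steps)). The card does not state the total number of optimizer steps, so `total_steps=2000` is a hypothetical round number, and both helpers are illustrative, not library functions.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    # Number of examples that contribute to one optimizer step.
    return per_device * num_devices * grad_accum_steps


def cosine_lr_with_warmup(step, total_steps, base_lr=5e-6, warmup_ratio=0.1):
    """Linear warmup, then half-cosine decay to 0.

    Assumed to mirror the shape of the "cosine" schedule in transformers,
    with warmup_steps = ceil(warmup_ratio * total_steps); not the library code.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(4, 2, 4))  # train: 4 per device x 2 GPUs x 4 accumulation steps -> 32
print(effective_batch_size(8, 2))     # eval: no gradient accumulation -> 16

# total_steps=2000 is a hypothetical round number; the card does not state it.
for step in (0, 100, 200, 1100, 2000):
    print(step, cosine_lr_with_warmup(step, total_steps=2000))
```

With these assumptions the learning rate climbs linearly to 5e-06 over the first 10% of steps, is back at half its peak midway through the decay, and reaches 0 at the final step.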

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6641 | 0.05 | 100 | 0.6636 | 0.0054 | -0.0681 | 0.6900 | 0.0735 | -254.5337 | -271.1659 | -2.0436 | -2.1368 |
| 0.6105 | 0.1 | 200 | 0.6075 | -0.3236 | -0.5938 | 0.6890 | 0.2702 | -307.0967 | -304.0613 | -2.0030 | -2.0919 |
| 0.5883 | 0.16 | 300 | 0.5817 | -0.7122 | -1.1286 | 0.7020 | 0.4164 | -360.5768 | -342.9188 | -1.9914 | -2.0761 |
| 0.5651 | 0.21 | 400 | 0.5665 | -0.7901 | -1.2897 | 0.7250 | 0.4996 | -376.6874 | -350.7093 | -1.9001 | -1.9820 |
| 0.5136 | 0.26 | 500 | 0.5520 | -1.0330 | -1.6646 | 0.7190 | 0.6316 | -414.1808 | -374.9992 | -1.8081 | -1.8880 |
| 0.5587 | 0.31 | 600 | 0.5327 | -1.3215 | -2.0089 | 0.7320 | 0.6874 | -448.6079 | -403.8534 | -1.4665 | -1.5609 |
| 0.5167 | 0.37 | 700 | 0.5299 | -1.2797 | -2.1992 | 0.7230 | 0.9196 | -467.6413 | -399.6684 | -1.3918 | -1.4903 |
| 0.5465 | 0.42 | 800 | 0.5189 | -1.6646 | -2.4686 | 0.7200 | 0.8041 | -494.5844 | -438.1617 | -1.3685 | -1.4642 |
| 0.5002 | 0.47 | 900 | 0.5142 | -1.7844 | -2.7217 | 0.7290 | 0.9373 | -519.8885 | -450.1383 | -1.4179 | -1.5054 |
| 0.5017 | 0.52 | 1000 | 0.5058 | -2.6175 | -3.6120 | 0.7360 | 0.9946 | -608.9218 | -533.4493 | -1.2973 | -1.3948 |
| 0.4966 | 0.58 | 1100 | 0.5043 | -2.0581 | -2.9819 | 0.7370 | 0.9239 | -545.9103 | -477.5080 | -1.3783 | -1.4740 |
| 0.5087 | 0.63 | 1200 | 0.5040 | -2.3715 | -3.3475 | 0.7450 | 0.9760 | -582.4712 | -508.8495 | -1.3331 | -1.4262 |
| 0.4799 | 0.68 | 1300 | 0.5011 | -2.3067 | -3.3444 | 0.7450 | 1.0377 | -582.1562 | -502.3687 | -1.3340 | -1.4277 |
| 0.4606 | 0.73 | 1400 | 0.4991 | -2.5016 | -3.5583 | 0.7430 | 1.0567 | -603.5469 | -521.8631 | -1.3291 | -1.4219 |
| 0.4763 | 0.79 | 1500 | 0.4985 | -2.4979 | -3.5204 | 0.7470 | 1.0225 | -599.7631 | -521.4944 | -1.3394 | -1.4325 |
| 0.5008 | 0.84 | 1600 | 0.4977 | -2.4555 | -3.4719 | 0.7480 | 1.0164 | -594.9102 | -517.2504 | -1.3492 | -1.4415 |
| 0.4654 | 0.89 | 1700 | 0.4976 | -2.4498 | -3.4672 | 0.7510 | 1.0174 | -594.4417 | -516.6852 | -1.3478 | -1.4402 |
| 0.4854 | 0.94 | 1800 | 0.4975 | -2.4526 | -3.4731 | 0.7480 | 1.0205 | -595.0339 | -516.9640 | -1.3441 | -1.4366 |
| 0.4879 | 0.99 | 1900 | 0.4974 | -2.4531 | -3.4740 | 0.75 | 1.0209 | -595.1221 | -517.0148 | -1.3432 | -1.4359 |
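
The table can also be checked programmatically. The sketch below copies five of its columns into a list (`ROWS` is a hand-made constant, not an export the trainer produces) and checks two things: which evaluation step had the lowest validation loss, and whether each `Rewards/margins` value matches `Rewards/chosen` minus `Rewards/rejected` up to the table's 4-decimal rounding. Both helper functions are hypothetical.

```python
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins),
# copied by hand from the training-results table.
ROWS = [
    (100, 0.6636, 0.0054, -0.0681, 0.0735),
    (200, 0.6075, -0.3236, -0.5938, 0.2702),
    (300, 0.5817, -0.7122, -1.1286, 0.4164),
    (400, 0.5665, -0.7901, -1.2897, 0.4996),
    (500, 0.5520, -1.0330, -1.6646, 0.6316),
    (600, 0.5327, -1.3215, -2.0089, 0.6874),
    (700, 0.5299, -1.2797, -2.1992, 0.9196),
    (800, 0.5189, -1.6646, -2.4686, 0.8041),
    (900, 0.5142, -1.7844, -2.7217, 0.9373),
    (1000, 0.5058, -2.6175, -3.6120, 0.9946),
    (1100, 0.5043, -2.0581, -2.9819, 0.9239),
    (1200, 0.5040, -2.3715, -3.3475, 0.9760),
    (1300, 0.5011, -2.3067, -3.3444, 1.0377),
    (1400, 0.4991, -2.5016, -3.5583, 1.0567),
    (1500, 0.4985, -2.4979, -3.5204, 1.0225),
    (1600, 0.4977, -2.4555, -3.4719, 1.0164),
    (1700, 0.4976, -2.4498, -3.4672, 1.0174),
    (1800, 0.4975, -2.4526, -3.4731, 1.0205),
    (1900, 0.4974, -2.4531, -3.4740, 1.0209),
]


def best_step(rows):
    # Step whose evaluation had the lowest validation loss.
    return min(rows, key=lambda row: row[1])[0]


def max_margin_gap(rows):
    # Margins are means of per-pair (chosen - rejected) differences, so each one
    # should equal rewards/chosen - rewards/rejected up to 4-decimal rounding.
    return max(abs((chosen - rejected) - margin) for _, _, chosen, rejected, margin in rows)


print("lowest validation loss at step", best_step(ROWS))
print(f"largest margin rounding gap: {max_margin_gap(ROWS):.4f}")
```

On these rows the lowest validation loss (0.4974) falls at the last logged step, 1900, and the largest margin gap is about 0.0001, which is consistent with rounding.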


### Framework versions

- PEFT 0.7.1
- Transformers 4.39.3
- Pytorch 2.3.0+cu121
- Datasets 2.19.1
- Tokenizers 0.15.2
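
To check whether a local environment matches these versions before trying to reproduce the numbers above, a small standard-library sketch can compare installed distributions against the list. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers`, and compares only the public part of each version, so a CUDA build such as `2.3.0+cu121` counts as `2.3.0`. `check_versions` and `installed_versions` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; PyTorch is distributed as "torch".
PINNED = {
    "peft": "0.7.1",
    "transformers": "4.39.3",
    "torch": "2.3.0",
    "datasets": "2.19.1",
    "tokenizers": "0.15.2",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def check_versions(pinned=PINNED, installed=None):
    """Return {name: (expected, installed)} for every package that does not match."""
    if installed is None:
        installed = installed_versions(pinned)
    mismatches = {}
    for name, expected in pinned.items():
        actual = installed.get(name)
        # Drop any local build tag, e.g. "2.3.0+cu121" -> "2.3.0".
        if actual is None or actual.split("+")[0] != expected:
            mismatches[name] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    for name, (expected, actual) in check_versions().items():
        print(f"{name}: expected {expected}, found {actual or 'not installed'}")
```

Passing an explicit `installed` mapping makes the comparison testable without touching the real environment.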