wzhouad

	---
	license: mit
	base_model: HuggingFaceH4/mistral-7b-sft-beta
	tags:
	- trl
	- dpo
	- generated_from_trainer
	model-index:
	- name: zephyr-7b-dpo-full
	  results: []
	---

	# zephyr-7b-dpo-full

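	This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta), trained with Direct Preference Optimization (DPO) using TRL. The training dataset was not recorded when this card was generated.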
	It achieves the following results on the evaluation set:
	- Loss: 0.0108
	- Rewards/chosen: -5.9141
	- Rewards/rejected: -7.7338
	- Rewards/accuracies: 0.7266
	- Rewards/margins: 1.8197
	- Logps/rejected: -1030.7371
	- Logps/chosen: -848.4521
	- Logits/rejected: -1.6334
	- Logits/chosen: -1.6493
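
The reward metrics above are related to each other: the margin is the gap between the chosen and rejected rewards, and the accuracy is the fraction of pairs where the chosen response scores higher. The sketch below illustrates this, assuming the usual DPO convention that a reward is the β-scaled difference between policy and reference log-probabilities. The function name `dpo_reward_metrics`, the toy log-probabilities, and the value of `beta` are illustrative assumptions; none of them come from this card.

```python
from statistics import mean


def dpo_reward_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta):
    """Summarise implicit DPO rewards for a batch of preference pairs.

    Every argument except `beta` is a list of summed sequence
    log-probabilities, one entry per preference pair.
    """
    # Implicit reward: beta * (policy log-prob - reference log-prob).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    return {
        "rewards/chosen": mean(chosen),
        "rewards/rejected": mean(rejected),
        "rewards/margins": mean(margins),
        # Share of pairs where the chosen response gets the higher reward.
        "rewards/accuracies": mean([float(m > 0) for m in margins]),
    }


# Toy log-probabilities, invented purely for illustration.
metrics = dpo_reward_metrics(
    policy_chosen=[-120.0, -95.0, -210.0, -80.0],
    policy_rejected=[-150.0, -90.0, -260.0, -130.0],
    ref_chosen=[-110.0, -90.0, -200.0, -75.0],
    ref_rejected=[-115.0, -92.0, -205.0, -100.0],
    beta=0.1,  # hypothetical; this card does not report beta
)
print(metrics)

# Sanity check on the card's own numbers: margin = chosen - rejected.
reported_margin = round(-5.9141 - (-7.7338), 4)
print(reported_margin)
```

Both mean rewards are negative here, as they are in the card. Under this convention that means the policy assigns lower likelihood than the reference model to both chosen and rejected responses, with the rejected responses dropping further.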

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-07
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 8
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 128
	- total_eval_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 5
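
The batch-size totals in the list follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces the batch arithmetic and a plain-Python approximation of a cosine schedule with linear warmup. The function `cosine_lr` and the value of `total_steps` are assumptions for illustration; the card does not state the total number of optimizer steps.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 5e-7
train_batch_size = 8  # per device
eval_batch_size = 8   # per device
num_devices = 8
gradient_accumulation_steps = 2
warmup_ratio = 0.1

# Effective batch sizes: per-device size x devices (x accumulation for training).
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)


def cosine_lr(step, total_steps, peak_lr=learning_rate, ratio=warmup_ratio):
    """Linear warmup to peak_lr, then half-cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 2400  # hypothetical; the card only logs up to step 2300
for step in (0, 120, 240, 1320, 2400):
    print(step, f"{cosine_lr(step, total_steps):.2e}")
```

With a warmup ratio of 0.1, the first tenth of training ramps the rate up from zero, so the peak of 5e-07 is reached only once warmup ends.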

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
	|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
	| 0.2786 | 0.21 | 100 | 0.2781 | -0.0080 | -0.0710 | 0.6719 | 0.0631 | -264.4583 | -257.8367 | -2.7623 | -2.7774 |
	| 0.1377 | 0.42 | 200 | 0.1450 | -0.5817 | -1.0385 | 0.6992 | 0.4567 | -361.2018 | -315.2145 | -2.7365 | -2.7512 |
	| 0.1162 | 0.63 | 300 | 0.1186 | -1.0407 | -1.6725 | 0.7266 | 0.6318 | -424.5983 | -361.1053 | -2.4888 | -2.5058 |
	| 0.1019 | 0.84 | 400 | 0.0997 | -1.6327 | -2.4828 | 0.7461 | 0.8501 | -505.6364 | -420.3094 | -2.2736 | -2.3013 |
	| 0.0226 | 1.05 | 500 | 0.0406 | -2.9554 | -4.2565 | 0.7266 | 1.3012 | -683.0034 | -552.5746 | -2.1929 | -2.2303 |
	| 0.0116 | 1.26 | 600 | 0.0298 | -3.0110 | -4.3717 | 0.7305 | 1.3607 | -694.5244 | -558.1376 | -2.1365 | -2.1643 |
	| 0.0132 | 1.46 | 700 | 0.0320 | -2.8731 | -4.1217 | 0.7383 | 1.2486 | -669.5266 | -544.3542 | -2.1173 | -2.1453 |
	| 0.0141 | 1.67 | 800 | 0.0285 | -2.8506 | -4.0446 | 0.7383 | 1.1939 | -661.8126 | -542.1040 | -2.0387 | -2.0557 |
	| 0.008 | 1.88 | 900 | 0.0217 | -3.7087 | -4.9874 | 0.7148 | 1.2786 | -756.0888 | -627.9131 | -1.8927 | -1.9084 |
	| 0.0015 | 2.09 | 1000 | 0.0135 | -4.8936 | -6.4137 | 0.7109 | 1.5202 | -898.7281 | -746.3977 | -1.7007 | -1.7103 |
	| 0.0019 | 2.3 | 1100 | 0.0140 | -4.8675 | -6.4410 | 0.7188 | 1.5735 | -901.4539 | -743.7909 | -1.7341 | -1.7490 |
	| 0.0014 | 2.51 | 1200 | 0.0128 | -5.1432 | -6.7584 | 0.7188 | 1.6152 | -933.1906 | -771.3603 | -1.7194 | -1.7313 |
	| 0.0012 | 2.72 | 1300 | 0.0126 | -5.2094 | -6.8051 | 0.7227 | 1.5957 | -937.8638 | -777.9802 | -1.7283 | -1.7387 |
	| 0.0012 | 2.93 | 1400 | 0.0126 | -5.3124 | -6.9529 | 0.7148 | 1.6405 | -952.6434 | -788.2790 | -1.7056 | -1.7185 |
	| 0.0009 | 3.14 | 1500 | 0.0113 | -5.6394 | -7.3683 | 0.7188 | 1.7289 | -994.1813 | -820.9806 | -1.6707 | -1.6834 |
	| 0.0007 | 3.35 | 1600 | 0.0115 | -5.6409 | -7.3656 | 0.7227 | 1.7247 | -993.9130 | -821.1270 | -1.6691 | -1.6823 |
	| 0.0011 | 3.56 | 1700 | 0.0114 | -5.6893 | -7.4555 | 0.7227 | 1.7662 | -1002.9027 | -825.9682 | -1.6580 | -1.6727 |
	| 0.0007 | 3.77 | 1800 | 0.0113 | -5.7534 | -7.5287 | 0.7227 | 1.7753 | -1010.2194 | -832.3766 | -1.6467 | -1.6620 |
	| 0.0009 | 3.97 | 1900 | 0.0113 | -5.7308 | -7.5090 | 0.7227 | 1.7782 | -1008.2513 | -830.1171 | -1.6581 | -1.6731 |
	| 0.0006 | 4.18 | 2000 | 0.0109 | -5.8887 | -7.6915 | 0.7266 | 1.8028 | -1026.5013 | -845.9089 | -1.6381 | -1.6538 |
	| 0.0006 | 4.39 | 2100 | 0.0109 | -5.9096 | -7.7239 | 0.7266 | 1.8144 | -1029.7469 | -847.9958 | -1.6345 | -1.6501 |
	| 0.0006 | 4.6 | 2200 | 0.0109 | -5.8953 | -7.7105 | 0.7266 | 1.8152 | -1028.4065 | -846.5691 | -1.6360 | -1.6516 |
	| 0.0007 | 4.81 | 2300 | 0.0108 | -5.9141 | -7.7338 | 0.7266 | 1.8197 | -1030.7371 | -848.4521 | -1.6334 | -1.6493 |
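
The card does not name the training dataset, but the Epoch and Step columns above allow a rough estimate of its size. The sketch below is back-of-the-envelope arithmetic, not a reported figure: it assumes every optimizer step consumes one full batch of 128 preference pairs, and the epoch values are rounded to two decimals, so the result is approximate.

```python
# (epoch, step) pairs copied from the first and last rows of the table above.
first_row = (0.21, 100)
last_row = (4.81, 2300)
total_train_batch_size = 128  # from the hyperparameter list

# Use the span between the two rows so that rounding in the Epoch column
# has less influence on the estimate.
(e0, s0), (e1, s1) = first_row, last_row
steps_per_epoch = (s1 - s0) / (e1 - e0)

# Assumes each optimizer step processes one full batch of preference pairs.
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(f"~{steps_per_epoch:.0f} steps per epoch")
print(f"~{examples_per_epoch:.0f} preference pairs per epoch")
```

Under these assumptions one epoch is a little under 480 optimizer steps, which puts the training set at roughly 61,000 preference pairs.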


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.1.2+cu121
	- Datasets 2.14.6
	- Tokenizers 0.14.1