---
license: mit
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---


# zephyr-7b-dpo-full

This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta) on an unspecified dataset. It achieves the following results on the evaluation set (the reward metrics are defined in the sketch after this list):

- Loss: 0.0256
- Rewards/chosen: -2.0365
- Rewards/rejected: -2.5297
- Rewards/accuracies: 0.6950
- Rewards/margins: 0.4933
- Logps/rejected: -403.6735
- Logps/chosen: -347.8913
- Logits/rejected: -2.1603
- Logits/chosen: -2.1828
- Debug/policy Weights: 0.0416
- Debug/losses: 0.0243
- Debug/raw Losses: 0.5731
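
For context on the metric names above: in DPO the "rewards" are implicit, computed from log-probability ratios between the trained policy and the frozen SFT reference. A sketch of the standard definitions follows (the DPO temperature β is not reported in this card):

```latex
% Implicit DPO reward of a completion y for prompt x, with policy \pi_\theta
% and frozen SFT reference \pi_{ref} (beta is not reported in this card):
r(x, y) = \beta \bigl[ \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \bigr]

% Rewards/chosen and Rewards/rejected average r over the chosen (y_w) and
% rejected (y_l) completions; Rewards/margins is their difference, and
% Rewards/accuracies is the fraction of pairs with a positive margin:
\text{Rewards/margins} = r(x, y_w) - r(x, y_l)
```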

## Model description

More information needed

## Intended uses & limitations

More information needed
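
Pending author guidance, here is a minimal inference sketch using the standard Transformers chat API. The model id `wzhouad/zephyr-7b-dpo-full` is assumed from this repository's name; adjust it to your actual checkpoint path:

```python
# Minimal sketch, assuming the checkpoint loads like any Mistral-7B chat model
# and inherits the chat template of the Zephyr SFT base.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wzhouad/zephyr-7b-dpo-full"  # assumed id; use your local path if different
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```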

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent configuration follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
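
These values map directly onto Transformers' `TrainingArguments`. A hedged reconstruction is sketched below; the `trl`/`dpo` tags suggest TRL's `DPOTrainer` was used, but the dataset, DPO beta, and precision settings are not reported here:

```python
# Sketch only: reconstructs the reported hyperparameters. The model, dataset,
# and DPO beta are not reported in this card and must be supplied by the user.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 8 GPUs x 2 accumulation steps = 128 effective
    per_device_eval_batch_size=8,   # x 8 GPUs = 64 effective
    gradient_accumulation_steps=2,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```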

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Debug/policy Weights | Debug/losses | Debug/raw Losses |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|:--------------------:|:------------:|:----------------:|
| 0.1731        | 0.0796 | 100  | 0.1627          | -0.1434        | -0.1775          | 0.5961             | 0.0341          | -168.4450      | -158.5804    | -2.7045         | -2.7126       | 0.2376               | 0.1613       | 0.6787           |
| 0.0637        | 0.1592 | 200  | 0.0668          | -0.9118        | -1.1252          | 0.6455             | 0.2134          | -263.2193      | -235.4248    | -2.4769         | -2.4894       | 0.1048               | 0.0652       | 0.6301           |
| 0.0398        | 0.2388 | 300  | 0.0421          | -1.5345        | -1.8565          | 0.6446             | 0.3220          | -336.3452      | -297.6896    | -2.4777         | -2.4926       | 0.0656               | 0.0401       | 0.6158           |
| 0.0268        | 0.3183 | 400  | 0.0274          | -1.9929        | -2.3663          | 0.6437             | 0.3735          | -387.3311      | -343.5292    | -2.2480         | -2.2673       | 0.0425               | 0.0260       | 0.6099           |
| 0.0286        | 0.3979 | 500  | 0.0340          | -1.8450        | -2.2365          | 0.6539             | 0.3916          | -374.3529      | -328.7424    | -2.3185         | -2.3383       | 0.0541               | 0.0326       | 0.6004           |
| 0.0304        | 0.4775 | 600  | 0.0296          | -1.9424        | -2.3790          | 0.6735             | 0.4366          | -388.5944      | -338.4821    | -2.1888         | -2.2094       | 0.0468               | 0.0278       | 0.5888           |
| 0.0289        | 0.5571 | 700  | 0.0279          | -1.9248        | -2.3277          | 0.6828             | 0.4030          | -383.4731      | -336.7225    | -2.2155         | -2.2362       | 0.0447               | 0.0266       | 0.5876           |
| 0.0235        | 0.6367 | 800  | 0.0245          | -2.0777        | -2.5498          | 0.6884             | 0.4720          | -405.6762      | -352.0160    | -2.1066         | -2.1293       | 0.0392               | 0.0231       | 0.5835           |
| 0.0333        | 0.7163 | 900  | 0.0342          | -1.7749        | -2.2999          | 0.6856             | 0.5250          | -380.6898      | -321.7296    | -2.1171         | -2.1415       | 0.0554               | 0.0321       | 0.5741           |
| 0.0233        | 0.7959 | 1000 | 0.0238          | -2.2080        | -2.6970          | 0.6950             | 0.4891          | -420.4027      | -365.0407    | -2.1112         | -2.1340       | 0.0381               | 0.0223       | 0.5775           |
| 0.0253        | 0.8754 | 1100 | 0.0261          | -2.0131        | -2.5002          | 0.6912             | 0.4871          | -400.7220      | -345.5524    | -2.1743         | -2.1963       | 0.0424               | 0.0247       | 0.5737           |
| 0.0244        | 0.9550 | 1200 | 0.0256          | -2.0365        | -2.5297          | 0.6950             | 0.4933          | -403.6735      | -347.8913    | -2.1603         | -2.1828       | 0.0416               | 0.0243       | 0.5731           |
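
As a quick arithmetic check, Rewards/margins is just Rewards/chosen minus Rewards/rejected, e.g. for the final row:

```python
# Margin check for the final eval (step 1200): chosen reward minus rejected reward.
chosen, rejected = -2.0365, -2.5297
print(round(chosen - rejected, 4))  # 0.4932, matching the reported 0.4933 up to rounding
```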

## Framework versions

- Transformers 4.41.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1