---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: zephyr-7b-dpo-full
  results: []
---
# zephyr-7b-dpo-full
This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) dataset. It achieves the following results on the evaluation set (a short note on how the reward metrics are defined follows the list):
- Loss: 0.9109
- Rewards/chosen: -6.4067
- Rewards/rejected: -10.6017
- Rewards/accuracies: 0.7659
- Rewards/margins: 4.1951
- Logps/rejected: -366.2361
- Logps/chosen: -346.0206
- Logits/rejected: -1.3898
- Logits/chosen: -1.6525
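
A note on the reward columns: DPO trains no explicit reward model; the logged "rewards" are implicit, computed from the log-probability ratio between this policy and the frozen SFT reference. As a hedged sketch of the standard definitions (Rafailov et al., 2023; the scaling factor β is a DPO hyperparameter not reported on this card):

```latex
% Implicit DPO reward of a completion y for prompt x, relative to the SFT reference:
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\text{ref}}(y \mid x)}

% DPO loss over a preference pair (chosen y_w, rejected y_l):
\mathcal{L}_{\text{DPO}}(\theta) = -\log \sigma\left( r_\theta(x, y_w) - r_\theta(x, y_l) \right)
```

Under these definitions, Rewards/chosen and Rewards/rejected are the mean implicit rewards over the evaluation pairs, Rewards/margins is their mean difference, and Rewards/accuracies is the fraction of pairs in which the chosen completion receives the higher reward.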
## Model description
More information needed
## Intended uses & limitations
More information needed
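
No usage snippet is provided upstream, so the following is a minimal inference sketch rather than an endorsed recipe. The Hub repo id is a placeholder (this card does not state the owning organization), and the chat template is assumed to be inherited from the Zephyr SFT base.

```python
# Minimal inference sketch with transformers; the repo id below is a placeholder.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="<org>/zephyr-7b-dpo-full",  # placeholder: substitute the actual Hub repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Zephyr-family models ship a chat template, so chat messages can be passed directly.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
]
out = pipe(messages, max_new_tokens=128, do_sample=True, temperature=0.7)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```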
## Training and evaluation data
More information needed
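
Beyond the dataset name in the metadata, no details are given; the preference data can be inspected directly. A small sketch (the `train_prefs`/`test_prefs` split names are assumed from the alignment-handbook recipes):

```python
# Inspect the preference data used for DPO; split names assumed from the
# alignment-handbook recipes (train_prefs for training, test_prefs for eval).
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
print(ds)  # features include prompt, chosen, rejected

example = ds[0]
print(example["prompt"][:200])                   # the shared prompt
print(example["chosen"][-1]["content"][:200])    # preferred assistant reply
print(example["rejected"][-1]["content"][:200])  # dispreferred assistant reply
```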
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged TRL sketch mapping them onto code follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
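
These values map onto TRL's DPO configuration roughly as below. This is a hedged sketch, not the original training script: the DPO `beta` is not reported on this card (0.1 below is a placeholder), mixed precision is assumed, and a recent TRL release providing `DPOConfig` is assumed rather than the exact version used.

```python
# Hedged sketch of a TRL DPO run matching the hyperparameters above.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "alignment-handbook/zephyr-7b-sft-full"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

config = DPOConfig(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # x 4 GPUs x 2 accumulation = 64 effective
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,   # assumption: typical for this recipe, not stated on the card
    beta=0.1,    # placeholder: the DPO beta is not reported on the card
)

train_ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

trainer = DPOTrainer(
    model=model,              # the reference model is cloned automatically
    args=config,
    train_dataset=train_ds,
    processing_class=tokenizer,  # called `tokenizer=` in older trl releases
)
trainer.train()
```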
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6044 | 0.1047 | 100 | 0.6129 | 0.3596 | 0.0870 | 0.7302 | 0.2726 | -259.3489 | -278.3580 | -2.5834 | -2.6369 |
| 0.57 | 0.2093 | 200 | 0.5571 | 0.5922 | -0.1676 | 0.7540 | 0.7598 | -261.8945 | -276.0320 | -2.4867 | -2.5465 |
| 0.5429 | 0.3140 | 300 | 0.5366 | 0.0019 | -0.9625 | 0.7540 | 0.9644 | -269.8440 | -281.9351 | -2.3542 | -2.4208 |
| 0.5168 | 0.4186 | 400 | 0.5452 | 0.1591 | -0.8845 | 0.7599 | 1.0436 | -269.0635 | -280.3629 | -2.4760 | -2.5389 |
| 0.5337 | 0.5233 | 500 | 0.5324 | 0.1371 | -1.0631 | 0.7778 | 1.2002 | -270.8497 | -280.5833 | -2.4225 | -2.4845 |
| 0.5163 | 0.6279 | 600 | 0.5369 | -0.3785 | -1.5394 | 0.7560 | 1.1609 | -275.6129 | -285.7394 | -2.4333 | -2.4912 |
| 0.4881 | 0.7326 | 700 | 0.5380 | 0.1243 | -1.2129 | 0.7679 | 1.3371 | -272.3477 | -280.7114 | -2.3892 | -2.4505 |
| 0.49 | 0.8373 | 800 | 0.5411 | 0.1149 | -1.0375 | 0.7639 | 1.1524 | -270.5944 | -280.8054 | -2.4479 | -2.5044 |
| 0.5097 | 0.9419 | 900 | 0.5622 | -0.2002 | -1.4670 | 0.7698 | 1.2668 | -274.8889 | -283.9564 | -2.5298 | -2.5820 |
| 0.1144 | 1.0466 | 1000 | 0.5714 | -0.2947 | -1.8774 | 0.7639 | 1.5826 | -278.9927 | -284.9014 | -2.5495 | -2.6080 |
| 0.087 | 1.1512 | 1100 | 0.5960 | -0.6932 | -2.6301 | 0.7837 | 1.9369 | -286.5200 | -288.8864 | -2.5036 | -2.5699 |
| 0.1122 | 1.2559 | 1200 | 0.6133 | -1.5655 | -3.6620 | 0.7540 | 2.0965 | -296.8384 | -297.6089 | -2.4063 | -2.4765 |
| 0.1303 | 1.3605 | 1300 | 0.6040 | -1.7575 | -3.6828 | 0.7837 | 1.9252 | -297.0464 | -299.5291 | -2.3747 | -2.4470 |
| 0.0884 | 1.4652 | 1400 | 0.6035 | -1.4203 | -3.2606 | 0.7798 | 1.8403 | -292.8251 | -296.1571 | -2.3840 | -2.4553 |
| 0.0807 | 1.5699 | 1500 | 0.6033 | -1.8277 | -3.9141 | 0.7877 | 2.0864 | -299.3599 | -300.2314 | -2.3962 | -2.4731 |
| 0.1027 | 1.6745 | 1600 | 0.6157 | -1.3414 | -3.3683 | 0.7857 | 2.0269 | -293.9024 | -295.3680 | -2.3746 | -2.4536 |
| 0.0989 | 1.7792 | 1700 | 0.6009 | -1.4146 | -3.5889 | 0.7917 | 2.1744 | -296.1083 | -296.0996 | -2.3750 | -2.4548 |
| 0.0945 | 1.8838 | 1800 | 0.6109 | -1.1285 | -3.3269 | 0.7877 | 2.1984 | -293.4879 | -293.2390 | -2.4051 | -2.4825 |
| 0.0789 | 1.9885 | 1900 | 0.6093 | -1.9115 | -4.0587 | 0.7837 | 2.1472 | -300.8062 | -301.0694 | -2.3968 | -2.4730 |
| 0.0086 | 2.0931 | 2000 | 0.7414 | -2.9121 | -5.9384 | 0.7758 | 3.0263 | -319.6029 | -311.0746 | -2.2016 | -2.2928 |
| 0.0137 | 2.1978 | 2100 | 0.8116 | -4.6780 | -8.1860 | 0.7679 | 3.5080 | -342.0789 | -328.7336 | -1.8924 | -2.0338 |
| 0.0152 | 2.3025 | 2200 | 0.8371 | -5.0993 | -8.7589 | 0.7679 | 3.6596 | -347.8080 | -332.9471 | -1.8207 | -1.9887 |
| 0.0062 | 2.4071 | 2300 | 0.8704 | -6.2532 | -10.1416 | 0.7679 | 3.8884 | -361.6346 | -344.4856 | -1.5897 | -1.8086 |
| 0.0124 | 2.5118 | 2400 | 0.8848 | -5.6604 | -9.6724 | 0.7698 | 4.0120 | -356.9429 | -338.5582 | -1.5561 | -1.7751 |
| 0.0078 | 2.6164 | 2500 | 0.8926 | -6.1681 | -10.2415 | 0.7679 | 4.0734 | -362.6336 | -343.6352 | -1.4181 | -1.6590 |
| 0.0083 | 2.7211 | 2600 | 0.9002 | -6.5323 | -10.6541 | 0.7659 | 4.1218 | -366.7602 | -347.2773 | -1.3929 | -1.6493 |
| 0.0115 | 2.8257 | 2700 | 0.9076 | -6.4271 | -10.6033 | 0.7639 | 4.1762 | -366.2516 | -346.2245 | -1.4047 | -1.6632 |
| 0.0134 | 2.9304 | 2800 | 0.9106 | -6.3982 | -10.5970 | 0.7639 | 4.1988 | -366.1889 | -345.9361 | -1.3900 | -1.6525 |
### Framework versions
- Transformers 4.40.2
- Pytorch 2.1.2
- Datasets 2.19.1
- Tokenizers 0.19.1