---
license: mit
base_model: HuggingFaceH4/mistral-7b-sft-beta
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: zephyr-7b-dpo-full
    results: []
---

# zephyr-7b-dpo-full

This model is a fine-tuned version of [HuggingFaceH4/mistral-7b-sft-beta](https://huggingface.co/HuggingFaceH4/mistral-7b-sft-beta), trained with DPO on an unspecified preference dataset. It achieves the following results on the evaluation set (see the note after the list for how the reward metrics are defined):

- Loss: 0.0108
- Rewards/chosen: -5.9141
- Rewards/rejected: -7.7338
- Rewards/accuracies: 0.7266
- Rewards/margins: 1.8197
- Logps/rejected: -1030.7371
- Logps/chosen: -848.4521
- Logits/rejected: -1.6334
- Logits/chosen: -1.6493
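
The reward columns follow the DPO formulation: as logged by trl's `DPOTrainer`, the implicit reward of a response $y$ to a prompt $x$ is the $\beta$-scaled log-probability ratio between the policy and the frozen reference model,

$$
\hat{r}_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}.
$$

`Rewards/margins` is the mean of $\hat{r}_\theta(x, y_{\text{chosen}}) - \hat{r}_\theta(x, y_{\text{rejected}})$ over evaluation pairs, and `Rewards/accuracies` is the fraction of pairs where the chosen response receives the higher reward.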

## Model description

More information needed

## Intended uses & limitations

More information needed
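
The model is nonetheless a causal chat LM and loads with standard `transformers` tooling. A minimal inference sketch, assuming the weights live in the Hub repo `wzhouad/zephyr-7b-dpo-full` and that the tokenizer ships a Zephyr-style chat template (both assumptions, not stated in this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wzhouad/zephyr-7b-dpo-full"  # assumption: actual Hub repo id may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps a 7B model in roughly 14 GB
    device_map="auto",
)

# Build a chat prompt via the tokenizer's chat template (assumed present).
messages = [{"role": "user", "content": "Summarize DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```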

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto a `DPOTrainer` call follows the list):

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
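
A minimal sketch of how these settings map onto a trl `DPOTrainer` call (trl 0.7.x-era API, matching the Transformers 4.35 pin below). The preference dataset, `beta`, and mixed precision are assumptions; none of them are recorded in this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "HuggingFaceH4/mistral-7b-sft-beta"
model = AutoModelForCausalLM.from_pretrained(base)      # policy being trained
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy
tokenizer = AutoTokenizer.from_pretrained(base)

args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",
    learning_rate=5e-7,
    per_device_train_batch_size=8,  # train_batch_size above
    per_device_eval_batch_size=8,   # eval_batch_size above
    gradient_accumulation_steps=2,  # 8 GPUs x 8 per device x 2 = 128 effective
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                      # assumption: precision not recorded in the card
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    beta=0.1,                                # assumption: trl's default beta
    train_dataset=preference_data["train"],  # hypothetical: dataset unspecified
    eval_dataset=preference_data["test"],    # hypothetical: dataset unspecified
    tokenizer=tokenizer,
)
trainer.train()
```

Launched across 8 GPUs (e.g. with `accelerate launch`), the per-device batch size of 8 with 2 accumulation steps reproduces the total train batch size of 128 listed above.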

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.2786 | 0.21 | 100 | 0.2781 | -0.0080 | -0.0710 | 0.6719 | 0.0631 | -264.4583 | -257.8367 | -2.7623 | -2.7774 |
| 0.1377 | 0.42 | 200 | 0.1450 | -0.5817 | -1.0385 | 0.6992 | 0.4567 | -361.2018 | -315.2145 | -2.7365 | -2.7512 |
| 0.1162 | 0.63 | 300 | 0.1186 | -1.0407 | -1.6725 | 0.7266 | 0.6318 | -424.5983 | -361.1053 | -2.4888 | -2.5058 |
| 0.1019 | 0.84 | 400 | 0.0997 | -1.6327 | -2.4828 | 0.7461 | 0.8501 | -505.6364 | -420.3094 | -2.2736 | -2.3013 |
| 0.0226 | 1.05 | 500 | 0.0406 | -2.9554 | -4.2565 | 0.7266 | 1.3012 | -683.0034 | -552.5746 | -2.1929 | -2.2303 |
| 0.0116 | 1.26 | 600 | 0.0298 | -3.0110 | -4.3717 | 0.7305 | 1.3607 | -694.5244 | -558.1376 | -2.1365 | -2.1643 |
| 0.0132 | 1.46 | 700 | 0.0320 | -2.8731 | -4.1217 | 0.7383 | 1.2486 | -669.5266 | -544.3542 | -2.1173 | -2.1453 |
| 0.0141 | 1.67 | 800 | 0.0285 | -2.8506 | -4.0446 | 0.7383 | 1.1939 | -661.8126 | -542.1040 | -2.0387 | -2.0557 |
| 0.008  | 1.88 | 900 | 0.0217 | -3.7087 | -4.9874 | 0.7148 | 1.2786 | -756.0888 | -627.9131 | -1.8927 | -1.9084 |
| 0.0015 | 2.09 | 1000 | 0.0135 | -4.8936 | -6.4137 | 0.7109 | 1.5202 | -898.7281 | -746.3977 | -1.7007 | -1.7103 |
| 0.0019 | 2.3  | 1100 | 0.0140 | -4.8675 | -6.4410 | 0.7188 | 1.5735 | -901.4539 | -743.7909 | -1.7341 | -1.7490 |
| 0.0014 | 2.51 | 1200 | 0.0128 | -5.1432 | -6.7584 | 0.7188 | 1.6152 | -933.1906 | -771.3603 | -1.7194 | -1.7313 |
| 0.0012 | 2.72 | 1300 | 0.0126 | -5.2094 | -6.8051 | 0.7227 | 1.5957 | -937.8638 | -777.9802 | -1.7283 | -1.7387 |
| 0.0012 | 2.93 | 1400 | 0.0126 | -5.3124 | -6.9529 | 0.7148 | 1.6405 | -952.6434 | -788.2790 | -1.7056 | -1.7185 |
| 0.0009 | 3.14 | 1500 | 0.0113 | -5.6394 | -7.3683 | 0.7188 | 1.7289 | -994.1813 | -820.9806 | -1.6707 | -1.6834 |
| 0.0007 | 3.35 | 1600 | 0.0115 | -5.6409 | -7.3656 | 0.7227 | 1.7247 | -993.9130 | -821.1270 | -1.6691 | -1.6823 |
| 0.0011 | 3.56 | 1700 | 0.0114 | -5.6893 | -7.4555 | 0.7227 | 1.7662 | -1002.9027 | -825.9682 | -1.6580 | -1.6727 |
| 0.0007 | 3.77 | 1800 | 0.0113 | -5.7534 | -7.5287 | 0.7227 | 1.7753 | -1010.2194 | -832.3766 | -1.6467 | -1.6620 |
| 0.0009 | 3.97 | 1900 | 0.0113 | -5.7308 | -7.5090 | 0.7227 | 1.7782 | -1008.2513 | -830.1171 | -1.6581 | -1.6731 |
| 0.0006 | 4.18 | 2000 | 0.0109 | -5.8887 | -7.6915 | 0.7266 | 1.8028 | -1026.5013 | -845.9089 | -1.6381 | -1.6538 |
| 0.0006 | 4.39 | 2100 | 0.0109 | -5.9096 | -7.7239 | 0.7266 | 1.8144 | -1029.7469 | -847.9958 | -1.6345 | -1.6501 |
| 0.0006 | 4.6  | 2200 | 0.0109 | -5.8953 | -7.7105 | 0.7266 | 1.8152 | -1028.4065 | -846.5691 | -1.6360 | -1.6516 |
| 0.0007 | 4.81 | 2300 | 0.0108 | -5.9141 | -7.7338 | 0.7266 | 1.8197 | -1030.7371 | -848.4521 | -1.6334 | -1.6493 |

### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1