wxzhang/dpo-selective-longerrun

Text Generation

Generated from Trainer

text-generation-inference



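According to the auto-generated card, this model was trained from scratch; the training dataset was not recorded (its name appears as "None"). It achieves the following results on the evaluation set, which match the final (step 1500) row of the training results below: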

Loss: 0.4916
Rewards/chosen: -0.6959
Rewards/rejected: -2.0431
Rewards/accuracies: 0.7579
Rewards/margins: 1.3472
Logps/rejected: -312.5994
Logps/chosen: -310.2374
Logits/rejected: -2.3498
Logits/chosen: -2.3901
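The metric names above match those logged by DPO trainers such as TRL's `DPOTrainer`. The card does not name the trainer library, so this is an assumption based on the metric names and the repository name. Below is a minimal sketch of how the per-pair quantities are typically defined under the standard sigmoid DPO loss. The inputs are sequence log-probabilities summed over response tokens, and `beta` (the DPO temperature) is not reported on this card. The function name `dpo_pair_metrics` is hypothetical. The logged values are means of these quantities over the evaluation pairs.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_pair_metrics(policy_chosen: float, policy_rejected: float,
                     ref_chosen: float, ref_rejected: float, beta: float) -> dict:
    """Per-pair DPO quantities (sigmoid loss); hypothetical helper for illustration.

    All inputs except beta are summed log-probs of a response; beta is assumed,
    since the card does not list it.
    """
    # Implicit rewards: beta times the policy/reference log-ratio
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    return {
        "rewards/chosen": reward_chosen,
        "rewards/rejected": reward_rejected,
        "rewards/margins": margin,
        # 1.0 when the chosen response gets the higher implicit reward
        "rewards/accuracies": 1.0 if reward_chosen > reward_rejected else 0.0,
        # -log(sigmoid(margin)) == softplus(-margin)
        "loss": softplus(-margin),
    }

# Toy (made-up) log-probs, only to show the arithmetic
print(dpo_pair_metrics(-10.0, -20.0, -12.0, -15.0, beta=0.1))
```

Because the loss is a nonlinear function of the margin, the reported mean `Loss` cannot be recovered from the mean `Rewards/margins` alone. By contrast, the mean margin does equal mean chosen reward minus mean rejected reward.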

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 4
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
training_steps: 1500
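The derived values in this list follow from simple arithmetic. The total train batch size is per-device batch × devices × accumulation steps (4 × 4 × 4 = 64). The total eval batch size is 8 × 4 = 32, with no accumulation at evaluation time. The sketch below reproduces these values and the learning-rate curve. It assumes that `lr_scheduler_type: cosine` follows the linear-warmup-then-half-cosine formula of `transformers`' `get_cosine_schedule_with_warmup`, and that the warmup ratio is converted to steps as ceil(0.1 × 1500) = 150. The helper names are hypothetical.

```python
import math

def effective_batch(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    # Examples contributing to one optimizer step across all devices
    return per_device * num_devices * grad_accum

def cosine_with_warmup_lr(step: int, base_lr: float,
                          warmup_steps: int, total_steps: int) -> float:
    """Linear warmup from 0, then half-cosine decay to 0 (assumed schedule)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

base_lr = 5e-7
total_steps = 1500
warmup_steps = math.ceil(total_steps * 0.1)  # warmup ratio 0.1 -> 150 steps

print(effective_batch(4, 4, 4), effective_batch(8, 4))  # train / eval totals
for step in (0, 75, 150, 825, 1500):
    print(step, cosine_with_warmup_lr(step, base_lr, warmup_steps, total_steps))
```

Under these assumptions, the learning rate peaks at 5e-07 at step 150, falls to half that at step 825 (midway through the decay), and reaches 0 at step 1500.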

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6163	0.1	100	0.6145	0.0147	-0.2611	0.7024	0.2758	-276.9589	-296.0254	-2.3069	-2.3542
0.5608	0.21	200	0.5507	-0.0898	-0.8075	0.7401	0.7176	-287.8870	-298.1169	-2.3806	-2.4286
0.4934	0.31	300	0.5225	-0.1646	-1.0392	0.7579	0.8746	-292.5221	-299.6117	-2.3416	-2.3850
0.4812	0.42	400	0.5148	-0.2130	-1.1798	0.7599	0.9668	-295.3333	-300.5795	-2.3285	-2.3697
0.5217	0.52	500	0.5094	-0.1747	-1.1571	0.7599	0.9824	-294.8788	-299.8136	-2.3074	-2.3432
0.5069	0.63	600	0.5037	-0.0404	-1.0494	0.7659	1.0090	-292.7251	-297.1272	-2.2444	-2.2854
0.4582	0.73	700	0.5003	-0.6338	-1.7232	0.7599	1.0894	-306.2008	-308.9958	-2.2469	-2.2897
0.457	0.84	800	0.4907	-0.4901	-1.6054	0.7639	1.1153	-303.8464	-306.1228	-2.2928	-2.3342
0.4723	0.94	900	0.4933	-0.4418	-1.5567	0.7659	1.1149	-302.8719	-305.1562	-2.3355	-2.3762
0.3094	1.05	1000	0.4922	-0.8030	-2.0474	0.7639	1.2444	-312.6856	-312.3804	-2.3698	-2.4094
0.2725	1.15	1100	0.4921	-0.5635	-1.8640	0.7460	1.3005	-309.0183	-307.5903	-2.3382	-2.3785
0.2932	1.26	1200	0.4924	-0.6522	-2.0030	0.7579	1.3509	-311.7977	-309.3632	-2.3511	-2.3915
0.275	1.36	1300	0.4916	-0.6366	-1.9750	0.7599	1.3383	-311.2369	-309.0526	-2.3531	-2.3934
0.2768	1.47	1400	0.4922	-0.7011	-2.0464	0.7579	1.3453	-312.6646	-310.3419	-2.3505	-2.3908
0.2863	1.57	1500	0.4916	-0.6959	-2.0431	0.7579	1.3472	-312.5994	-310.2374	-2.3498	-2.3901
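The table can be sanity-checked with simple arithmetic. In every row, `Rewards/margins` equals `Rewards/chosen` minus `Rewards/rejected`, up to rounding in the fourth decimal. The DPO β is not listed in the hyperparameters, but it can be estimated from two checkpoints. With a frozen reference model, reward = β × (policy log-prob − reference log-prob), so a change in mean reward divided by the change in mean log-prob gives β. This assumes the logged rewards and log-probs are averages over the same evaluation pairs. For the rows below, the estimate comes out very close to 0.05, but this is an inference from the table, not a value stated on the card. The helper names are hypothetical.

```python
# Selected rows copied from the training-results table above:
# step -> (rewards/chosen, rewards/rejected, rewards/margins, logps/chosen, logps/rejected)
ROWS = {
    100: (0.0147, -0.2611, 0.2758, -296.0254, -276.9589),
    700: (-0.6338, -1.7232, 1.0894, -308.9958, -306.2008),
    1500: (-0.6959, -2.0431, 1.3472, -310.2374, -312.5994),
}

def margin_residual(row: tuple) -> float:
    # Should be ~0: margins are defined as chosen reward minus rejected reward
    chosen, rejected, margin = row[:3]
    return margin - (chosen - rejected)

def implied_beta(row_a: tuple, row_b: tuple, column: str = "chosen") -> float:
    """Estimate beta as delta(mean reward) / delta(mean policy log-prob).

    Assumes a frozen reference model, so its log-probs cancel in the difference.
    """
    r_idx, lp_idx = (0, 3) if column == "chosen" else (1, 4)
    return (row_b[r_idx] - row_a[r_idx]) / (row_b[lp_idx] - row_a[lp_idx])

for step, row in ROWS.items():
    print(step, round(margin_residual(row), 4))
print(implied_beta(ROWS[100], ROWS[1500], "chosen"))
print(implied_beta(ROWS[100], ROWS[1500], "rejected"))
```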

Framework versions

Transformers 4.36.2
Pytorch 2.1.2
Datasets 2.14.6
Tokenizers 0.15.0


Safetensors
Model size: 7.24B params
Tensor type: BF16
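From the model size and tensor type, you can roughly estimate the memory needed to hold the weights. BF16 stores each parameter in 2 bytes, so 7.24B parameters take about 14.5 GB (about 13.5 GiB). The sketch below covers only the raw weights. It excludes activations, KV cache, optimizer state, and framework overhead. Because the parameter count on the card is rounded, the result is approximate. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for raw weights in GiB (BF16/FP16 = 2 bytes, FP32 = 4)."""
    return num_params * bytes_per_param / 2**30

print(f"BF16: {weight_memory_gib(7.24e9):.1f} GiB")
print(f"FP32: {weight_memory_gib(7.24e9, 4):.1f} GiB")
```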
