metadata

tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: pythia-1.4b-dpo-full
    results: []

pythia-1.4b-dpo-full

This model was trained with DPO using TRL, per the tags above. The auto-generated card does not record the base checkpoint or the training dataset. It achieves the following results on the evaluation set:

Loss: 0.6403
Rewards/chosen: 0.6094
Rewards/rejected: 0.4102
Rewards/accuracies: 0.5893
Rewards/margins: 0.2002
Logps/rejected: -2024.0
Logps/chosen: -2320.0
Logits/rejected: -0.6719
Logits/chosen: -0.6172
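
The reward metrics above are derived from policy and reference log-probabilities. As a rough illustration of how they relate to each other, here is a minimal sketch in plain Python. It assumes the standard sigmoid DPO loss; the value of `beta`, the function name `dpo_metrics`, and the example log-probabilities are all hypothetical, since the card records none of them.

```python
import math


def dpo_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Batch-averaged DPO loss and reward statistics.

    Each argument is a list of summed sequence log-probabilities, one entry
    per preference pair. beta=0.1 is an assumed value, not taken from the card.
    """
    n = len(policy_chosen)
    # Implicit reward: beta * (policy log-prob - reference log-prob)
    chosen_rewards = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected_rewards = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
    # Sigmoid DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin))
    losses = [math.log1p(math.exp(-m)) for m in margins]
    return {
        "loss": sum(losses) / n,
        "rewards/chosen": sum(chosen_rewards) / n,
        "rewards/rejected": sum(rejected_rewards) / n,
        # Fraction of pairs where the chosen response gets the higher reward
        "rewards/accuracies": sum(1 for m in margins if m > 0) / n,
        "rewards/margins": sum(margins) / n,
    }


if __name__ == "__main__":
    # Two made-up preference pairs, for illustration only.
    metrics = dpo_metrics(
        policy_chosen=[-10.0, -12.0],
        policy_rejected=[-11.0, -11.0],
        ref_chosen=[-10.5, -11.5],
        ref_rejected=[-10.5, -11.5],
    )
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Under this formulation a margin of zero gives a loss of ln 2 (about 0.6931), so the reported loss of 0.6403 sits modestly below that baseline. The reported means are also roughly consistent with one another: 0.6094 - 0.4102 = 0.1992, close to the reported margin of 0.2002.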

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-07
train_batch_size: 5
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 6
total_train_batch_size: 30
total_eval_batch_size: 48
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
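
The batch-size and scheduler settings above can be checked with a little arithmetic. The sketch below, in plain Python, assumes linear warmup followed by a half-cycle cosine decay to zero, which is the shape a `cosine` scheduler with a warmup ratio normally produces. The function names are hypothetical, gradient accumulation is assumed to be 1 (which is what 5 * 6 = 30 implies), and `total_steps` is a rough guess based on the results log reaching step 2000 at epoch 0.98; the card does not state it.

```python
import math

PEAK_LR = 5e-07
WARMUP_RATIO = 0.1
PER_DEVICE_TRAIN_BATCH = 5
PER_DEVICE_EVAL_BATCH = 8
NUM_DEVICES = 6


def total_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Effective batch size across all devices."""
    return per_device * num_devices * grad_accum_steps


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("train batch:", total_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES))
    print("eval batch:", total_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))
    total_steps = 2040  # hypothetical; not recorded in the card
    for step in (0, 100, 204, 1000, 2000):
        print(step, f"{lr_at_step(step, total_steps):.3e}")
```

With a 0.1 warmup ratio the learning rate only reaches its 5e-07 peak about a tenth of the way through the single epoch, then decays for the remainder of the run.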

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.684	0.05	100	0.6768	0.2314	0.1904	0.4494	0.0405	-2048.0	-2352.0	-0.7227	-0.6641
0.663	0.1	200	0.6566	0.5977	0.4883	0.4940	0.1108	-2016.0	-2320.0	-0.7266	-0.6680
0.6529	0.15	300	0.6513	0.625	0.4941	0.5149	0.1279	-2016.0	-2320.0	-0.7188	-0.6562
0.6371	0.2	400	0.6491	0.6562	0.5	0.5595	0.1523	-2016.0	-2304.0	-0.7266	-0.6680
0.6206	0.25	500	0.6466	0.5391	0.3945	0.5952	0.1445	-2024.0	-2320.0	-0.7148	-0.6562
0.686	0.29	600	0.6446	0.5781	0.4180	0.5714	0.1592	-2024.0	-2320.0	-0.7188	-0.6602
0.6459	0.34	700	0.6449	0.5508	0.3633	0.6012	0.1885	-2032.0	-2320.0	-0.6875	-0.6289
0.6458	0.39	800	0.6421	0.5586	0.3867	0.5774	0.1709	-2024.0	-2320.0	-0.6953	-0.6406
0.6451	0.44	900	0.6398	0.7109	0.5039	0.5685	0.2070	-2016.0	-2304.0	-0.6719	-0.6133
0.6213	0.49	1000	0.6407	0.7734	0.5742	0.5714	0.2012	-2008.0	-2304.0	-0.6602	-0.6016
0.6313	0.54	1100	0.6387	0.5391	0.3555	0.5893	0.1807	-2032.0	-2320.0	-0.6680	-0.6094
0.6298	0.59	1200	0.6380	0.6953	0.4922	0.6042	0.2031	-2016.0	-2304.0	-0.6523	-0.5977
0.6461	0.64	1300	0.6396	0.5586	0.3613	0.5863	0.1963	-2032.0	-2320.0	-0.6914	-0.6367
0.6258	0.69	1400	0.6360	0.6914	0.4727	0.5923	0.2207	-2016.0	-2304.0	-0.6758	-0.6172
0.6347	0.74	1500	0.6375	0.625	0.4141	0.5893	0.2100	-2024.0	-2320.0	-0.6641	-0.6094
0.6185	0.79	1600	0.6382	0.5977	0.3926	0.6042	0.2051	-2032.0	-2320.0	-0.6797	-0.625
0.6408	0.83	1700	0.6374	0.5977	0.3926	0.5952	0.2041	-2024.0	-2320.0	-0.6719	-0.6172
0.662	0.88	1800	0.6355	0.6094	0.3984	0.6012	0.2119	-2024.0	-2320.0	-0.6836	-0.6289
0.6385	0.93	1900	0.6379	0.6055	0.3926	0.625	0.2129	-2024.0	-2320.0	-0.6758	-0.6211
0.6154	0.98	2000	0.6381	0.6094	0.4043	0.6012	0.2041	-2024.0	-2320.0	-0.6758	-0.6211
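
To compare checkpoints from the log above, one option is to scan for the lowest validation loss or the highest reward accuracy. The sketch below does this in plain Python. The three columns are copied from the table; the variable names and the two selection criteria are illustrative assumptions, not part of the original training run.

```python
# (step, validation loss, rewards/accuracies), copied from the results table
EVAL_LOG = [
    (100, 0.6768, 0.4494),
    (200, 0.6566, 0.4940),
    (300, 0.6513, 0.5149),
    (400, 0.6491, 0.5595),
    (500, 0.6466, 0.5952),
    (600, 0.6446, 0.5714),
    (700, 0.6449, 0.6012),
    (800, 0.6421, 0.5774),
    (900, 0.6398, 0.5685),
    (1000, 0.6407, 0.5714),
    (1100, 0.6387, 0.5893),
    (1200, 0.6380, 0.6042),
    (1300, 0.6396, 0.5863),
    (1400, 0.6360, 0.5923),
    (1500, 0.6375, 0.5893),
    (1600, 0.6382, 0.6042),
    (1700, 0.6374, 0.5952),
    (1800, 0.6355, 0.6012),
    (1900, 0.6379, 0.6250),
    (2000, 0.6381, 0.6012),
]

# Checkpoint with the lowest validation loss
best_by_loss = min(EVAL_LOG, key=lambda row: row[1])
# Checkpoint with the highest reward accuracy
best_by_accuracy = max(EVAL_LOG, key=lambda row: row[2])

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest reward accuracy:", best_by_accuracy)
```

On this log the two criteria disagree: step 1800 has the lowest validation loss (0.6355) and step 1900 the highest reward accuracy (0.625). Validation loss varies by less than 0.003 from step 1400 onward, so the late checkpoints differ only slightly.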

Framework versions

Transformers 4.38.2
PyTorch 2.2.1
Datasets 2.14.6
Tokenizers 0.15.2
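
When reproducing results it can help to confirm that the local environment matches the versions listed above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the packages are installed under their usual PyPI names (`torch` for PyTorch).

```python
from importlib import metadata

# Versions listed in the card, keyed by PyPI package name
EXPECTED = {
    "transformers": "4.38.2",
    "torch": "2.2.1",
    "datasets": "2.14.6",
    "tokenizers": "0.15.2",
}


def installed_versions(packages):
    """Look up installed versions; None means the package is not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(expected, installed):
    """Return {package: (expected, installed)} for every version that differs."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        # Ignore local build suffixes such as "+cu121" on PyTorch wheels
        base = have.split("+")[0] if have is not None else None
        if base != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    diffs = find_mismatches(EXPECTED, installed_versions(EXPECTED))
    if not diffs:
        print("Environment matches the card.")
    for name, (want, have) in diffs.items():
        print(f"{name}: card lists {want}, found {have}")
```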