pythia-410m-deduped

This model is a fine-tuned version of EleutherAI/pythia-410m-deduped on the princeton-nlp/llama3-ultrafeedback dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7801
  • Original Losses: 1.7969
  • Weight: 1.0
  • Abs Diff: 0.4453
  • Rewards/chosen: -4.875
  • Rewards/rejected: -5.0625
  • Rewards/accuracies: 0.4405
  • Rewards/margins: 0.2002
  • Logps/rejected: -2.0312
  • Logps/chosen: -1.9453
  • Logits/rejected: 5.6875
  • Logits/chosen: 5.7188
  • All Logps 1: -656.8973
  • All Logps 1 Values: -656.8973
  • All Logps 2: 434.6329
  • All Logps 2 Values: 434.6329
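
The snippet below is a minimal usage sketch for loading the checkpoint with the Transformers library. It assumes the fine-tuned weights are published on the Hugging Face Hub as RAY2L/pythia-410m-deduped-SimPOW-0 (the repository this card belongs to); substitute a local path if you are working from your own copy.

```python
# Minimal loading sketch (assumes the checkpoint is available on the Hub;
# swap in a local directory path otherwise).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RAY2L/pythia-410m-deduped-SimPOW-0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card reports BF16 weights
)

prompt = "Explain what a language model is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```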

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-06
  • train_batch_size: 36
  • eval_batch_size: 36
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 2304
  • total_eval_batch_size: 288
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
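
For reference, the hyperparameters above map onto a standard Transformers TrainingArguments configuration roughly as sketched below. The actual training script and trainer class (a SimPO-style preference trainer, judging by the model name) are not stated in this card, so treat this as an illustrative mapping rather than a verbatim reproduction; the output directory name is hypothetical.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Field names are standard transformers arguments; the trainer wrapping them
# is not specified in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pythia-410m-deduped-simpow",  # hypothetical output path
    learning_rate=1e-6,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=36,
    gradient_accumulation_steps=8,   # 36 per device x 8 GPUs x 8 steps = 2304 effective
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # tensor type reported as BF16
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```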

Training results

| Training Loss | Epoch | Step | Validation Loss | Original Losses | Weight | Abs Diff | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | All Logps 1 | All Logps 1 Values | All Logps 2 | All Logps 2 Values |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.9612 | 0.0385 | 1 | 1.7894 | 1.8125 | 1.0 | 0.4492 | -4.9062 | -5.0938 | 0.4405 | 0.1895 | -2.0312 | -1.9688 | 5.6875 | 5.7188 | -657.8339 | -657.8338 | 434.6329 | 434.6329 |
| 1.9612 | 0.0769 | 2 | 1.7887 | 1.8125 | 1.0 | 0.4531 | -4.9062 | -5.0938 | 0.4444 | 0.1895 | -2.0312 | -1.9609 | 5.6875 | 5.6875 | -657.5561 | -657.5560 | 434.6329 | 434.6329 |
| 1.9612 | 0.1154 | 3 | 1.7887 | 1.8203 | 1.0 | 0.4512 | -4.9375 | -5.125 | 0.4444 | 0.1885 | -2.0469 | -1.9688 | 5.6875 | 5.7188 | -657.2574 | -657.2574 | 434.6329 | 434.6329 |
| 1.9612 | 0.1538 | 4 | 1.7891 | 1.8125 | 1.0 | 0.4512 | -4.9375 | -5.0938 | 0.4365 | 0.1807 | -2.0469 | -1.9688 | 5.6875 | 5.7188 | -657.5514 | -657.5513 | 434.6329 | 434.6329 |
| 1.868 | 0.1923 | 5 | 1.7881 | 1.8125 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4325 | 0.1816 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -656.7651 | -656.7651 | 434.6329 | 434.6329 |
| 1.868 | 0.2308 | 6 | 1.7911 | 1.8203 | 1.0 | 0.4512 | -4.9375 | -5.0938 | 0.4524 | 0.1670 | -2.0469 | -1.9766 | 5.6875 | 5.7188 | -658.1024 | -658.1024 | 434.6329 | 434.6329 |
| 1.868 | 0.2692 | 7 | 1.7870 | 1.8125 | 1.0 | 0.4512 | -4.9062 | -5.0938 | 0.4484 | 0.1846 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.3370 | -657.3370 | 434.6329 | 434.6329 |
| 1.868 | 0.3077 | 8 | 1.7835 | 1.8203 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4405 | 0.1729 | -2.0312 | -1.9688 | 5.6562 | 5.6875 | -657.3589 | -657.3589 | 434.6329 | 434.6329 |
| 1.868 | 0.3462 | 9 | 1.7860 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0938 | 0.4405 | 0.1855 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4703 | -657.4702 | 434.6329 | 434.6329 |
| 1.886 | 0.3846 | 10 | 1.7897 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0938 | 0.4325 | 0.1855 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.2245 | -657.2244 | 434.6329 | 434.6329 |
| 1.886 | 0.4231 | 11 | 1.7852 | 1.8125 | 1.0 | 0.4473 | -4.9062 | -5.0938 | 0.4484 | 0.1807 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.7448 | -657.7448 | 434.6329 | 434.6329 |
| 1.886 | 0.4615 | 12 | 1.7827 | 1.8203 | 1.0 | 0.4492 | -4.9062 | -5.0938 | 0.4603 | 0.1797 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.9037 | -657.9037 | 434.6329 | 434.6329 |
| 1.886 | 0.5 | 13 | 1.7844 | 1.8203 | 1.0 | 0.4512 | -4.9062 | -5.0625 | 0.4365 | 0.1689 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.7488 | -657.7488 | 434.6329 | 434.6329 |
| 1.886 | 0.5385 | 14 | 1.7828 | 1.8047 | 1.0 | 0.4395 | -4.875 | -5.0625 | 0.4405 | 0.1885 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.5707 | -657.5707 | 434.6329 | 434.6329 |
| 1.8572 | 0.5769 | 15 | 1.7852 | 1.8125 | 1.0 | 0.4453 | -4.9062 | -5.0625 | 0.4365 | 0.1768 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.2753 | -657.2753 | 434.6329 | 434.6329 |
| 1.8572 | 0.6154 | 16 | 1.7798 | 1.8125 | 1.0 | 0.4414 | -4.9062 | -5.0625 | 0.4246 | 0.1709 | -2.0156 | -1.9531 | 5.6875 | 5.7188 | -657.5228 | -657.5228 | 434.6329 | 434.6329 |
| 1.8572 | 0.6538 | 17 | 1.7797 | 1.8047 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4484 | 0.1816 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.8073 | -657.8073 | 434.6329 | 434.6329 |
| 1.8572 | 0.6923 | 18 | 1.7830 | 1.8125 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4405 | 0.1631 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4370 | -657.4370 | 434.6329 | 434.6329 |
| 1.8572 | 0.7308 | 19 | 1.7831 | 1.8047 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4524 | 0.1787 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.5411 | -657.5412 | 434.6329 | 434.6329 |
| 1.8374 | 0.7692 | 20 | 1.7812 | 1.8047 | 1.0 | 0.4512 | -4.9062 | -5.0938 | 0.4524 | 0.1973 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.5830 | -657.5831 | 434.6329 | 434.6329 |
| 1.8374 | 0.8077 | 21 | 1.7850 | 1.8125 | 1.0 | 0.4414 | -4.875 | -5.0625 | 0.4444 | 0.1719 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.6910 | -657.6910 | 434.6329 | 434.6329 |
| 1.8374 | 0.8462 | 22 | 1.7851 | 1.8047 | 1.0 | 0.4434 | -4.9062 | -5.0625 | 0.4405 | 0.1836 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.1679 | -657.1679 | 434.6329 | 434.6329 |
| 1.8374 | 0.8846 | 23 | 1.7782 | 1.8047 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4365 | 0.1748 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -658.0194 | -658.0193 | 434.6329 | 434.6329 |
| 1.8374 | 0.9231 | 24 | 1.7800 | 1.8047 | 1.0 | 0.4375 | -4.9062 | -5.0625 | 0.4524 | 0.1709 | -2.0312 | -1.9609 | 5.6875 | 5.7188 | -657.4482 | -657.4482 | 434.6329 | 434.6329 |
| 1.8714 | 0.9615 | 25 | 1.7788 | 1.7969 | 1.0 | 0.4375 | -4.875 | -5.0625 | 0.4325 | 0.1816 | -2.0312 | -1.9531 | 5.6875 | 5.7188 | -657.4512 | -657.4511 | 434.6329 | 434.6329 |
| 1.8714 | 1.0 | 26 | 1.7801 | 1.7969 | 1.0 | 0.4453 | -4.875 | -5.0625 | 0.4405 | 0.2002 | -2.0312 | -1.9453 | 5.6875 | 5.7188 | -656.8973 | -656.8973 | 434.6329 | 434.6329 |

Framework versions

  • Transformers 4.42.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
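
To reproduce results against the same environment, a quick version check against the list above can look like the following convenience sketch (not part of the original card).

```python
# Compare locally installed package versions against those listed in the card.
from importlib.metadata import version

expected_versions = {
    "transformers": "4.42.3",
    "torch": "2.2.2+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}

for pkg, expected in expected_versions.items():
    installed = version(pkg)
    status = "OK" if installed == expected else "differs"
    print(f"{pkg}: installed {installed}, card lists {expected} ({status})")
```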