layer-project

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T on an unspecified dataset. At the final evaluation (epoch 20, step 100), it reaches a validation loss of 0.0064 and a validation accuracy of 1.0.

Model description

The model is fine-tuned as a reward model for RLHF (reinforcement learning from human feedback) fine-tuning.
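The card does not state the training objective. A common choice for reward models is the pairwise (Bradley-Terry) loss, which pushes the scalar reward of a preferred ("chosen") response above that of a dispreferred ("rejected") one. The sketch below assumes that objective. The function name `pairwise_reward_loss` is hypothetical, and the reward values stand in for scores the reward model would produce.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed without overflow for large |x|."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    """Mean Bradley-Terry loss, -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper: assumes the reward model was trained on
    (chosen, rejected) pairs, which this card does not confirm.
    """
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need equal-length, non-empty reward lists")
    total = 0.0
    for r_c, r_r in zip(chosen_rewards, rejected_rewards):
        # -log(sigmoid(m)) equals softplus(-m)
        total += softplus(-(r_c - r_r))
    return total / len(chosen_rewards)


# Example with made-up rewards: larger margins give a smaller loss.
print(pairwise_reward_loss([2.0, 1.5], [-1.0, 0.5]))
```

Computing the loss through `softplus` rather than `-log(sigmoid(m))` directly avoids overflow when the reward margin is very large in either direction.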

The model is trained on very limited data: each epoch consists of only 5 training steps.

Training results

Training loss, validation loss, and validation accuracy per epoch:

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.0365	1.0	5	0.0393	1.0
0.0142	2.0	10	0.0349	1.0
0.0228	3.0	15	0.0295	1.0
0.0157	4.0	20	0.0249	1.0
0.0153	5.0	25	0.0211	1.0
0.0117	6.0	30	0.0181	1.0
0.0072	7.0	35	0.0155	1.0
0.0121	8.0	40	0.0135	1.0
0.0097	9.0	45	0.0119	1.0
0.008	10.0	50	0.0106	1.0
0.0055	11.0	55	0.0095	1.0
0.0046	12.0	60	0.0087	1.0
0.0085	13.0	65	0.0081	1.0
0.0046	14.0	70	0.0076	1.0
0.0059	15.0	75	0.0072	1.0
0.0044	16.0	80	0.0069	1.0
0.0021	17.0	85	0.0067	1.0
0.0039	18.0	90	0.0066	1.0
0.0027	19.0	95	0.0065	1.0
0.0039	20.0	100	0.0064	1.0
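The card does not define the Accuracy column. If it is pairwise preference accuracy, as is common for reward models, it is the fraction of evaluation pairs in which the chosen response receives a strictly higher reward than the rejected one. Under that assumption, a value of 1.0 means every evaluation pair was ranked correctly. A minimal sketch, using a hypothetical helper `pairwise_accuracy`:

```python
def pairwise_accuracy(chosen_rewards, rejected_rewards):
    """Fraction of pairs where the chosen response scores strictly higher.

    Hypothetical helper: assumes Accuracy means pairwise preference
    accuracy and that ties count as incorrect.
    """
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("need equal-length, non-empty reward lists")
    correct = sum(
        1 for r_c, r_r in zip(chosen_rewards, rejected_rewards) if r_c > r_r
    )
    return correct / len(chosen_rewards)


# Example with made-up rewards: one of two pairs is ranked correctly.
print(pairwise_accuracy([2.0, 0.0], [1.0, 0.5]))
```

Counting ties as incorrect is a design choice made for this sketch. Other implementations may handle ties differently.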