smolm-autoreg-bpe-counterfactual-babylm-pipps_and_keys_to_it_all_10k-1e-3

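This model was trained from scratch on an unspecified dataset (the dataset name was not recorded in this card). At the final epoch in the table below, it reaches a validation loss of 3.3502 and an accuracy of 0.4102 on the evaluation set.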

Model description

More information needed

The hyperparameters used during training are not listed in this card. The training results below were logged at the end of each epoch:

Training Loss	Epoch	Step	Validation Loss	Accuracy
3.6157	1.0	18844	3.7143	0.3599
3.3979	2.0	37688	3.5062	0.3804
3.2663	3.0	56532	3.3950	0.3932
3.1887	4.0	75376	3.3694	0.3977
3.1328	5.0	94220	3.3361	0.4009
3.0925	6.0	113064	3.3236	0.4038
3.0537	7.0	131908	3.3165	0.4050
3.0289	8.0	150752	3.3142	0.4063
2.9979	9.0	169596	3.2959	0.4083
2.9734	10.0	188440	3.2976	0.4096
2.9501	11.0	207284	3.3026	0.4094
2.9302	12.0	226128	3.3036	0.4097
2.9067	13.0	244972	3.3101	0.4103
2.8850	14.0	263816	3.3063	0.4106
2.8686	15.0	282660	3.3195	0.4098
2.8474	16.0	301504	3.3275	0.4106
2.8235	17.0	320348	3.3297	0.4108
2.8091	18.0	339192	3.3385	0.4105
2.7899	19.0	358036	3.3433	0.4102
2.7718	20.0	376880	3.3502	0.4102
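
The table shows training loss falling through all 20 epochs while validation loss stops improving partway through. The sketch below is one way to locate the epoch with the lowest validation loss and to express a loss as perplexity. It assumes the reported loss is mean per-token cross-entropy in nats, which this card does not state. The names `ROWS`, `best_epoch`, and `perplexity` are illustrative and not part of any training script.

```python
import math

# (epoch, training loss, validation loss, accuracy), copied from the table above.
ROWS = [
    (1, 3.6157, 3.7143, 0.3599),
    (2, 3.3979, 3.5062, 0.3804),
    (3, 3.2663, 3.3950, 0.3932),
    (4, 3.1887, 3.3694, 0.3977),
    (5, 3.1328, 3.3361, 0.4009),
    (6, 3.0925, 3.3236, 0.4038),
    (7, 3.0537, 3.3165, 0.4050),
    (8, 3.0289, 3.3142, 0.4063),
    (9, 2.9979, 3.2959, 0.4083),
    (10, 2.9734, 3.2976, 0.4096),
    (11, 2.9501, 3.3026, 0.4094),
    (12, 2.9302, 3.3036, 0.4097),
    (13, 2.9067, 3.3101, 0.4103),
    (14, 2.8850, 3.3063, 0.4106),
    (15, 2.8686, 3.3195, 0.4098),
    (16, 2.8474, 3.3275, 0.4106),
    (17, 2.8235, 3.3297, 0.4108),
    (18, 2.8091, 3.3385, 0.4105),
    (19, 2.7899, 3.3433, 0.4102),
    (20, 2.7718, 3.3502, 0.4102),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def perplexity(loss):
    """Convert a loss to perplexity.

    Assumption: the loss is mean per-token cross-entropy in nats.
    """
    return math.exp(loss)


best = best_epoch(ROWS)
final = ROWS[-1]

print(f"lowest validation loss: {best[2]:.4f} at epoch {best[0]}")
print(f"perplexity at that epoch: {perplexity(best[2]):.2f}")
print(f"perplexity at final epoch: {perplexity(final[2]):.2f}")
```

On these numbers the lowest validation loss falls at epoch 9, while accuracy keeps rising slightly in later epochs, so which checkpoint counts as best depends on the metric chosen.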