roberta-wiki-en

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set:

Model description

More information needed

More information needed

More information needed

The following hyperparameters were used during training:

Training Loss	Epoch	Step	Validation Loss
1.5933	0.0928	12500	1.4776
1.6391	0.1856	25000	1.5202
1.6551	0.2783	37500	1.5291
1.6398	0.3711	50000	1.5364
1.6429	0.4639	62500	1.5345
1.6354	0.5567	75000	1.5338
1.629	0.6495	87500	1.5325
1.6457	0.7423	100000	1.5285
1.6514	0.8350	112500	1.5377
1.5955	0.9278	125000	1.5234
1.616	1.0206	137500	1.5196
1.5456	2.2268	150000	1.4437
1.5265	2.4124	162500	1.4288
1.514	2.5979	175000	1.4139
1.5114	2.7835	187500	1.4059
1.4989	2.9691	200000	1.4008
1.4962	3.1546	212500	1.3926
1.481	3.3402	225000	1.3850
1.469	3.5258	237500	1.3777
1.4654	3.7113	250000	1.3689
1.463	3.8969	262500	1.3652
1.4546	4.0825	275000	1.3575
1.4436	4.2680	287500	1.3489
1.4312	4.4536	300000	1.3441
1.4312	4.6392	312500	1.3359
1.4204	4.8247	325000	1.3272
1.4138	5.0103	337500	1.3228
1.4096	5.1959	350000	1.3168
1.4162	5.3814	362500	1.3108
1.4005	5.5670	375000	1.3048
1.3965	5.7526	387500	1.2997
1.3802	5.9381	400000	1.2966