
mt5-large-gramatika161k-b16-5000

This model is a fine-tuned version of google/mt5-large; the fine-tuning dataset is not documented in this card. It achieves the following results on the evaluation set (a minimal usage sketch follows the results):

  • Loss: 0.0949
  • Rouge1: 72.227
  • Rouge2: 67.1468
  • Rougel: 72.1408
  • Rougelsum: 72.1494
  • Gen Len: 18.3283
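
As a fine-tuned mT5 sequence-to-sequence checkpoint, the model can be loaded with the Transformers library. The snippet below is only a minimal loading sketch: the model id, the example input, and the generation settings are assumptions, not part of this card.

```python
# Minimal loading sketch; the model id below is assumed to match this
# repository, and the input sentence is a placeholder.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-gramatika161k-b16-5000"  # hypothetical Hub id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("your input sentence here", return_tensors="pt")
outputs = model.generate(**inputs, max_length=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```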

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 5
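
These values map directly onto the Transformers training arguments. The sketch below shows one way they could be expressed with Seq2SeqTrainingArguments; it is not the exact training script used for this model, and the output directory is a placeholder.

```python
# Configuration sketch reproducing the listed hyperparameters; not the
# original training script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-5000",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",            # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=5,
    predict_with_generate=True,   # needed for ROUGE / generation-length metrics
)
```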

Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|-----------|---------|
| 0.684         | 0.63  | 5000  | 0.1422          | 70.2446 | 63.7161 | 70.115  | 70.1185   | 18.3370 |
| 0.1704        | 1.27  | 10000 | 0.1185          | 71.1601 | 65.3066 | 71.0354 | 71.041    | 18.3348 |
| 0.1383        | 1.9   | 15000 | 0.1079          | 71.5399 | 65.9422 | 71.4296 | 71.4371   | 18.3289 |
| 0.1166        | 2.54  | 20000 | 0.1032          | 71.8281 | 66.4753 | 71.7248 | 71.7321   | 18.3303 |
| 0.106         | 3.17  | 25000 | 0.0983          | 72.0264 | 66.8201 | 71.9367 | 71.9427   | 18.3291 |
| 0.0952        | 3.81  | 30000 | 0.0962          | 72.1134 | 66.9793 | 72.0288 | 72.0362   | 18.3297 |
| 0.0891        | 4.44  | 35000 | 0.0949          | 72.227  | 67.1468 | 72.1408 | 72.1494   | 18.3283 |
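
The ROUGE columns are the standard rouge1/rouge2/rougeL/rougeLsum scores, reported here on a 0-100 scale. The sketch below shows one way to compute them with the evaluate library; that library is not listed among the framework versions, so treating it as the metric implementation used here is an assumption.

```python
# Sketch of computing the reported metrics from decoded predictions and
# references; `predictions` and `references` are placeholder lists of strings.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["the corrected sentence"]
references = ["the corrected sentence"]

scores = rouge.compute(predictions=predictions, references=references)
# Scale to 0-100 to match the table above.
print({k: round(v * 100, 4) for k, v in scores.items()})
```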

Framework versions

  • Transformers 4.30.1
  • Pytorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3