
mt5-large-gramatika161k-b16-e10-lr5

This model is a fine-tuned version of google/mt5-large on an unspecified dataset. It achieves the following results on the evaluation set (a brief usage sketch follows the list):

  • Loss: 0.0909
  • ROUGE-1: 72.6295
  • ROUGE-2: 67.8521
  • ROUGE-L: 72.5471
  • ROUGE-Lsum: 72.5591
  • Gen Len: 18.3276
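
The checkpoint follows the standard mT5 text-to-text interface, so it can be loaded with transformers as sketched below. This is a minimal sketch, not documented usage: the repo id and example input are assumptions, and the card does not state the task (the model name suggests grammatical error correction).

```python
# Minimal inference sketch. The repo id and input sentence are assumptions;
# only the use of the standard mT5 seq2seq interface is taken as given.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "mt5-large-gramatika161k-b16-e10-lr5"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "She go to school yesterday."  # hypothetical input
inputs = tokenizer(text, return_tensors="pt")
# Eval Gen Len is ~18 tokens, so a small generation budget suffices.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```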

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adafactor
  • lr_scheduler_type: linear
  • num_epochs: 10
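
For reference, these values map onto transformers' Seq2SeqTrainingArguments roughly as sketched below. The output directory, model loading, and datasets are assumptions not covered by this card, and whether Adafactor was selected via the optim flag or constructed manually is not recorded.

```python
# Sketch of a training setup matching the hyperparameters above; only the
# listed values are taken from the card, everything else is an assumption.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

args = Seq2SeqTrainingArguments(
    output_dir="mt5-large-gramatika161k-b16-e10-lr5",  # hypothetical
    learning_rate=5e-05,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",           # Adafactor optimizer
    lr_scheduler_type="linear",
    num_train_epochs=10,
    predict_with_generate=True,  # required to compute ROUGE at eval time
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-large")

# trainer = Seq2SeqTrainer(model=model, args=args, tokenizer=tokenizer,
#                          train_dataset=..., eval_dataset=...,
#                          compute_metrics=...)  # datasets not documented here
# trainer.train()
```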

Training results

| Training Loss | Epoch | Step  | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|------------|---------|
| 0.9659        | 0.63  | 5000  | 0.1455          | 70.1028 | 63.4969 | 69.9738 | 69.9761    | 18.3378 |
| 0.1735        | 1.27  | 10000 | 0.1195          | 71.1156 | 65.2149 | 70.9932 | 71.0038    | 18.3324 |
| 0.1391        | 1.90  | 15000 | 0.1076          | 71.5692 | 66.0226 | 71.4676 | 71.4720    | 18.3281 |
| 0.1149        | 2.54  | 20000 | 0.1035          | 71.8135 | 66.4584 | 71.7212 | 71.7292    | 18.3308 |
| 0.1029        | 3.17  | 25000 | 0.0961          | 72.1040 | 66.9459 | 72.0139 | 72.0239    | 18.3282 |
| 0.0898        | 3.81  | 30000 | 0.0944          | 72.2310 | 67.1623 | 72.1412 | 72.1542    | 18.3314 |
| 0.0803        | 4.44  | 35000 | 0.0926          | 72.3851 | 67.4624 | 72.3051 | 72.3183    | 18.3286 |
| 0.0750        | 5.08  | 40000 | 0.0929          | 72.4219 | 67.5102 | 72.3376 | 72.3479    | 18.3298 |
| 0.0665        | 5.71  | 45000 | 0.0917          | 72.5132 | 67.6501 | 72.4271 | 72.4383    | 18.3264 |
| 0.0624        | 6.35  | 50000 | 0.0911          | 72.5711 | 67.7710 | 72.4938 | 72.5041    | 18.3283 |
| 0.0588        | 6.98  | 55000 | 0.0909          | 72.6295 | 67.8521 | 72.5471 | 72.5591    | 18.3276 |
| 0.0534        | 7.62  | 60000 | 0.0920          | 72.6475 | 67.9046 | 72.5743 | 72.5853    | 18.3278 |
| 0.0514        | 8.25  | 65000 | 0.0930          | 72.6373 | 67.8940 | 72.5612 | 72.5724    | 18.3277 |
| 0.0492        | 8.88  | 70000 | 0.0930          | 72.6593 | 67.9359 | 72.5900 | 72.5971    | 18.3273 |
| 0.0470        | 9.52  | 75000 | 0.0932          | 72.6906 | 68.0100 | 72.6172 | 72.6269    | 18.3264 |
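
The ROUGE columns above can be recomputed with the evaluate library, which reports the same rouge1/rouge2/rougeL/rougeLsum keys (scaled to [0, 1]). The strings below are placeholders, since the evaluation data is not documented in this card.

```python
# Metric-computation sketch; the prediction/reference pairs are placeholders.
# Requires: pip install evaluate rouge_score
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["She went to school yesterday."],  # hypothetical model output
    references=["She went to school yesterday."],   # hypothetical reference
)
# Keys match the table columns: rouge1, rouge2, rougeL, rougeLsum.
print({k: round(v * 100, 4) for k, v in scores.items()})
```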

Framework versions

  • Transformers 4.30.1
  • PyTorch 1.11.0a0+b6df043
  • Datasets 2.12.0
  • Tokenizers 0.13.3