
# mt5-base-fce-e8-b16

This model is a fine-tuned version of [google/mt5-base](https://huggingface.co/google/mt5-base) on an unknown dataset. It achieves the following results on the evaluation set:

- Loss: 0.3758
- Rouge1: 84.5938
- Rouge2: 76.5987
- RougeL: 84.0063
- RougeLsum: 84.0286
- Gen Len: 15.4865

## Model description

More information needed

## Intended uses & limitations

More information needed
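Although the intended task is not documented here, the checkpoint loads like any other seq2seq model. A minimal inference sketch follows; the Hub id is a hypothetical placeholder, so substitute the actual repository path:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "your-username/mt5-base-fce-e8-b16"  # hypothetical Hub id, not the real path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Encode an input sentence and generate the model's output sequence.
inputs = tokenizer("Your input text here.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```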

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adafactor
- lr_scheduler_type: linear
- num_epochs: 8
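As a sketch, these settings map onto `Seq2SeqTrainingArguments` as below. The `output_dir` and the 400-step evaluation cadence are inferred from this card; the dataset, preprocessing, and metric functions are not documented and would need to be supplied:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-base-fce-e8-b16",  # assumed from the model name
    learning_rate=1e-3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    optim="adafactor",            # Adafactor optimizer, as listed above
    lr_scheduler_type="linear",
    num_train_epochs=8,
    evaluation_strategy="steps",
    eval_steps=400,               # matches the 400-step cadence in the results table
    predict_with_generate=True,   # required for ROUGE and generation-length metrics
)
```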

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.5646        | 0.23  | 400   | 0.5403          | 83.2786 | 74.549  | 82.6906 | 82.6978   | 15.5126 |
| 0.6122        | 0.45  | 800   | 0.4896          | 84.3453 | 75.5159 | 83.7564 | 83.7691   | 15.4500 |
| 0.5041        | 0.68  | 1200  | 0.4294          | 84.2563 | 75.8731 | 83.6118 | 83.6071   | 15.4760 |
| 0.4594        | 0.9   | 1600  | 0.4136          | 84.7369 | 76.6048 | 84.1541 | 84.1573   | 15.4651 |
| 0.3861        | 1.13  | 2000  | 0.4121          | 84.6947 | 76.574  | 84.0885 | 84.095    | 15.4642 |
| 0.3382        | 1.35  | 2400  | 0.3899          | 84.5537 | 76.4381 | 83.9421 | 83.951    | 15.4651 |
| 0.3442        | 1.58  | 2800  | 0.3866          | 84.6272 | 76.6256 | 84.0616 | 84.0804   | 15.4674 |
| 0.3388        | 1.81  | 3200  | 0.3758          | 84.5938 | 76.5987 | 84.0063 | 84.0286   | 15.4865 |
| 0.3109        | 2.03  | 3600  | 0.3822          | 84.5223 | 76.5703 | 83.9217 | 83.9438   | 15.4710 |
| 0.2254        | 2.26  | 4000  | 0.3923          | 84.3225 | 76.4146 | 83.7686 | 83.7789   | 15.4596 |
| 0.236         | 2.48  | 4400  | 0.3932          | 84.4412 | 76.4434 | 83.8515 | 83.8815   | 15.4692 |
| 0.2395        | 2.71  | 4800  | 0.3849          | 84.2211 | 76.3678 | 83.6444 | 83.6462   | 15.4614 |
| 0.2458        | 2.93  | 5200  | 0.3850          | 84.3534 | 76.598  | 83.8321 | 83.8366   | 15.4587 |
| 0.1832        | 3.16  | 5600  | 0.3973          | 84.4197 | 76.7844 | 83.8758 | 83.8781   | 15.4678 |
| 0.1576        | 3.39  | 6000  | 0.4082          | 84.1841 | 76.4425 | 83.6272 | 83.618    | 15.4783 |
| 0.1635        | 3.61  | 6400  | 0.3996          | 84.2051 | 76.3261 | 83.6613 | 83.6599   | 15.4788 |
| 0.1667        | 3.84  | 6800  | 0.3940          | 84.4538 | 76.8139 | 83.8887 | 83.8886   | 15.4610 |
| 0.145         | 4.06  | 7200  | 0.4260          | 84.4028 | 76.8101 | 83.8844 | 83.8824   | 15.4628 |
| 0.107         | 4.29  | 7600  | 0.4403          | 84.3559 | 76.8066 | 83.8048 | 83.807    | 15.4587 |
| 0.1078        | 4.51  | 8000  | 0.4337          | 84.3045 | 76.8011 | 83.7587 | 83.7699   | 15.4742 |
| 0.1114        | 4.74  | 8400  | 0.4334          | 84.2865 | 76.5415 | 83.7221 | 83.718    | 15.4820 |
| 0.1104        | 4.97  | 8800  | 0.4273          | 84.3211 | 76.8211 | 83.7795 | 83.7726   | 15.4838 |
| 0.0732        | 5.19  | 9200  | 0.4787          | 84.3459 | 76.752  | 83.777  | 83.7552   | 15.4829 |
| 0.069         | 5.42  | 9600  | 0.4839          | 84.4351 | 76.8848 | 83.8682 | 83.8584   | 15.4811 |
| 0.0713        | 5.64  | 10000 | 0.4896          | 84.2962 | 76.7428 | 83.7387 | 83.7253   | 15.4829 |
| 0.0716        | 5.87  | 10400 | 0.4788          | 84.3068 | 76.7969 | 83.74   | 83.7402   | 15.4747 |
| 0.0613        | 6.09  | 10800 | 0.5252          | 84.4256 | 77.008  | 83.8688 | 83.8828   | 15.4815 |
| 0.0439        | 6.32  | 11200 | 0.5398          | 84.3753 | 76.8235 | 83.793  | 83.7986   | 15.4815 |
| 0.0452        | 6.55  | 11600 | 0.5377          | 84.4467 | 76.8923 | 83.8893 | 83.8818   | 15.4815 |
| 0.0434        | 6.77  | 12000 | 0.5347          | 84.3734 | 76.811  | 83.8108 | 83.8063   | 15.4843 |
| 0.0424        | 7.0   | 12400 | 0.5380          | 84.4558 | 76.9239 | 83.9033 | 83.9022   | 15.4751 |
| 0.0296        | 7.22  | 12800 | 0.5808          | 84.332  | 76.8729 | 83.7923 | 83.7826   | 15.4774 |
| 0.0287        | 7.45  | 13200 | 0.5956          | 84.4744 | 77.0945 | 83.9222 | 83.9228   | 15.4843 |
| 0.0283        | 7.67  | 13600 | 0.5966          | 84.4271 | 77.0661 | 83.877  | 83.8712   | 15.4829 |
| 0.0285        | 7.9   | 14000 | 0.5983          | 84.4562 | 77.0334 | 83.8987 | 83.8985   | 15.4824 |
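For reference, the same metric family can be computed with the `evaluate` library; this is a minimal sketch with made-up example strings, since the exact evaluation setup for this card is not documented. Note that `evaluate` returns scores in [0, 1], whereas the table above reports them scaled to percentages:

```python
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["he goes to school every day"],  # model outputs (example data)
    references=["he goes to school every day"],   # gold targets (example data)
    use_stemmer=True,
)
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```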

### Framework versions

- Transformers 4.28.1
- Pytorch 1.11.0a0+b6df043
- Datasets 2.12.0
- Tokenizers 0.13.3