t5-small_adafactor

This model is a fine-tuned version of oMateos2020/t5-small_adafactor on the xsum dataset. It achieves the following results on the evaluation set:

Loss: 2.1167
Rouge1: 32.8631
Rouge2: 11.658
Rougel: 26.6192
Rougelsum: 26.6224
Gen Len: 18.7663
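
To make the Rouge1 figure above easier to interpret, here is a simplified sketch of a ROUGE-1 F-measure on the same 0 to 100 scale. It is an illustration only and not the scorer that produced the reported numbers. It assumes plain whitespace tokenization with lowercasing and no stemming, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1 (0-100): clipped unigram overlap."""
    # Assumption: whitespace tokens, lowercased, no stemming.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 100 * 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 2))  # 83.33
```

Five of the six unigrams match in each direction, so precision and recall are both 5/6 and the F-measure is about 83.33.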

Model description

More information needed

Intended uses & limitations

More information needed
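
Until this section is filled in, the following is a minimal sketch of how the checkpoint might be tried for summarization with the Transformers library. The `summarize: ` prefix and the `max_length` value are assumptions, since the card does not say whether a task prefix was used or which generation settings apply. `build_input` and `summarize` are hypothetical helper names.

```python
MODEL_ID = "oMateos2020/t5-small_adafactor"

# Assumption: T5 checkpoints are commonly prompted with a task prefix;
# the card does not state whether this fine-tune used one.
PREFIX = "summarize: "


def build_input(document: str, prefix: str = PREFIX) -> str:
    """Collapse whitespace and prepend the (assumed) task prefix."""
    return prefix + " ".join(document.split())


def summarize(document: str, max_length: int = 32) -> str:
    """Generate a summary; downloads the checkpoint on first use."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(build_input(document), return_tensors="pt", truncation=True)
    # max_length is an illustrative choice, not a value from the card.
    output_ids = model.generate(**inputs, max_length=max_length)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` on a news article returns a short summary string. Output quality on text unlike the xsum data is not covered by the card.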

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 24
eval_batch_size: 24
seed: 42
optimizer: Adafactor
lr_scheduler_type: linear
num_epochs: 1
mixed_precision_training: Native AMP
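
As a rough cross-check of these settings against the step and epoch columns logged during training, the sketch below estimates the number of optimizer steps in one epoch and the learning rate under a linear schedule. It assumes the standard xsum training split of 204,045 examples, a single device, no gradient accumulation, and no warmup. None of these are stated in the card, and the helper names are hypothetical.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 0.0005
TRAIN_BATCH_SIZE = 24
NUM_EPOCHS = 1

# Assumption: standard xsum training split, one device,
# no gradient accumulation.
XSUM_TRAIN_EXAMPLES = 204_045


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def linear_lr(step: int, total_steps: int,
              base_lr: float = LEARNING_RATE, warmup_steps: int = 0) -> float:
    """Linear warmup (assumed to be zero steps here), then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = steps_per_epoch(XSUM_TRAIN_EXAMPLES, TRAIN_BATCH_SIZE) * NUM_EPOCHS
print(total_steps)                   # 8502
print(round(8400 / total_steps, 2))  # 0.99
```

Under these assumptions step 8400 falls at about epoch 0.99, and the learning rate at the midpoint of the run would be half the initial value.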

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
2.1315	0.02	200	2.1865	31.9486	10.9605	25.7418	25.7408	18.8466
2.1297	0.05	400	2.1965	31.9598	10.9463	25.784	25.7867	18.8525
2.1284	0.07	600	2.1981	32.231	11.1003	26.0155	26.0226	18.8466
2.1315	0.09	800	2.1873	31.9161	10.8642	25.7166	25.7273	18.8227
2.1212	0.12	1000	2.1892	32.4646	11.1852	26.2451	26.2439	18.8259
2.1028	0.14	1200	2.1978	32.2886	11.1346	26.0795	26.0827	18.7685
2.1221	0.16	1400	2.1936	32.2901	11.0821	25.9983	26.0024	18.7798
2.1168	0.19	1600	2.1922	32.1655	11.1451	25.986	25.9893	18.8232
2.1166	0.21	1800	2.1836	32.2611	11.174	26.0594	26.0688	18.7633
2.1053	0.24	2000	2.1929	32.3321	11.213	26.1859	26.1903	18.7758
2.1126	0.26	2200	2.1811	32.2078	11.1792	26.0776	26.0817	18.8197
2.1038	0.28	2400	2.1836	32.2799	11.2511	26.1191	26.1251	18.7884
2.1181	0.31	2600	2.1805	32.1197	11.1586	26.0441	26.0441	18.8045
2.1217	0.33	2800	2.1806	32.3051	11.2638	26.1319	26.1386	18.7886
2.116	0.35	3000	2.1741	32.2799	11.1887	26.1224	26.1363	18.7769
2.1118	0.38	3200	2.1767	32.387	11.2053	26.077	26.0845	18.8407
2.1164	0.4	3400	2.1743	32.5008	11.4021	26.3291	26.3297	18.7731
2.1068	0.42	3600	2.1673	32.2347	11.1676	26.0657	26.0662	18.817
2.1276	0.45	3800	2.1664	32.2434	11.2862	26.094	26.0994	18.7713
2.1313	0.47	4000	2.1636	32.694	11.3724	26.4071	26.4008	18.7709
2.1229	0.49	4200	2.1633	32.456	11.4057	26.2733	26.2689	18.7586
2.129	0.52	4400	2.1641	32.309	11.2133	26.1062	26.1121	18.7729
2.1425	0.54	4600	2.1577	32.5879	11.4001	26.3045	26.3078	18.8104
2.1536	0.56	4800	2.1507	32.5152	11.4035	26.3054	26.3116	18.7941
2.148	0.59	5000	2.1503	32.8088	11.5641	26.5346	26.5311	18.7602
2.1541	0.61	5200	2.1491	32.8185	11.5816	26.5261	26.527	18.7654
2.155	0.64	5400	2.1466	32.7229	11.5339	26.4363	26.442	18.8404
2.1579	0.66	5600	2.1435	32.884	11.6042	26.5862	26.5891	18.7713
2.1601	0.68	5800	2.1393	32.8027	11.5328	26.4521	26.4567	18.7904
2.1765	0.71	6000	2.1393	32.8059	11.5751	26.5499	26.5551	18.7768
2.2176	0.73	6200	2.1345	33.0734	11.8056	26.7546	26.7607	18.7756
2.2126	0.75	6400	2.1328	32.7478	11.5925	26.5333	26.5359	18.7819
2.1916	0.78	6600	2.1298	32.658	11.491	26.379	26.3869	18.8101
2.2162	0.8	6800	2.1297	32.7843	11.5629	26.4736	26.4728	18.8187
2.2358	0.82	7000	2.1287	32.9181	11.6378	26.5966	26.5987	18.8039
2.2371	0.85	7200	2.1265	32.8413	11.674	26.5905	26.5831	18.7962
2.256	0.87	7400	2.1245	32.7412	11.5627	26.4976	26.503	18.7728
2.2566	0.89	7600	2.1220	32.8165	11.6069	26.5301	26.5295	18.7871
2.2954	0.92	7800	2.1197	32.7399	11.5417	26.4914	26.4938	18.7752
2.2766	0.94	8000	2.1187	32.853	11.6411	26.5909	26.5938	18.7852
2.3273	0.96	8200	2.1169	32.9376	11.709	26.6665	26.6672	18.7734
2.3182	0.99	8400	2.1167	32.8631	11.658	26.6192	26.6224	18.7663
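
In the table above, the lowest validation loss is at the final logged step (8400), while the highest Rouge1 is at step 6200. The sketch below shows one way to pick a checkpoint by either criterion. It uses only a handful of rows copied from the table, and `best_by` is a hypothetical helper name.

```python
# A few rows copied from the table above: (step, validation_loss, rouge1).
ROWS = [
    (200, 2.1865, 31.9486),
    (4000, 2.1636, 32.694),
    (6200, 2.1345, 33.0734),
    (8200, 2.1169, 32.9376),
    (8400, 2.1167, 32.8631),
]


def best_by(rows, column: int, highest: bool):
    """Return the row with the highest (or lowest) value in the given column."""
    pick = max if highest else min
    return pick(rows, key=lambda row: row[column])


lowest_loss = best_by(ROWS, 1, highest=False)
highest_rouge1 = best_by(ROWS, 2, highest=True)
print(lowest_loss[0])     # 8400
print(highest_rouge1[0])  # 6200
```

The two criteria select different steps, so which checkpoint counts as best depends on the metric chosen for model selection.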

Framework versions

Transformers 4.20.1
Pytorch 1.12.0+cu113
Datasets 2.3.2
Tokenizers 0.12.1
