Centrum-multinews

This model is a fine-tuned version of Centrum on the multi_news dataset. The model is described in the preprint Multi-Document Summarization with Centroid-Based Pretraining (Ratish Puduppully and Mark Steedman). It achieves the following results on the evaluation set:

Loss: 3.2740
Rouge1: 46.2987
Rouge2: 18.4863
Rougel: 24.2428
Rougelsum: 42.5102
Gen Len: 308.6606

Model description

The script for training and inference of Centrum-multinews is available at https://github.com/ratishsp/centrum

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 1
eval_batch_size: 4
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 16
total_eval_batch_size: 16
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 2500
training_steps: 25000
mixed_precision_training: Native AMP
label_smoothing_factor: 0.1
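
The relationship between the per-device and total batch sizes, and the shape of the learning-rate schedule, can be sketched in a few lines of Python. This is an illustrative sketch, not the training code: the constant and function names are hypothetical, and the schedule assumes the usual "linear with warmup" behaviour (linear ramp from 0 to the peak rate over the warmup steps, then linear decay to 0 at the final step). The numeric values are copied from the list above.

```python
# Values copied from the hyperparameter list; names are illustrative only.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 1   # per device
EVAL_BATCH_SIZE = 4    # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 2500
TRAINING_STEPS = 25000


def total_train_batch_size() -> int:
    """Examples contributing to one optimizer update across all devices."""
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRAD_ACCUM_STEPS


def total_eval_batch_size() -> int:
    """Examples evaluated per step across all devices (no accumulation)."""
    return EVAL_BATCH_SIZE * NUM_DEVICES


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to LEARNING_RATE, then linear decay
    to 0 at TRAINING_STEPS.
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total_train_batch_size:", total_train_batch_size())
    print("total_eval_batch_size:", total_eval_batch_size())
    for step in (0, 1250, 2500, 13750, 25000):
        print(step, linear_lr(step))
```

Under these assumptions the peak rate of 3e-05 is reached at step 2500, and both total batch sizes come out to 16, matching the values reported in the list.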

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
3.2702	1.78	5000	3.2853	44.0203	16.6061	23.3846	40.3853	277.1855
3.2762	1.96	5500	3.2853	44.725	16.9262	23.475	41.0003	288.4173
3.2114	2.14	6000	3.2857	44.6456	17.0245	23.7328	40.9131	257.2761
3.1981	2.31	6500	3.2817	44.7869	17.0849	23.8372	41.0669	254.8618
3.2298	2.49	7000	3.2802	45.2657	17.2618	23.8204	41.5807	263.0854
3.2167	2.67	7500	3.2773	44.9516	17.0538	23.7894	41.1673	244.6939
3.2069	2.85	8000	3.2712	45.2153	17.2766	23.9883	41.4558	245.4036
3.1822	3.02	8500	3.2786	45.4747	17.6754	24.1878	41.7304	254.6624
3.1529	3.2	9000	3.2740	44.9033	17.1386	23.8511	41.177	246.0157
3.1407	3.38	9500	3.2704	45.1045	17.2335	23.9124	41.3243	243.4922
3.1376	3.56	10000	3.2721	45.2694	17.4797	24.1072	41.5441	243.8396
3.1545	3.74	10500	3.2720	45.3105	17.6338	24.1547	41.5731	231.1805
3.1307	3.91	11000	3.2684	45.4309	17.2665	23.8954	41.6518	250.1039
3.1022	4.09	11500	3.2719	45.1959	17.4017	24.056	41.5363	242.5923
3.1139	4.27	12000	3.2711	45.3864	17.4653	24.028	41.6797	240.5701
3.0978	4.45	12500	3.2722	45.5694	17.501	24.1452	41.7894	232.1149
3.1082	4.63	13000	3.2687	45.504	17.5137	24.1067	41.7686	245.1845
3.1059	4.8	13500	3.2686	45.3603	17.1619	23.8655	41.5953	248.6327
3.1141	4.98	14000	3.2658	45.2741	17.3814	24.0377	41.5263	234.0194
3.0294	5.16	14500	3.2716	45.7203	17.5962	24.1367	41.9119	244.4207
3.0613	5.34	15000	3.2697	45.775	17.6959	24.1867	42.0018	242.0381
3.0549	5.52	15500	3.2703	45.8193	17.686	24.1997	42.0109	242.5493
3.0725	5.69	16000	3.2655	45.3515	17.3438	24.0586	41.6126	240.2812
3.0728	5.87	16500	3.2671	45.6791	17.5028	24.0691	41.9219	250.455
3.0142	6.05	17000	3.2708	46.0287	17.8079	24.2916	42.2369	245.6204
3.0312	6.23	17500	3.2701	45.5731	17.5404	24.0925	41.7584	236.2234
3.0231	6.41	18000	3.2719	46.1094	17.7117	24.1117	42.2882	260.1686
3.0414	6.58	18500	3.2703	45.9178	17.6987	24.1882	42.1382	245.0961
3.0434	6.76	19000	3.2715	46.0129	17.7545	24.2235	42.245	247.8225
3.0456	6.94	19500	3.2682	45.8634	17.6462	24.1366	42.1194	256.9835
3.0188	7.12	20000	3.2752	45.8366	17.6771	24.165	42.0438	240.1866
3.0227	7.3	20500	3.2722	46.0509	17.8248	24.2389	42.2681	245.8337
2.9895	7.47	21000	3.2726	45.7896	17.5833	24.1226	42.016	243.867
3.0146	7.65	21500	3.2693	46.0179	17.6952	24.2204	42.2436	244.0598
3.014	7.83	22000	3.2708	46.0704	17.75	24.2308	42.2591	240.4804
3.0427	8.01	22500	3.2734	46.0662	17.7231	24.1915	42.2227	242.4203
2.9835	8.19	23000	3.2740	46.165	17.8947	24.366	42.3521	236.6266
2.987	8.36	23500	3.2719	45.9025	17.7625	24.2432	42.1257	238.479
2.9922	8.54	24000	3.2731	46.1971	17.7962	24.2279	42.3853	245.2081
2.9788	8.72	24500	3.2718	46.0806	17.8417	24.3261	42.264	240.1747
2.9878	8.9	25000	3.2715	46.0618	17.7725	24.2234	42.2574	242.5598
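
As a quick consistency check on the table, the Epoch and Step columns can be combined with the total train batch size of 16 to estimate how many training examples make up one epoch. The sketch below uses three (step, epoch) pairs copied from the table; the function names are hypothetical, and because the Epoch column is rounded to two decimals the result is only an estimate.

```python
# Total train batch size from the hyperparameter list.
TOTAL_TRAIN_BATCH_SIZE = 16

# (step, epoch) pairs copied from the training results table.
LOGGED = [(5000, 1.78), (14000, 4.98), (25000, 8.9)]


def steps_per_epoch(step: int, epoch: float) -> float:
    """Estimated optimizer steps in one pass over the training set."""
    return step / epoch


def examples_per_epoch(step: int, epoch: float) -> float:
    """Estimated training examples per epoch, given the total batch size."""
    return steps_per_epoch(step, epoch) * TOTAL_TRAIN_BATCH_SIZE


if __name__ == "__main__":
    for step, epoch in LOGGED:
        print(step, round(steps_per_epoch(step, epoch)),
              round(examples_per_epoch(step, epoch)))
```

All three rows give roughly 2,800 steps per epoch, or on the order of 45,000 training examples, so the 25,000 training steps correspond to a little under nine passes over the training data.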

Framework versions

Transformers 4.20.0.dev0
Pytorch 1.11.0
Datasets 2.2.2
Tokenizers 0.12.1
