Centrum-multinews

This model is a fine-tuned version of Centrum on the multi_news dataset. The model is described in the preprint "Multi-Document Summarization with Centroid-Based Pretraining" by Ratish Puduppully and Mark Steedman. It achieves the following results on the evaluation set:

  • Loss: 3.2740
  • Rouge1: 46.2987
  • Rouge2: 18.4863
  • Rougel: 24.2428
  • Rougelsum: 42.5102
  • Gen Len: 308.6606
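
For readers unfamiliar with the metrics above, the following is a minimal sketch of what a ROUGE-N F1 score measures: n-gram overlap between a reference summary and a generated one. It is a simplification, not the evaluation code behind these numbers. This card does not say which ROUGE implementation was used, and standard packages add their own tokenization and optional stemming. The function name `rouge_n_f1` and the whitespace tokenization are assumptions made for illustration, as is the reading that the scores above are on a 0 to 100 scale.

```python
from collections import Counter


def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1: clipped n-gram overlap, no stemming.

    Hypothetical helper for illustration only; tokens are lowercased
    and split on whitespace.
    """
    def ngram_counts(text: str) -> Counter:
        tokens = text.lower().split()
        # Slide a window of length n over the token list.
        return Counter(zip(*[tokens[i:] for i in range(n)]))

    ref_counts = ngram_counts(reference)
    cand_counts = ngram_counts(candidate)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
rouge1 = 100 * rouge_n_f1(reference, candidate, n=1)  # 5 of 6 unigrams match
rouge2 = 100 * rouge_n_f1(reference, candidate, n=2)  # 3 of 5 bigrams match
```

In this toy pair, five of the six unigrams and three of the five bigrams are shared, so the unigram score is higher than the bigram score, which is the same ordering seen between Rouge1 and Rouge2 above.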

Model description

Training and inference scripts for Centrum-multinews are available at https://github.com/ratishsp/centrum.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 2500
  • training_steps: 25000
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.1
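
Two of the values above follow from the others, and the learning-rate schedule can be written out explicitly. The sketch below recomputes the total batch sizes and evaluates a linear schedule with warmup at a few steps. It is a plain-Python illustration under the assumption that "linear" means linear warmup from zero to the peak rate followed by linear decay to zero at the final step. The function names are hypothetical and are not taken from the training script.

```python
# Hyperparameters copied from the list above.
PEAK_LR = 3e-05
WARMUP_STEPS = 2500
TRAINING_STEPS = 25000
TRAIN_BATCH_SIZE = 1   # per device
EVAL_BATCH_SIZE = 4    # per device
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4


def total_train_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Examples contributing to one optimizer update."""
    return per_device * num_devices * grad_accum


def total_eval_batch_size(per_device: int, num_devices: int) -> int:
    """Examples evaluated per step across all devices (no accumulation)."""
    return per_device * num_devices


def linear_lr(step: int, peak_lr: float = PEAK_LR,
              warmup: int = WARMUP_STEPS, total: int = TRAINING_STEPS) -> float:
    """Assumed schedule: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


train_total = total_train_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS)
eval_total = total_eval_batch_size(EVAL_BATCH_SIZE, NUM_DEVICES)
lr_at_peak = linear_lr(WARMUP_STEPS)      # end of warmup
lr_midway = linear_lr(13750)              # halfway through the decay phase
lr_at_end = linear_lr(TRAINING_STEPS)     # final step
```

With these inputs, `train_total` and `eval_total` both come out to 16, matching total_train_batch_size and total_eval_batch_size in the list, and the learning rate reaches its peak of 3e-05 at step 2500 before decaying.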

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 3.2702 | 1.78 | 5000 | 3.2853 | 44.0203 | 16.6061 | 23.3846 | 40.3853 | 277.1855 |
| 3.2762 | 1.96 | 5500 | 3.2853 | 44.725 | 16.9262 | 23.475 | 41.0003 | 288.4173 |
| 3.2114 | 2.14 | 6000 | 3.2857 | 44.6456 | 17.0245 | 23.7328 | 40.9131 | 257.2761 |
| 3.1981 | 2.31 | 6500 | 3.2817 | 44.7869 | 17.0849 | 23.8372 | 41.0669 | 254.8618 |
| 3.2298 | 2.49 | 7000 | 3.2802 | 45.2657 | 17.2618 | 23.8204 | 41.5807 | 263.0854 |
| 3.2167 | 2.67 | 7500 | 3.2773 | 44.9516 | 17.0538 | 23.7894 | 41.1673 | 244.6939 |
| 3.2069 | 2.85 | 8000 | 3.2712 | 45.2153 | 17.2766 | 23.9883 | 41.4558 | 245.4036 |
| 3.1822 | 3.02 | 8500 | 3.2786 | 45.4747 | 17.6754 | 24.1878 | 41.7304 | 254.6624 |
| 3.1529 | 3.2 | 9000 | 3.2740 | 44.9033 | 17.1386 | 23.8511 | 41.177 | 246.0157 |
| 3.1407 | 3.38 | 9500 | 3.2704 | 45.1045 | 17.2335 | 23.9124 | 41.3243 | 243.4922 |
| 3.1376 | 3.56 | 10000 | 3.2721 | 45.2694 | 17.4797 | 24.1072 | 41.5441 | 243.8396 |
| 3.1545 | 3.74 | 10500 | 3.2720 | 45.3105 | 17.6338 | 24.1547 | 41.5731 | 231.1805 |
| 3.1307 | 3.91 | 11000 | 3.2684 | 45.4309 | 17.2665 | 23.8954 | 41.6518 | 250.1039 |
| 3.1022 | 4.09 | 11500 | 3.2719 | 45.1959 | 17.4017 | 24.056 | 41.5363 | 242.5923 |
| 3.1139 | 4.27 | 12000 | 3.2711 | 45.3864 | 17.4653 | 24.028 | 41.6797 | 240.5701 |
| 3.0978 | 4.45 | 12500 | 3.2722 | 45.5694 | 17.501 | 24.1452 | 41.7894 | 232.1149 |
| 3.1082 | 4.63 | 13000 | 3.2687 | 45.504 | 17.5137 | 24.1067 | 41.7686 | 245.1845 |
| 3.1059 | 4.8 | 13500 | 3.2686 | 45.3603 | 17.1619 | 23.8655 | 41.5953 | 248.6327 |
| 3.1141 | 4.98 | 14000 | 3.2658 | 45.2741 | 17.3814 | 24.0377 | 41.5263 | 234.0194 |
| 3.0294 | 5.16 | 14500 | 3.2716 | 45.7203 | 17.5962 | 24.1367 | 41.9119 | 244.4207 |
| 3.0613 | 5.34 | 15000 | 3.2697 | 45.775 | 17.6959 | 24.1867 | 42.0018 | 242.0381 |
| 3.0549 | 5.52 | 15500 | 3.2703 | 45.8193 | 17.686 | 24.1997 | 42.0109 | 242.5493 |
| 3.0725 | 5.69 | 16000 | 3.2655 | 45.3515 | 17.3438 | 24.0586 | 41.6126 | 240.2812 |
| 3.0728 | 5.87 | 16500 | 3.2671 | 45.6791 | 17.5028 | 24.0691 | 41.9219 | 250.455 |
| 3.0142 | 6.05 | 17000 | 3.2708 | 46.0287 | 17.8079 | 24.2916 | 42.2369 | 245.6204 |
| 3.0312 | 6.23 | 17500 | 3.2701 | 45.5731 | 17.5404 | 24.0925 | 41.7584 | 236.2234 |
| 3.0231 | 6.41 | 18000 | 3.2719 | 46.1094 | 17.7117 | 24.1117 | 42.2882 | 260.1686 |
| 3.0414 | 6.58 | 18500 | 3.2703 | 45.9178 | 17.6987 | 24.1882 | 42.1382 | 245.0961 |
| 3.0434 | 6.76 | 19000 | 3.2715 | 46.0129 | 17.7545 | 24.2235 | 42.245 | 247.8225 |
| 3.0456 | 6.94 | 19500 | 3.2682 | 45.8634 | 17.6462 | 24.1366 | 42.1194 | 256.9835 |
| 3.0188 | 7.12 | 20000 | 3.2752 | 45.8366 | 17.6771 | 24.165 | 42.0438 | 240.1866 |
| 3.0227 | 7.3 | 20500 | 3.2722 | 46.0509 | 17.8248 | 24.2389 | 42.2681 | 245.8337 |
| 2.9895 | 7.47 | 21000 | 3.2726 | 45.7896 | 17.5833 | 24.1226 | 42.016 | 243.867 |
| 3.0146 | 7.65 | 21500 | 3.2693 | 46.0179 | 17.6952 | 24.2204 | 42.2436 | 244.0598 |
| 3.014 | 7.83 | 22000 | 3.2708 | 46.0704 | 17.75 | 24.2308 | 42.2591 | 240.4804 |
| 3.0427 | 8.01 | 22500 | 3.2734 | 46.0662 | 17.7231 | 24.1915 | 42.2227 | 242.4203 |
| 2.9835 | 8.19 | 23000 | 3.2740 | 46.165 | 17.8947 | 24.366 | 42.3521 | 236.6266 |
| 2.987 | 8.36 | 23500 | 3.2719 | 45.9025 | 17.7625 | 24.2432 | 42.1257 | 238.479 |
| 2.9922 | 8.54 | 24000 | 3.2731 | 46.1971 | 17.7962 | 24.2279 | 42.3853 | 245.2081 |
| 2.9788 | 8.72 | 24500 | 3.2718 | 46.0806 | 17.8417 | 24.3261 | 42.264 | 240.1747 |
| 2.9878 | 8.9 | 25000 | 3.2715 | 46.0618 | 17.7725 | 24.2234 | 42.2574 | 242.5598 |
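
When reading the training results, note that validation loss and ROUGE do not peak at the same step. The sketch below illustrates this by ranking a handful of rows copied from the results above under different criteria. The `EvalRow` type and the choice of rows are illustrative assumptions, and this card does not state which criterion, if any, was used to select the released checkpoint.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    """One evaluation row (hypothetical container; a subset of the columns)."""
    step: int
    val_loss: float
    rouge1: float
    rouge2: float


# A few rows copied verbatim from the training results.
ROWS = [
    EvalRow(5000, 3.2853, 44.0203, 16.6061),
    EvalRow(16000, 3.2655, 45.3515, 17.3438),
    EvalRow(23000, 3.2740, 46.165, 17.8947),
    EvalRow(24000, 3.2731, 46.1971, 17.7962),
    EvalRow(25000, 3.2715, 46.0618, 17.7725),
]

# Lower is better for loss; higher is better for ROUGE.
best_by_loss = min(ROWS, key=lambda r: r.val_loss)
best_by_rouge1 = max(ROWS, key=lambda r: r.rouge1)
best_by_rouge2 = max(ROWS, key=lambda r: r.rouge2)

for name, row in [("validation loss", best_by_loss),
                  ("Rouge1", best_by_rouge1),
                  ("Rouge2", best_by_rouge2)]:
    print(f"best by {name}: step {row.step}")
```

Among these rows, step 16000 has the lowest validation loss, step 24000 the highest Rouge1, and step 23000 the highest Rouge2, so the choice of selection metric changes which checkpoint looks best.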

Framework versions

  • Transformers 4.20.0.dev0
  • Pytorch 1.11.0
  • Datasets 2.2.2
  • Tokenizers 0.12.1