Centrum-multinews
This model is a fine-tuned version of Centrum on the multi_news dataset. The model is described in the preprint "Multi-Document Summarization with Centroid-Based Pretraining" by Ratish Puduppully and Mark Steedman. It achieves the following results on the evaluation set:
- Loss: 3.2740
- Rouge1: 46.2987
- Rouge2: 18.4863
- Rougel: 24.2428
- Rougelsum: 42.5102
- Gen Len: 308.6606
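For readers unfamiliar with the metrics, Rouge1 and Rouge2 measure unigram and bigram overlap between a generated summary and its reference, reported here on a 0-100 scale. The snippet below is a simplified sketch of that idea, assuming lowercased whitespace tokenization and a single reference. It is not the scorer that produced the numbers above, and the function names `ngrams` and `rouge_n_f1` are illustrative. Rougel and Rougelsum are based on longest common subsequences and are not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1 on a 0-100 scale (whitespace tokens, no stemming)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both texts.
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100.0 * 2 * precision * recall / (precision + recall)
```

Calling `rouge_n_f1(reference, summary, n=2)` gives the bigram variant. Official implementations typically add stemming and other normalization, so their scores will differ from this sketch.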
Model description
The code for training and inference of Centrum-multinews is available at https://github.com/ratishsp/centrum
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 1
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2500
- training_steps: 25000
- mixed_precision_training: Native AMP
- label_smoothing_factor: 0.1
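The totals and the learning-rate schedule in the list above follow from the other values. As a minimal sketch, assuming the total batch size is the product of per-device batch size, device count, and gradient accumulation steps, and that the linear scheduler warms up from zero to the peak rate and then decays linearly to zero at the final step (the helper names `effective_batch_size` and `linear_lr` are illustrative, not part of the training script):

```python
def effective_batch_size(per_device, num_devices, grad_accum=1):
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def linear_lr(step, peak_lr=3e-5, warmup_steps=2500, total_steps=25000):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp linearly from peak_lr down to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))


train_total = effective_batch_size(per_device=1, num_devices=4, grad_accum=4)
eval_total = effective_batch_size(per_device=4, num_devices=4)
```

With the listed values, both `train_total` and `eval_total` come out to 16, matching total_train_batch_size and total_eval_batch_size. The learning rate peaks at 3e-05 at step 2500 and reaches zero at step 25000.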
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
3.2702 | 1.78 | 5000 | 3.2853 | 44.0203 | 16.6061 | 23.3846 | 40.3853 | 277.1855 |
3.2762 | 1.96 | 5500 | 3.2853 | 44.725 | 16.9262 | 23.475 | 41.0003 | 288.4173 |
3.2114 | 2.14 | 6000 | 3.2857 | 44.6456 | 17.0245 | 23.7328 | 40.9131 | 257.2761 |
3.1981 | 2.31 | 6500 | 3.2817 | 44.7869 | 17.0849 | 23.8372 | 41.0669 | 254.8618 |
3.2298 | 2.49 | 7000 | 3.2802 | 45.2657 | 17.2618 | 23.8204 | 41.5807 | 263.0854 |
3.2167 | 2.67 | 7500 | 3.2773 | 44.9516 | 17.0538 | 23.7894 | 41.1673 | 244.6939 |
3.2069 | 2.85 | 8000 | 3.2712 | 45.2153 | 17.2766 | 23.9883 | 41.4558 | 245.4036 |
3.1822 | 3.02 | 8500 | 3.2786 | 45.4747 | 17.6754 | 24.1878 | 41.7304 | 254.6624 |
3.1529 | 3.2 | 9000 | 3.2740 | 44.9033 | 17.1386 | 23.8511 | 41.177 | 246.0157 |
3.1407 | 3.38 | 9500 | 3.2704 | 45.1045 | 17.2335 | 23.9124 | 41.3243 | 243.4922 |
3.1376 | 3.56 | 10000 | 3.2721 | 45.2694 | 17.4797 | 24.1072 | 41.5441 | 243.8396 |
3.1545 | 3.74 | 10500 | 3.2720 | 45.3105 | 17.6338 | 24.1547 | 41.5731 | 231.1805 |
3.1307 | 3.91 | 11000 | 3.2684 | 45.4309 | 17.2665 | 23.8954 | 41.6518 | 250.1039 |
3.1022 | 4.09 | 11500 | 3.2719 | 45.1959 | 17.4017 | 24.056 | 41.5363 | 242.5923 |
3.1139 | 4.27 | 12000 | 3.2711 | 45.3864 | 17.4653 | 24.028 | 41.6797 | 240.5701 |
3.0978 | 4.45 | 12500 | 3.2722 | 45.5694 | 17.501 | 24.1452 | 41.7894 | 232.1149 |
3.1082 | 4.63 | 13000 | 3.2687 | 45.504 | 17.5137 | 24.1067 | 41.7686 | 245.1845 |
3.1059 | 4.8 | 13500 | 3.2686 | 45.3603 | 17.1619 | 23.8655 | 41.5953 | 248.6327 |
3.1141 | 4.98 | 14000 | 3.2658 | 45.2741 | 17.3814 | 24.0377 | 41.5263 | 234.0194 |
3.0294 | 5.16 | 14500 | 3.2716 | 45.7203 | 17.5962 | 24.1367 | 41.9119 | 244.4207 |
3.0613 | 5.34 | 15000 | 3.2697 | 45.775 | 17.6959 | 24.1867 | 42.0018 | 242.0381 |
3.0549 | 5.52 | 15500 | 3.2703 | 45.8193 | 17.686 | 24.1997 | 42.0109 | 242.5493 |
3.0725 | 5.69 | 16000 | 3.2655 | 45.3515 | 17.3438 | 24.0586 | 41.6126 | 240.2812 |
3.0728 | 5.87 | 16500 | 3.2671 | 45.6791 | 17.5028 | 24.0691 | 41.9219 | 250.455 |
3.0142 | 6.05 | 17000 | 3.2708 | 46.0287 | 17.8079 | 24.2916 | 42.2369 | 245.6204 |
3.0312 | 6.23 | 17500 | 3.2701 | 45.5731 | 17.5404 | 24.0925 | 41.7584 | 236.2234 |
3.0231 | 6.41 | 18000 | 3.2719 | 46.1094 | 17.7117 | 24.1117 | 42.2882 | 260.1686 |
3.0414 | 6.58 | 18500 | 3.2703 | 45.9178 | 17.6987 | 24.1882 | 42.1382 | 245.0961 |
3.0434 | 6.76 | 19000 | 3.2715 | 46.0129 | 17.7545 | 24.2235 | 42.245 | 247.8225 |
3.0456 | 6.94 | 19500 | 3.2682 | 45.8634 | 17.6462 | 24.1366 | 42.1194 | 256.9835 |
3.0188 | 7.12 | 20000 | 3.2752 | 45.8366 | 17.6771 | 24.165 | 42.0438 | 240.1866 |
3.0227 | 7.3 | 20500 | 3.2722 | 46.0509 | 17.8248 | 24.2389 | 42.2681 | 245.8337 |
2.9895 | 7.47 | 21000 | 3.2726 | 45.7896 | 17.5833 | 24.1226 | 42.016 | 243.867 |
3.0146 | 7.65 | 21500 | 3.2693 | 46.0179 | 17.6952 | 24.2204 | 42.2436 | 244.0598 |
3.014 | 7.83 | 22000 | 3.2708 | 46.0704 | 17.75 | 24.2308 | 42.2591 | 240.4804 |
3.0427 | 8.01 | 22500 | 3.2734 | 46.0662 | 17.7231 | 24.1915 | 42.2227 | 242.4203 |
2.9835 | 8.19 | 23000 | 3.2740 | 46.165 | 17.8947 | 24.366 | 42.3521 | 236.6266 |
2.987 | 8.36 | 23500 | 3.2719 | 45.9025 | 17.7625 | 24.2432 | 42.1257 | 238.479 |
2.9922 | 8.54 | 24000 | 3.2731 | 46.1971 | 17.7962 | 24.2279 | 42.3853 | 245.2081 |
2.9788 | 8.72 | 24500 | 3.2718 | 46.0806 | 17.8417 | 24.3261 | 42.264 | 240.1747 |
2.9878 | 8.9 | 25000 | 3.2715 | 46.0618 | 17.7725 | 24.2234 | 42.2574 | 242.5598 |
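The Epoch and Step columns can be used to estimate how many optimizer steps make up one epoch, and from that roughly how many training examples are seen per epoch. The sketch below uses the first and last rows of the table and assumes each optimizer step consumes total_train_batch_size = 16 examples. Because the Epoch column is rounded, the result is only an approximation.

```python
# (step, epoch) pairs copied from the first and last rows of the table.
ROWS = [(5000, 1.78), (25000, 8.9)]
TOTAL_TRAIN_BATCH_SIZE = 16  # from the hyperparameter list


def estimate_epoch_size(step, epoch, batch_size=TOTAL_TRAIN_BATCH_SIZE):
    """Return (steps per epoch, examples per epoch), both rounded to integers."""
    steps_per_epoch = step / epoch
    return round(steps_per_epoch), round(steps_per_epoch * batch_size)


estimates = [estimate_epoch_size(step, epoch) for step, epoch in ROWS]
for (step, epoch), (steps, examples) in zip(ROWS, estimates):
    print(f"step {step}, epoch {epoch}: ~{steps} steps/epoch, ~{examples} examples/epoch")
```

Both rows give about 2809 steps per epoch, or roughly 45k training examples per epoch, so the 25000 training steps correspond to just under nine passes over the training data.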
Framework versions
- Transformers 4.20.0.dev0
- Pytorch 1.11.0
- Datasets 2.2.2
- Tokenizers 0.12.1