bart-base-arxiv-1024

This model is a fine-tuned version of facebook/bart-base. The training dataset was not recorded (the auto-generated card lists it as None). It achieves the following results on the evaluation set:

  • Loss: 4.4204
  • Rouge1: 42.7148
  • Rouge2: 14.9393
  • Rougel: 23.8135
  • Rougelsum: 38.2094
  • Gen Len: 152.94
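To make the ROUGE figures above easier to interpret, here is a minimal sketch of ROUGE-N as an n-gram overlap F1 score. It is a simplification, not the scorer used for this card: it assumes lowercased whitespace tokenization with no stemming, and the two example sentences are made up for illustration. The card's values are presumably F1 scores scaled by 100, which is the usual convention in summarization fine-tuning scripts.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(reference, candidate, n=1):
    """Simplified ROUGE-N F1 on lowercased, whitespace-split tokens."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference/candidate pair, for illustration only.
reference = "we propose a new method for summarizing long papers"
candidate = "we propose a method for summarizing papers"

rouge1 = 100 * rouge_n_f1(reference, candidate, n=1)
rouge2 = 100 * rouge_n_f1(reference, candidate, n=2)
print(f"ROUGE-1: {rouge1:.2f}  ROUGE-2: {rouge2:.2f}")
```

ROUGE-2 is always at most ROUGE-1 for the same pair under this definition, which matches the gap between the Rouge1 and Rouge2 values reported above.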

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0008
  • train_batch_size: 16
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 4
  • mixed_precision_training: Native AMP
  • label_smoothing_factor: 0.2
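The sketch below shows how three of these settings interact: the effective batch size, the polynomial learning-rate schedule with warmup, and the label-smoothed loss. It rests on several assumptions not stated in the card: a single training device, roughly 12,690 total optimizer steps (estimated from the log, where step 12500 corresponds to epoch 3.94), a final learning rate of 1e-7 with power 1.0 (which I believe are the Transformers defaults for the polynomial schedule), and the common label-smoothing formulation that mixes the negative log-likelihood with a uniform average over the vocabulary.

```python
import math

LEARNING_RATE = 8e-4   # from the card
WARMUP_STEPS = 500     # from the card
TOTAL_STEPS = 12_690   # assumption: estimated as 4 * (12500 / 3.94)
LR_END = 1e-7          # assumption: believed library default
POWER = 1.0            # assumption: believed library default (linear decay)


def effective_batch_size(per_device, accumulation, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device * accumulation * num_devices


def polynomial_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS,
                  total=TOTAL_STEPS, lr_end=LR_END, power=POWER):
    """Linear warmup to base_lr, then polynomial decay down to lr_end."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    if step >= total:
        return lr_end
    remaining = 1 - (step - warmup) / (total - warmup)
    return (base_lr - lr_end) * remaining ** power + lr_end


def label_smoothed_loss(log_probs, target, epsilon=0.2):
    """Mix the target NLL with the mean NLL over the whole vocabulary."""
    nll = -log_probs[target]
    uniform = -sum(log_probs) / len(log_probs)
    return (1 - epsilon) * nll + epsilon * uniform


print(effective_batch_size(16, 4))          # matches total_train_batch_size
for step in (0, 250, 500, 6000, 12_690):
    print(step, f"{polynomial_lr(step):.2e}")

# Toy 4-token vocabulary; the probabilities are made up for illustration.
toy_log_probs = [math.log(p) for p in (0.7, 0.1, 0.1, 0.1)]
plain_nll = -toy_log_probs[0]
smoothed = label_smoothed_loss(toy_log_probs, target=0)
print(f"plain NLL {plain_nll:.4f} vs smoothed {smoothed:.4f}")
```

If the reported losses include the smoothing term, as this formulation would imply, they are not directly comparable to plain cross-entropy values from runs trained without label smoothing.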

Training results

Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len
--------------|-------|-------|-----------------|---------|---------|---------|-----------|--------
5.0132        | 0.16  | 500   | 4.9526          | 36.6006 | 11.6516 | 21.2812 | 32.3855   | 102.5
4.9026        | 0.32  | 1000  | 4.8487          | 37.0279 | 11.9575 | 21.6566 | 33.3527   | 98.72
4.8134        | 0.47  | 1500  | 4.8093          | 38.6789 | 12.0964 | 21.5679 | 34.4042   | 131.17
4.7615        | 0.63  | 2000  | 4.7357          | 38.9948 | 12.6214 | 21.5771 | 34.6947   | 120.84
4.7316        | 0.79  | 2500  | 4.6984          | 39.0043 | 13.692  | 22.4767 | 34.7406   | 114.67
4.6984        | 0.95  | 3000  | 4.6661          | 37.7638 | 13.0795 | 21.9015 | 34.1801   | 112.11
4.6423        | 1.1   | 3500  | 4.6413          | 40.2953 | 13.9014 | 22.6632 | 35.9141   | 127.43
4.6166        | 1.26  | 4000  | 4.6175          | 40.99   | 14.3428 | 23.4201 | 36.6002   | 133.73
4.5878        | 1.42  | 4500  | 4.6042          | 40.8889 | 14.0993 | 23.0454 | 36.9924   | 141.88
4.5874        | 1.58  | 5000  | 4.5846          | 39.9072 | 14.2083 | 22.8314 | 35.9495   | 123.82
4.5642        | 1.73  | 5500  | 4.5687          | 40.4716 | 14.1263 | 22.6271 | 36.2139   | 137.2
4.555         | 1.89  | 6000  | 4.5551          | 41.3314 | 14.232  | 22.8318 | 37.1038   | 148.78
4.4763        | 2.05  | 6500  | 4.5433          | 41.7555 | 14.6625 | 23.7076 | 37.705    | 142.12
4.4687        | 2.21  | 7000  | 4.5232          | 41.226  | 14.6976 | 23.0482 | 36.7016   | 133.7
4.4737        | 2.37  | 7500  | 4.5128          | 40.0649 | 13.9868 | 23.1803 | 35.9016   | 122.17
4.4634        | 2.52  | 8000  | 4.4999          | 42.5774 | 15.4706 | 23.4321 | 38.212    | 137.87
4.4443        | 2.68  | 8500  | 4.4829          | 41.7603 | 15.1096 | 23.5735 | 37.5121   | 147.78
4.4409        | 2.84  | 9000  | 4.4757          | 41.9056 | 14.7477 | 23.1478 | 37.6321   | 142.64
4.4271        | 3.0   | 9500  | 4.4642          | 41.7456 | 15.1452 | 23.4016 | 37.6441   | 138.98
4.3629        | 3.15  | 10000 | 4.4569          | 41.5637 | 15.0198 | 23.0226 | 37.22     | 148.2
4.3489        | 3.31  | 10500 | 4.4502          | 42.0897 | 14.7576 | 23.1048 | 37.6046   | 142.48
4.3377        | 3.47  | 11000 | 4.4403          | 43.3032 | 15.4076 | 23.665  | 39.0579   | 156.0
4.339         | 3.63  | 11500 | 4.4329          | 42.6232 | 15.0481 | 23.6074 | 37.637    | 151.36
4.3423        | 3.78  | 12000 | 4.4272          | 42.565  | 14.9409 | 23.2332 | 38.1214   | 154.21
4.3269        | 3.94  | 12500 | 4.4204          | 42.7148 | 14.9393 | 23.8135 | 38.2094   | 152.94
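In the log above, validation loss falls at every evaluation while the ROUGE scores fluctuate, so the checkpoint with the lowest loss is not necessarily the one with the best ROUGE. The following sketch illustrates this on the last five rows only; the tuple layout is my own choice for illustration and is not part of any training script.

```python
# Excerpt of the results log: (step, validation_loss, rouge1, rouge2)
rows = [
    (10500, 4.4502, 42.0897, 14.7576),
    (11000, 4.4403, 43.3032, 15.4076),
    (11500, 4.4329, 42.6232, 15.0481),
    (12000, 4.4272, 42.565, 14.9409),
    (12500, 4.4204, 42.7148, 14.9393),
]

# Pick a checkpoint under two different selection criteria.
best_by_loss = min(rows, key=lambda r: r[1])
best_by_rouge1 = max(rows, key=lambda r: r[2])

print("lowest validation loss at step", best_by_loss[0])
print("highest Rouge1 at step", best_by_rouge1[0])
```

The headline results at the top of the card correspond to the final row (step 12500), which is the lowest-loss evaluation in this excerpt.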

Framework versions

  • Transformers 4.37.2
  • PyTorch 2.0.1+cu117
  • Datasets 2.14.4
  • Tokenizers 0.15.2

Model size: 139M params (Safetensors, tensor type F32)