mt5-small-finetuned-xlsum-en-es

This model is a fine-tuned version of google/mt5-small on the csebuetnlp/xlsum dataset.

Reduced versions of the English/Spanish subsets were used, focusing on shorter targets.
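How the subsets were reduced is not specified here. One plausible way to favor shorter targets is to drop examples whose reference summary exceeds a word budget. The sketch below assumes XL-Sum's `text` and `summary` fields; the function name and the `max_summary_words` threshold are hypothetical, not the values used for this model.

```python
def keep_short_targets(examples, max_summary_words=20):
    """Keep examples whose reference summary has at most max_summary_words words.

    The threshold is a hypothetical placeholder, not the value used for this model.
    """
    return [ex for ex in examples if len(ex["summary"].split()) <= max_summary_words]


samples = [
    {"text": "A long article body.", "summary": "Short headline-style summary."},
    {"text": "Another article body.", "summary": " ".join(["word"] * 50)},
]
short = keep_short_targets(samples)
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter` instead of a list comprehension.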

It achieves the following results on the evaluation set:

  • Loss: 2.9483
  • Rouge1: 19.42
  • Rouge2: 4.44
  • Rougel: 16.7
  • Rougelsum: 16.7
  • Mean Len: 16.3231
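
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated summary and its reference. The sketch below is a simplified illustration using lowercased whitespace tokens. It is not the scorer that produced the numbers above, which likely applies its own tokenization and appears to report scores on a 0-100 scale.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between two strings (simplified ROUGE-1)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat lay on a mat")
```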

Model description

More information needed

Intended uses & limitations

The model may produce false information when summarizing.

This is very much an initial draft and is not intended for production use. Use at your own risk.
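
A minimal inference sketch with the Transformers library follows. The repository id `alex-atelo/mt5-small-finetuned-xlsum-en-es` is an assumption, and the generation settings (`max_new_tokens`, `num_beams`, the 512-token input cap) are illustrative guesses rather than the settings used during evaluation.

```python
MODEL_ID = "alex-atelo/mt5-small-finetuned-xlsum-en-es"  # assumed repository id


def summarize(text, max_new_tokens=32, num_beams=4):
    """Generate a short summary for an English or Spanish article."""
    # Imported lazily so this module can be loaded without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    # Truncate long articles; 512 tokens is an illustrative cap.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(article_text)` downloads the weights on first use. Outputs should be checked against the source article, given the limitation noted above.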

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
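
To make the `linear` schedule concrete, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup steps and takes the total of 3711 steps from the training results (3 epochs of 1237 steps each); the function name is illustrative.

```python
def linear_lr(step, base_lr=2e-5, total_steps=3711, warmup_steps=0):
    """Learning rate under linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch (1237 steps per epoch).
epoch_end_lrs = [linear_lr(step) for step in (1237, 2474, 3711)]
```

Under these assumptions the rate falls to two thirds of its initial value after the first epoch, one third after the second, and zero at the final step.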

Training results

Lead-3 Baseline:

  • Rouge1: 12.22
  • Rouge2: 2.01
  • RougeL: 9.02
  • RougeLsum: 10.33
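
Lead-3 is a standard extractive baseline that uses the first three sentences of the article as the summary. A minimal sketch follows; the regex sentence splitter is a simplifying assumption and may differ from whatever splitter was used to compute the baseline scores above.

```python
import re


def lead_n(article, n=3):
    """Return the first n sentences of the article as an extractive summary."""
    # Naive split on whitespace that follows ., ! or ? (an assumption; real
    # pipelines typically use a proper sentence tokenizer).
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n])


article = "First sentence. Second one! Third here? Fourth is dropped."
baseline_summary = lead_n(article)
```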
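
| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Mean Len |
|---------------|-------|------|-----------------|--------|--------|--------|-----------|----------|
| 6.7763        | 1.0   | 1237 | 3.1120          | 13.57  | 2.76   | 11.59  | 11.59     | 12.6116  |
| 4.1022        | 2.0   | 2474 | 2.9718          | 19.35  | 4.32   | 16.63  | 16.64     | 16.3084  |
| 3.9219        | 3.0   | 3711 | 2.9483          | 19.42  | 4.44   | 16.7   | 16.7      | 16.3231  |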

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2

Citation

BibTeX:

@inproceedings{hasan-etal-2021-xl,
    title = "{XL}-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages",
    author = "Hasan, Tahmid  and
      Bhattacharjee, Abhik  and
      Islam, Md. Saiful  and
      Mubasshir, Kazi  and
      Li, Yuan-Fang  and
      Kang, Yong-Bin  and
      Rahman, M. Sohel  and
      Shahriyar, Rifat",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.413",
    pages = "4693--4703",
}