metadata
tags:
  - generated_from_trainer
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: switch-base-32-samsum-ba16-lr1e-04-top-4-choose-1-res-phase2-budget3-dim1
    results:
      - task:
          name: Sequence-to-sequence Language Modeling
          type: text2text-generation
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: validation
          args: samsum
        metrics:
          - name: Rouge1
            type: rouge
            value: 50.511

switch-base-32-samsum-ba16-lr1e-04-top-4-choose-1-res-phase2-budget3-dim1

This model was trained from scratch on the samsum dataset. At the final checkpoint (step 4500), it achieves the following results on the evaluation set (the samsum validation split):

  • Loss: 1.8163
  • Rouge1: 50.511
  • Rouge2: 26.0947
  • Rougel: 42.4175
  • Rougelsum: 46.4756
  • Gen Len: 20.522

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
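As a rough illustration of what the `constant_with_warmup` scheduler with a warmup ratio of 0.1 implies, the sketch below computes the learning rate at a few optimizer steps: a linear ramp from 0 to the base rate, then a constant value. The steps-per-epoch figure is an assumption inferred from the results table (step 500 is logged at epoch 0.5429, which suggests roughly 921 steps per epoch); it is not stated in this card, and the helper `constant_with_warmup_lr` is a hypothetical stand-in written for illustration, not the trainer's own code.

```python
import math


def constant_with_warmup_lr(step, base_lr, warmup_steps):
    """Linear ramp from 0 to base_lr over warmup_steps, then constant at base_lr."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr


# Values taken from the hyperparameter list above.
BASE_LR = 1e-4
WARMUP_RATIO = 0.1
NUM_EPOCHS = 5

# Assumption: about 921 optimizer steps per epoch, inferred from the
# results table (step 500 at epoch 0.5429); not stated in the card.
STEPS_PER_EPOCH = 921

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

print("total steps:", total_steps)
print("warmup steps:", warmup_steps)
for step in (0, 100, warmup_steps, 500, total_steps):
    print(step, constant_with_warmup_lr(step, BASE_LR, warmup_steps))
```

Under these assumptions, warmup would end shortly before the first evaluation at step 500, so every logged checkpoint would have been trained mostly at the full learning rate of 0.0001.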

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|--------|------|-----------------|---------|---------|---------|-----------|---------|
| 1.2114        | 0.5429 | 500  | 1.6695          | 49.8655 | 25.6608 | 42.0018 | 46.1475   | 20.4132 |
| 1.1553        | 1.0858 | 1000 | 1.7089          | 50.2875 | 25.9243 | 42.157  | 46.4898   | 22.3178 |
| 1.1419        | 1.6287 | 1500 | 1.6890          | 50.7227 | 26.5404 | 42.6219 | 46.9542   | 21.0575 |
| 1.0082        | 2.1716 | 2000 | 1.7140          | 51.0857 | 26.9422 | 42.9033 | 47.4713   | 21.6002 |
| 1.057         | 2.7144 | 2500 | 1.7156          | 50.6415 | 26.6621 | 42.6293 | 46.728    | 21.6333 |
| 0.9098        | 3.2573 | 3000 | 1.7776          | 51.1518 | 27.178  | 43.2364 | 47.3776   | 21.2433 |
| 0.993         | 3.8002 | 3500 | 1.7702          | 50.9856 | 26.6895 | 42.0314 | 46.9763   | 22.6919 |
| 0.8361        | 4.3431 | 4000 | 1.8436          | 50.4271 | 25.8178 | 42.3022 | 46.5182   | 21.9022 |
| 0.9078        | 4.8860 | 4500 | 1.8163          | 50.511  | 26.0947 | 42.4175 | 46.4756   | 20.522  |
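The headline metrics in this card come from the last logged checkpoint, which is not necessarily the strongest one in the table. As a small sketch, the snippet below copies the step, validation loss, and Rouge1 columns from the table above and picks out the checkpoints with the highest Rouge1 and the lowest validation loss. The variable names are illustrative only.

```python
# Rows copied from the training results table:
# (step, validation_loss, rouge1)
ROWS = [
    (500, 1.6695, 49.8655),
    (1000, 1.7089, 50.2875),
    (1500, 1.6890, 50.7227),
    (2000, 1.7140, 51.0857),
    (2500, 1.7156, 50.6415),
    (3000, 1.7776, 51.1518),
    (3500, 1.7702, 50.9856),
    (4000, 1.8436, 50.4271),
    (4500, 1.8163, 50.511),
]

best_rouge1 = max(ROWS, key=lambda row: row[2])  # highest Rouge1
lowest_loss = min(ROWS, key=lambda row: row[1])  # lowest validation loss
final = ROWS[-1]                                 # checkpoint reported in this card

print("best Rouge1:", best_rouge1)
print("lowest validation loss:", lowest_loss)
print("final checkpoint:", final)
```

On these rows, Rouge1 peaks at step 3000 while validation loss is lowest at step 500, so the choice of checkpoint depends on which of the two is treated as the selection criterion.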

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1