license: apache-2.0
base_model: google/switch-base-8
tags:
  - generated_from_trainer
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: switch-base-8-samsum-top-4-choose-1-deconly
    results:
      - task:
          name: Sequence-to-sequence Language Modeling
          type: text2text-generation
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: validation
          args: samsum
        metrics:
          - name: Rouge1
            type: rouge
            value: 47.2666

switch-base-8-samsum-top-4-choose-1-deconly

This model is a fine-tuned version of google/switch-base-8 on the samsum dataset. On the validation split, the final checkpoint (step 4600) achieves the following results:

  • Loss: 1.5869
  • Rouge1: 47.2666
  • Rouge2: 24.2196
  • Rougel: 40.1766
  • Rougelsum: 43.8418
  • Gen Len: 16.9352
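
For readers unfamiliar with the metric, the sketch below shows roughly what a Rouge1 score measures: the F1 of unigram overlap between a reference summary and a generated one. It is a simplified illustration, not the evaluation code used for this card. It assumes whitespace tokenization with no stemming, and it assumes the scores above are F-measures scaled by 100. The two sentences are hypothetical and are not taken from samsum.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a unigram counts only as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example pair (not from the dataset).
reference = "maria baked cookies and will bring tom some tomorrow"
candidate = "maria will bring tom cookies tomorrow"

# Scale by 100 to match the convention assumed for the scores in this card.
score = 100 * rouge1_f1(reference, candidate)
print(f"Rouge1 (simplified): {score:.4f}")
```

Real ROUGE implementations typically add their own tokenization and optional stemming, so their numbers will differ from this sketch on the same inputs.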

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 5.0
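
To make the schedule settings concrete, the sketch below computes the learning rate implied by a linear schedule with a 0.1 warmup ratio and a peak of 5e-05. It follows the usual definition of linear warmup followed by linear decay to zero. The total of 4605 optimizer steps is an assumption: the log places step 200 at epoch 0.2172, which suggests about 921 steps per epoch, times 5 epochs. Rounding the warmup length up is also an assumption.

```python
import math

PEAK_LR = 5e-05       # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 4605    # assumption: ~921 steps per epoch * 5 epochs
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # assumption: rounded up


def linear_lr(step: int,
              total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS,
              peak_lr: float = PEAK_LR) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak.
        return peak_lr * step / max(1, warmup_steps)
    # Decay: ramp linearly from the peak down to 0 at the last step.
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


for step in (0, 200, WARMUP_STEPS, 2000, 4600):
    print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the peak is reached after roughly half an epoch, and the rate is close to zero by the last logged evaluation at step 4600.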

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|--------|------|-----------------|---------|---------|---------|-----------|---------|
| 5.4611        | 0.2172 | 200  | 3.0917          | 23.5686 | 7.6846  | 20.6877 | 22.0746   | 14.7946 |
| 2.6551        | 0.4343 | 400  | 2.1027          | 39.7231 | 17.2476 | 33.3172 | 37.0509   | 17.1369 |
| 2.4452        | 0.6515 | 600  | 1.9255          | 42.9952 | 19.6478 | 35.8054 | 40.1569   | 17.3007 |
| 2.1259        | 0.8686 | 800  | 1.8270          | 43.9723 | 21.3238 | 37.0066 | 40.9323   | 16.1027 |
| 2.0957        | 1.0858 | 1000 | 1.7708          | 45.1103 | 21.769  | 37.9229 | 41.7446   | 17.2482 |
| 2.1168        | 1.3029 | 1200 | 1.7185          | 45.6806 | 22.0335 | 38.2398 | 42.4051   | 16.5941 |
| 2.1491        | 1.5201 | 1400 | 1.6982          | 46.0573 | 22.2803 | 38.33   | 42.531    | 16.9291 |
| 1.9829        | 1.7372 | 1600 | 1.6803          | 45.8845 | 22.4145 | 38.795  | 42.5814   | 16.4976 |
| 1.9741        | 1.9544 | 1800 | 1.6657          | 45.6645 | 22.0154 | 38.2445 | 42.2358   | 17.2689 |
| 1.8286        | 2.1716 | 2000 | 1.6462          | 46.7647 | 23.2912 | 39.4015 | 43.3207   | 16.8704 |
| 1.8177        | 2.3887 | 2200 | 1.6486          | 45.8872 | 22.8119 | 38.7398 | 42.3427   | 16.0403 |
| 1.8606        | 2.6059 | 2400 | 1.6270          | 45.9799 | 22.9475 | 38.9393 | 42.7565   | 16.6687 |
| 1.8327        | 2.8230 | 2600 | 1.6210          | 46.2715 | 23.4171 | 39.4324 | 43.0326   | 16.5452 |
| 1.6738        | 3.0402 | 2800 | 1.6242          | 46.1248 | 22.7245 | 38.8572 | 42.5884   | 16.8252 |
| 1.7515        | 3.2573 | 3000 | 1.6155          | 46.5372 | 23.4014 | 39.54   | 43.187    | 16.665  |
| 1.7728        | 3.4745 | 3200 | 1.6000          | 46.6652 | 23.4739 | 39.4761 | 43.2783   | 16.7873 |
| 1.7584        | 3.6916 | 3400 | 1.5922          | 47.2313 | 24.0035 | 39.9195 | 43.6996   | 16.7702 |
| 1.7082        | 3.9088 | 3600 | 1.5957          | 46.5132 | 23.4692 | 39.4884 | 43.2236   | 16.6553 |
| 1.5968        | 4.1260 | 3800 | 1.5916          | 47.2622 | 23.9444 | 40.1308 | 43.7971   | 16.9083 |
| 1.6439        | 4.3431 | 4000 | 1.5880          | 46.9607 | 23.7839 | 39.7431 | 43.5831   | 16.9621 |
| 1.6684        | 4.5603 | 4200 | 1.5930          | 47.2611 | 23.9828 | 40.0767 | 43.8297   | 16.8851 |
| 1.7749        | 4.7774 | 4400 | 1.5882          | 46.9562 | 23.874  | 39.8904 | 43.536    | 16.9377 |
| 1.6401        | 4.9946 | 4600 | 1.5869          | 47.2666 | 24.2196 | 40.1766 | 43.8418   | 16.9352 |
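
The log can also be checked programmatically. The sketch below copies the Step, Validation Loss, and Rouge1 columns from the training results and finds the evaluation step with the lowest validation loss and the one with the highest Rouge1. It also estimates the number of optimizer steps per epoch from the first logged row. The variable names are illustrative only.

```python
# (step, validation_loss, rouge1) copied from the training results.
EVAL_LOG = [
    (200, 3.0917, 23.5686),
    (400, 2.1027, 39.7231),
    (600, 1.9255, 42.9952),
    (800, 1.8270, 43.9723),
    (1000, 1.7708, 45.1103),
    (1200, 1.7185, 45.6806),
    (1400, 1.6982, 46.0573),
    (1600, 1.6803, 45.8845),
    (1800, 1.6657, 45.6645),
    (2000, 1.6462, 46.7647),
    (2200, 1.6486, 45.8872),
    (2400, 1.6270, 45.9799),
    (2600, 1.6210, 46.2715),
    (2800, 1.6242, 46.1248),
    (3000, 1.6155, 46.5372),
    (3200, 1.6000, 46.6652),
    (3400, 1.5922, 47.2313),
    (3600, 1.5957, 46.5132),
    (3800, 1.5916, 47.2622),
    (4000, 1.5880, 46.9607),
    (4200, 1.5930, 47.2611),
    (4400, 1.5882, 46.9562),
    (4600, 1.5869, 47.2666),
]

# Lowest validation loss and highest Rouge1 across all logged evaluations.
best_by_loss = min(EVAL_LOG, key=lambda row: row[1])
best_by_rouge1 = max(EVAL_LOG, key=lambda row: row[2])

# Step 200 is logged at epoch 0.2172, which implies the steps per epoch.
steps_per_epoch = round(200 / 0.2172)

print(f"lowest validation loss: {best_by_loss[1]} at step {best_by_loss[0]}")
print(f"highest Rouge1: {best_by_rouge1[2]} at step {best_by_rouge1[0]}")
print(f"estimated steps per epoch: {steps_per_epoch}")
```

Both criteria select the last logged evaluation at step 4600, which matches the headline numbers reported for this model.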

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.0.1+cu117
  • Datasets 2.20.0
  • Tokenizers 0.19.1
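
When trying to reproduce these results, it can help to compare the local environment against the versions listed above. The sketch below is one way to do that with the standard library. The distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) and the exact version string for PyTorch are assumptions about how the packages are installed, and `check_versions` is a hypothetical helper, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; distribution names are assumptions.
PINNED = {
    "transformers": "4.41.2",
    "torch": "2.0.1+cu117",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def check_versions(pinned=PINNED, lookup=version):
    """Return {package: (wanted, found, matches)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


for name, (wanted, found, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, found {found} ({status})")
```

A mismatch does not necessarily mean the model will fail to load, only that results may not match the numbers reported here exactly.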