license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - samsum
metrics:
  - rouge
model-index:
  - name: switch-base-8-finetuned-samsum
    results:
      - task:
          name: Sequence-to-sequence Language Modeling
          type: text2text-generation
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: train
          args: samsum
        metrics:
          - name: Rouge1
            type: rouge
            value: 46.1297
widget:
  - text: |-
      Sid: Wanna catch a movie?
      Annie: sure what do you have in mind?
      Sid; the Aquaman? :D
      Annie: haha isn't it a bit childish
      Sid: noooooo I mean yes but it's the highest grossing movie this week
      Annie: seriously?
      Sid: yeah?
      Annie: okay let's see what the fuss is all about
  - text: |-
      Manu: What are you doing?
      Julien: CTO tasks
      Manu: Sounds boring...
      Julien: yes you know :S
      Manu: why don't you come home to see my pets?
      Julien: sounds like a plan!!!
      Manu: so, are you coming?
      Julien: it seems so... ;)

Switch Transformer (base-8) fine-tuned on the samsum dataset for conversation summarization

This model is a fine-tuned version of google/switch-base-8 on the samsum dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4614
  • Rouge1: 46.1297
  • Rouge2: 22.9128
  • Rougel: 39.153
  • Rougelsum: 42.8502
  • Gen Len: 16.9719
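As a rough illustration of how a checkpoint like this might be used for dialogue summarization, here is a minimal sketch. The repository id `mrm8488/switch-base-8-finetuned-samsum` is an assumption pieced together from the model name in the metadata, and `format_dialogue`, `summarize`, and `max_new_tokens=32` are hypothetical choices for this example rather than anything the card specifies. The card also does not say whether a task prefix was used during fine-tuning, so none is added here.

```python
def format_dialogue(turns):
    """Join (speaker, utterance) pairs into 'Speaker: text' lines, one per turn."""
    return "\n".join(f"{speaker}: {utterance}" for speaker, utterance in turns)


def summarize(dialogue,
              model_id="mrm8488/switch-base-8-finetuned-samsum",  # assumed repo id
              max_new_tokens=32):  # illustrative value, not taken from the card
    """Generate a summary for one dialogue string (downloads the checkpoint)."""
    # Imported lazily so the formatting helper works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    inputs = tokenizer(dialogue, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# One of the widget examples from the metadata, rebuilt turn by turn.
dialogue = format_dialogue([
    ("Manu", "What are you doing?"),
    ("Julien", "CTO tasks"),
    ("Manu", "Sounds boring..."),
    ("Julien", "yes you know :S"),
    ("Manu", "why don't you come home to see my pets?"),
    ("Julien", "sounds like a plan!!!"),
    ("Manu", "so, are you coming?"),
    ("Julien", "it seems so... ;)"),
])

# Uncomment to run the model (requires transformers, torch, and network access):
# print(summarize(dialogue))
```

Calling `summarize(dialogue)` should return a short summary string. The reported average generation length of about 17 tokens suggests that a small `max_new_tokens` budget is usually enough.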

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 6
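To make the linear scheduler concrete, the sketch below computes the learning rate at a given optimizer step. It assumes zero warmup steps (none are listed above) and takes the total of 22098 steps from the final row of the training results. `linear_lr` is a hypothetical helper written for illustration, not the Trainer's own code.

```python
BASE_LR = 5e-05       # learning_rate from the list above
NUM_EPOCHS = 6        # num_epochs from the list above
TOTAL_STEPS = 22098   # final step reported in the training results
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS  # 3683 steps per epoch


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Linear warmup (assumed to be zero steps here) followed by linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the end of each epoch under these assumptions.
for epoch in range(1, NUM_EPOCHS + 1):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step}, lr {linear_lr(step):.2e}")
```

Under these assumptions the rate starts at 5e-05, is halved by the end of epoch 3, and reaches zero at the final step.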

Training results

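| Training Loss | Epoch | Step  | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|---------------|-------|-------|-----------------|---------|---------|---------|-----------|---------|
| 1.8874        | 1.0   | 3683  | 1.5210          | 45.7651 | 22.9379 | 38.8554 | 42.6269   | 17.2482 |
| 1.6301        | 2.0   | 7366  | 1.4628          | 47.2719 | 24.8976 | 40.3913 | 43.9285   | 16.8362 |
| 1.4326        | 3.0   | 11049 | 1.4402          | 47.8275 | 25.2262 | 40.617  | 44.2948   | 16.9523 |
| 1.2992        | 4.0   | 14732 | 1.4489          | 48.393  | 25.3888 | 40.9534 | 44.797    | 17.1504 |
| 1.2259        | 5.0   | 18415 | 1.4495          | 49.2186 | 26.312  | 41.721  | 45.5087   | 17.1956 |
| 1.1477        | 6.0   | 22098 | 1.4610          | 49.0018 | 26.3474 | 41.5217 | 45.4081   | 17.0782 |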
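The per-epoch numbers can be compared programmatically. The sketch below copies the epoch, validation loss, and Rouge1 columns from the results above and picks out the best epoch under each criterion. It is only a reading aid: the card does not state which checkpoint was kept.

```python
# (epoch, validation_loss, rouge1) copied from the training results above.
results = [
    (1, 1.5210, 45.7651),
    (2, 1.4628, 47.2719),
    (3, 1.4402, 47.8275),
    (4, 1.4489, 48.393),
    (5, 1.4495, 49.2186),
    (6, 1.4610, 49.0018),
]

# Lowest validation loss and highest Rouge1 do not have to coincide.
best_by_loss = min(results, key=lambda row: row[1])
best_by_rouge1 = max(results, key=lambda row: row[2])

print(f"lowest validation loss: epoch {best_by_loss[0]} ({best_by_loss[1]})")
print(f"highest Rouge1:         epoch {best_by_rouge1[0]} ({best_by_rouge1[2]})")
```

Validation loss bottoms out at epoch 3 while Rouge1 peaks at epoch 5, so the choice of checkpoint depends on which of the two is treated as the selection metric.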

Framework versions

  • Transformers 4.25.1
  • Pytorch 1.13.0+cu116
  • Datasets 2.8.0
  • Tokenizers 0.13.2