---
tags:
  - summarization
  - summary
  - booksum
  - long-document
  - long-form
license: apache-2.0
datasets:
  - kmfoda/booksum
metrics:
  - rouge
inference: false
model-index:
  - name: pszemraj/long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP
    results:
      - task:
          type: summarization
          name: Summarization
        dataset:
          name: kmfoda/booksum
          type: kmfoda/booksum
          config: kmfoda--booksum
          split: test
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 35.9969
            verified: true
          - name: ROUGE-2
            type: rouge
            value: 5.9272
            verified: true
          - name: ROUGE-L
            type: rouge
            value: 16.0136
            verified: true
          - name: ROUGE-LSUM
            type: rouge
            value: 32.941
            verified: true
          - name: loss
            type: loss
            value: 2.9339466094970703
            verified: true
          - name: gen_len
            type: gen_len
            value: 283.7198
            verified: true
      - task:
          type: summarization
          name: Summarization
        dataset:
          name: samsum
          type: samsum
          config: samsum
          split: test
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 26.2412
            verified: true
          - name: ROUGE-2
            type: rouge
            value: 5.9791
            verified: true
          - name: ROUGE-L
            type: rouge
            value: 18.7467
            verified: true
          - name: ROUGE-LSUM
            type: rouge
            value: 22.5566
            verified: true
          - name: loss
            type: loss
            value: 2.877626895904541
            verified: true
          - name: gen_len
            type: gen_len
            value: 47.6532
            verified: true
      - task:
          type: summarization
          name: Summarization
        dataset:
          name: xsum
          type: xsum
          config: default
          split: test
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 19.3209
            verified: true
          - name: ROUGE-2
            type: rouge
            value: 2.7978
            verified: true
          - name: ROUGE-L
            type: rouge
            value: 12.5816
            verified: true
          - name: ROUGE-LSUM
            type: rouge
            value: 15.0239
            verified: true
          - name: loss
            type: loss
            value: 4.483709335327148
            verified: true
          - name: gen_len
            type: gen_len
            value: 82.729
            verified: true
      - task:
          type: summarization
          name: Summarization
        dataset:
          name: billsum
          type: billsum
          config: default
          split: test
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 36.5688
            verified: true
          - name: ROUGE-2
            type: rouge
            value: 12.5849
            verified: true
          - name: ROUGE-L
            type: rouge
            value: 22.2461
            verified: true
          - name: ROUGE-LSUM
            type: rouge
            value: 30.6507
            verified: true
          - name: loss
            type: loss
            value: 2.6456267833709717
            verified: true
          - name: gen_len
            type: gen_len
            value: 139.0398
            verified: true
---

# long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP

NOTE: this is still a work in progress (WIP) and has not fully converged, but it's shared here in case it saves others some time :)

## Updates

As I update this WIP checkpoint, I will post a note here.

- July 26, 2022: added two more epochs of training; metrics are starting to approach those of the more heavily tuned base variant
- July 8, 2022: added a checkpoint with ~4 epochs of training on an A100, equating to approximately 350 steps at an effective ("functional") batch size of 128 (see the sketch after this list)
- July 4, 2022: added a checkpoint with six additional epochs of training, with the dataset's summary outputs filtered to 1024 tokens, resolving the prior issue of overly short summaries
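
An effective ("functional") batch size of 128 is typically reached through gradient accumulation rather than a single large batch. The sketch below only illustrates the arithmetic; the per-device batch size, accumulation steps, and other training arguments shown are assumptions, not the actual recipe used for this checkpoint.

```python
from transformers import Seq2SeqTrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps
# (times the number of GPUs, if training on more than one device).
# 4 * 32 = 128 -- an assumed split, chosen only to illustrate the arithmetic.
args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-booksum",  # hypothetical output path
    per_device_train_batch_size=4,
    gradient_accumulation_steps=32,
    num_train_epochs=4,
)
```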

## About

- a checkpoint of Stancld/longt5-tglobal-large-16384-pubmed-3k_steps trained on kmfoda/booksum for about 26 epochs
- max input lengths during training varied between 8192 and 16384 tokens depending on GPU availability; this checkpoint was trained with a max input length of 16384 tokens for the final 10+ epochs (see the usage sketch below)
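
The card's metadata sets `inference: false`, so the hosted inference widget is disabled. For local use, a minimal sketch with the transformers library follows; the generation settings (`num_beams`, the summary `max_length`) are illustrative assumptions, not tuned recommendations.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/long-t5-tglobal-large-pubmed-3k-booksum-16384-WIP"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

long_document = "..."  # replace with the text to summarize

# Truncate to 16384 tokens, matching the max input length used for the
# final epochs of training.
inputs = tokenizer(
    long_document,
    max_length=16384,
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    summary_ids = model.generate(
        **inputs,
        max_length=512,          # illustrative cap on summary length
        num_beams=4,             # assumed decoding setting
        no_repeat_ngram_size=3,  # assumed; reduces repetitive phrases
    )
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```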

## Comparisons