---
license: apache-2.0
datasets:
  - stacked-summaries/stacked-samsum-1024
language:
  - en
metrics:
  - rouge
tags:
  - stacked summaries
  - samsum
---

# flan-t5-small-stacked-samsum-1024

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) dataset. It achieves the following results on the evaluation set:

- Loss: 1.7573
- Rouge1: 46.6072
- Rouge2: 19.9754
- Rougel: 35.2715
- Rougelsum: 43.3599
- Gen Len: 72.64

## Model description

The model was trained on a summarization task in which several document-summary pairs may be stacked on top of each other within a single example.

You can split its output into separate topics by splitting on its special token `[NEXT_CONCEPT]`.
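
A minimal usage sketch with the `transformers` pipeline; the repository id below is an assumption based on the card title, so substitute the actual checkpoint path:

```python
from transformers import pipeline

# NOTE: hypothetical repo id inferred from the card title -- adjust as needed.
model_id = "stacked-summaries/flan-t5-small-stacked-samsum-1024"

summarizer = pipeline("summarization", model=model_id)

dialogue = (
    "Anna: Are we still meeting at noon?\n"
    "Ben: Yes, see you at the cafe then.\n"
)

summary = summarizer(dialogue, max_length=96)[0]["summary_text"]

# Stacked inputs may yield stacked summaries separated by [NEXT_CONCEPT];
# splitting on the token recovers one summary per topic.
topics = [part.strip() for part in summary.split("[NEXT_CONCEPT]") if part.strip()]
print(topics)
```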

## Intended uses & limitations

- This model is intended as a baseline/reference point for comparison with the larger models.

## Training and evaluation data

See the [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) dataset card for details.
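
For a quick look at the data, a minimal sketch using the `datasets` library; printing the `DatasetDict` shows whatever splits and columns the dataset defines:

```python
from datasets import load_dataset

# Load the stacked SAMSum dataset referenced above and inspect its splits/columns.
ds = load_dataset("stacked-summaries/stacked-samsum-1024")
print(ds)
```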

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative mapping to `Seq2SeqTrainingArguments` is sketched after the list):

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 22138
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0
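
For orientation, a hedged sketch of how these values could map onto `transformers.Seq2SeqTrainingArguments`. The actual training script is not part of this card, and the multi-GPU distribution plus the effective total batch size of 128 come from the launcher and accumulation settings rather than a single argument, so treat this as illustrative only:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only: mirrors the listed hyperparameters; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-stacked-samsum-1024",
    learning_rate=1e-4,              # learning_rate: 0.0001
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=8,   # with 16 per device, this yields the 128 total batch across GPUs
    seed=22138,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```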

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.9011        | 1.0   | 230  | 1.7986          | 45.4597 | 19.6956 | 34.6878 | 42.3724   | 74.16   |
| 1.8297        | 2.0   | 460  | 1.7609          | 46.0427 | 20.2299 | 35.2076 | 43.0549   | 70.56   |
| 1.7637        | 3.0   | 690  | 1.7573          | 46.6072 | 19.9754 | 35.2715 | 43.3599   | 72.64   |