---
license: apache-2.0
datasets:
  - stacked-summaries/stacked-samsum-1024
language:
  - en
metrics:
  - rouge
tags:
  - stacked summaries
  - samsum
---

# flan-t5-small-stacked-samsum-1024

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) dataset. It achieves the following results on the evaluation set:

- Loss: 1.7573
- Rouge1: 46.6072
- Rouge2: 19.9754
- Rougel: 35.2715
- Rougelsum: 43.3599
- Gen Len: 72.64

## Model description

The model was trained on a summarization task in which several document-summary pairs may be stacked on top of each other within a single example.

You can split its output into separate topics by splitting on its special token `[NEXT_CONCEPT]`.
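
A minimal usage sketch with the `transformers` pipeline; the repository id below is an assumption based on the card title, so substitute the actual checkpoint path:

```python
from transformers import pipeline

# NOTE: hypothetical repo id inferred from the card title -- adjust as needed.
model_id = "stacked-summaries/flan-t5-small-stacked-samsum-1024"

summarizer = pipeline("summarization", model=model_id)

dialogue = (
    "Anna: Are we still meeting at noon?\n"
    "Ben: Yes, see you at the cafe then.\n"
)

summary = summarizer(dialogue, max_length=96)[0]["summary_text"]

# Stacked inputs may yield stacked summaries separated by [NEXT_CONCEPT];
# splitting on the token recovers one summary per topic.
topics = [part.strip() for part in summary.split("[NEXT_CONCEPT]") if part.strip()]
print(topics)
```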

## Intended uses & limitations

- This model is intended as a baseline/reference point for comparison with the larger models.

## Training and evaluation data

See the [stacked-summaries/stacked-samsum-1024](https://huggingface.co/datasets/stacked-summaries/stacked-samsum-1024) dataset card for details.
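
For a quick look at the data, a minimal sketch using the `datasets` library; printing the `DatasetDict` shows whatever splits and columns the dataset defines:

```python
from datasets import load_dataset

# Load the stacked SAMSum dataset referenced above and inspect its splits/columns.
ds = load_dataset("stacked-summaries/stacked-samsum-1024")
print(ds)
```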

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an illustrative mapping to `Seq2SeqTrainingArguments` is sketched after the list):

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 22138
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0
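
For orientation, a hedged sketch of how these values could map onto `transformers.Seq2SeqTrainingArguments`. The actual training script is not part of this card, and the multi-GPU distribution plus the effective total batch size of 128 come from the launcher and accumulation settings rather than a single argument, so treat this as illustrative only:

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative only: mirrors the listed hyperparameters; output_dir is a placeholder.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-stacked-samsum-1024",
    learning_rate=1e-4,              # learning_rate: 0.0001
    per_device_train_batch_size=16,  # train_batch_size: 16
    per_device_eval_batch_size=16,   # eval_batch_size: 16
    gradient_accumulation_steps=8,   # with 16 per device, this yields the 128 total batch across GPUs
    seed=22138,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=3.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```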

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.9011        | 1.0   | 230  | 1.7986          | 45.4597 | 19.6956 | 34.6878 | 42.3724   | 74.16   |
| 1.8297        | 2.0   | 460  | 1.7609          | 46.0427 | 20.2299 | 35.2076 | 43.0549   | 70.56   |
| 1.7637        | 3.0   | 690  | 1.7573          | 46.6072 | 19.9754 | 35.2715 | 43.3599   | 72.64   |