
flan-t5-small-stacked-samsum-1024

This model is a fine-tuned version of google/flan-t5-small on the stacked-summaries/stacked-samsum-1024 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7573
  • Rouge1: 46.6072
  • Rouge2: 19.9754
  • Rougel: 35.2715
  • Rougelsum: 43.3599
  • Gen Len: 72.64
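
The ROUGE numbers above are reported on a 0-100 scale. A minimal sketch of how comparable scores can be computed with the `evaluate` library is shown below; the prediction/reference strings are placeholders, not examples from the evaluation set.

```python
import evaluate

# Load the ROUGE metric (returns rouge1, rouge2, rougeL, rougeLsum)
rouge = evaluate.load("rouge")

predictions = ["Amanda baked cookies. [NEXT_CONCEPT] Jerry will pick them up tomorrow."]
references = ["Amanda baked cookies and Jerry will pick them up tomorrow."]

scores = rouge.compute(predictions=predictions, references=references)
# evaluate returns fractions in [0, 1]; multiply by 100 to match the scale used in this card
print({k: round(v * 100, 4) for k, v in scores.items()})
```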

Model Description

This model was fine-tuned on a summarization task in which multiple document-summary pairs may be concatenated ("stacked") into a single training example.

You can split its predictions into separate topics on its special token [NEXT_CONCEPT], as shown in the sketch below.
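
A minimal inference sketch, assuming the standard transformers API; the dialogue text and generation settings are placeholders, not the settings used for the reported results.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "stacked-summaries/flan-t5-small-stacked-samsum-1024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

dialogue = "..."  # your (possibly stacked) dialogue text, up to 1024 tokens
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True, max_length=1024)
output_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)

summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# If [NEXT_CONCEPT] is registered as a special token in the tokenizer, decode with
# skip_special_tokens=False instead so the separator survives decoding.
topics = [part.strip() for part in summary.split("[NEXT_CONCEPT]") if part.strip()]
print(topics)
```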

Intended use & limitations

  • This is intended to be used as a baseline/reference for comparison with the larger models.

Training and evaluation data

See stacked-summaries/stacked-samsum-1024.
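
A short sketch for inspecting the data, assuming the dataset is loaded by the Hub id referenced above:

```python
from datasets import load_dataset

ds = load_dataset("stacked-summaries/stacked-samsum-1024")
print(ds)              # available splits and column names
print(ds["train"][0])  # one (possibly stacked) dialogue-summary example
```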

Training Procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 22138
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 3.0
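
A hedged sketch of a roughly equivalent Seq2SeqTrainingArguments configuration; the exact training script is not reproduced here, so the output directory and any flags not listed above are illustrative.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-stacked-samsum-1024",  # illustrative path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,   # 16 x 8 (x GPUs) gives the total batch size of 128
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=22138,
    predict_with_generate=True,      # needed to compute ROUGE during evaluation
)
```

The Adam betas (0.9, 0.999) and epsilon (1e-08) listed above are the optimizer defaults, so they are not set explicitly in this sketch.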

Training results

Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len
1.9011        | 1.0   | 230  | 1.7986          | 45.4597 | 19.6956 | 34.6878 | 42.3724   | 74.16
1.8297        | 2.0   | 460  | 1.7609          | 46.0427 | 20.2299 | 35.2076 | 43.0549   | 70.56
1.7637        | 3.0   | 690  | 1.7573          | 46.6072 | 19.9754 | 35.2715 | 43.3599   | 72.64