
flan-t5-small-stacked-samsum-1024

This model is a fine-tuned version of google/flan-t5-small on the stacked-summaries/stacked-samsum-1024 dataset. It achieves the following results on the evaluation set:

  • Loss: 1.7573
  • Rouge1: 46.6072
  • Rouge2: 19.9754
  • Rougel: 35.2715
  • Rougelsum: 43.3599
  • Gen Len: 72.64
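
The ROUGE numbers above are reported on a 0-100 scale. A minimal sketch of how comparable scores can be computed with the `evaluate` library is shown below; the prediction/reference strings are placeholders, not examples from the evaluation set.

```python
import evaluate

# Load the ROUGE metric (returns rouge1, rouge2, rougeL, rougeLsum)
rouge = evaluate.load("rouge")

predictions = ["Amanda baked cookies. [NEXT_CONCEPT] Jerry will pick them up tomorrow."]
references = ["Amanda baked cookies and Jerry will pick them up tomorrow."]

scores = rouge.compute(predictions=predictions, references=references)
# evaluate returns fractions in [0, 1]; multiply by 100 to match the scale used in this card
print({k: round(v * 100, 4) for k, v in scores.items()})
```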

Model Description

This model was fine-tuned on a summarization task in which multiple document-summary pairs may be concatenated ("stacked") into a single training example.

You can split its predictions into separate topics on its special token [NEXT_CONCEPT], as shown in the sketch below.
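
A minimal inference sketch, assuming the standard transformers API; the dialogue text and generation settings are placeholders, not the settings used for the reported results.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "stacked-summaries/flan-t5-small-stacked-samsum-1024"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

dialogue = "..."  # your (possibly stacked) dialogue text, up to 1024 tokens
inputs = tokenizer(dialogue, return_tensors="pt", truncation=True, max_length=1024)
output_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)

summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
# If [NEXT_CONCEPT] is registered as a special token in the tokenizer, decode with
# skip_special_tokens=False instead so the separator survives decoding.
topics = [part.strip() for part in summary.split("[NEXT_CONCEPT]") if part.strip()]
print(topics)
```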

Intended use & limitations

  • This is intended to be used as a baseline/reference for comparison with the larger models.

Training and evaluation data

See stacked-summaries/stacked-samsum-1024.
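
A short sketch for inspecting the data, assuming the dataset is loaded by the Hub id referenced above:

```python
from datasets import load_dataset

ds = load_dataset("stacked-summaries/stacked-samsum-1024")
print(ds)              # available splits and column names
print(ds["train"][0])  # one (possibly stacked) dialogue-summary example
```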

Training Procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 22138
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 3.0
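
A hedged sketch of a roughly equivalent Seq2SeqTrainingArguments configuration; the exact training script is not reproduced here, so the output directory and any flags not listed above are illustrative.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-small-stacked-samsum-1024",  # illustrative path
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,   # 16 x 8 (x GPUs) gives the total batch size of 128
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=22138,
    predict_with_generate=True,      # needed to compute ROUGE during evaluation
)
```

The Adam betas (0.9, 0.999) and epsilon (1e-08) listed above are the optimizer defaults, so they are not set explicitly in this sketch.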

Training results

Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len
1.9011        | 1.0   | 230  | 1.7986          | 45.4597 | 19.6956 | 34.6878 | 42.3724   | 74.16
1.8297        | 2.0   | 460  | 1.7609          | 46.0427 | 20.2299 | 35.2076 | 43.0549   | 70.56
1.7637        | 3.0   | 690  | 1.7573          | 46.6072 | 19.9754 | 35.2715 | 43.3599   | 72.64