long-t5-tglobal-xl-16384-booksci-summary-v1

This model is a fine-tuned version of pszemraj/long-t5-tglobal-xl-16384-book-summary on the pszemraj/scientific_lay_summarisation-elife-norm dataset. It achieves the following results on the evaluation set:

Loss: 1.7518
Rouge1: 47.4591
Rouge2: 12.7287
Rougel: 21.5549
Rougelsum: 44.8709
Gen Len: 384.39

Model description

An experiment in further fine-tuning a BookSum-trained summarization model on a different dataset. Compare its outputs with either the starting checkpoint (linked above) or the variant fine-tuned only on the scientific lay summaries.
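
One way to run such a comparison is sketched below. It assumes the Hugging Face transformers library with PyTorch is installed, and that this model is published under the owner and name shown in the title. The helper names and the generation settings (beam count, n-gram blocking, output length) are illustrative assumptions, not the settings behind the reported metrics.

```python
# Model IDs: the starting checkpoint is named in this card; the ID for this
# model is assumed to be the card title under the same owner.
BASE_MODEL = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
THIS_MODEL = "pszemraj/long-t5-tglobal-xl-16384-booksci-summary-v1"


def summarize(text, model_id, max_input_tokens=16384, max_new_tokens=512):
    """Summarize `text` with one checkpoint (hypothetical helper)."""
    # Imported lazily so the module can be loaded without the heavy dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long inputs to the same 16384-token limit used in training.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Generation settings here are assumptions chosen for illustration.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        num_beams=4,
        no_repeat_ngram_size=3,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def compare(text, model_ids=(BASE_MODEL, THIS_MODEL)):
    """Return a {model_id: summary} mapping for side-by-side reading."""
    return {model_id: summarize(text, model_id) for model_id in model_ids}
```

Calling `compare(article_text)` downloads both checkpoints on first use. The XL checkpoints are large, so this needs substantial memory.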

Intended uses & limitations

More information needed

Training and evaluation data

The pszemraj/scientific_lay_summarisation-elife-norm dataset. Inputs were truncated to 16384 tokens and target summaries to 1024 tokens.
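
The truncation described above could be applied with a preprocessing function along these lines. This is a sketch, not the original training script: the column names `article` and `summary` are assumptions about the dataset schema, and `tokenizer` is expected to behave like a Hugging Face tokenizer that accepts `text_target`, `max_length`, and `truncation`.

```python
MAX_INPUT_TOKENS = 16384   # input limit stated in this card
MAX_TARGET_TOKENS = 1024   # output limit stated in this card


def preprocess(example, tokenizer, article_key="article", summary_key="summary"):
    """Tokenize one example, truncating source and target to the card's limits.

    `article_key` and `summary_key` are assumed column names.
    """
    model_inputs = tokenizer(
        example[article_key],
        max_length=MAX_INPUT_TOKENS,
        truncation=True,
    )
    labels = tokenizer(
        text_target=example[summary_key],
        max_length=MAX_TARGET_TOKENS,
        truncation=True,
    )
    # Seq2seq training expects the target token IDs under "labels".
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

With the datasets library, a function like this would typically be passed to `Dataset.map` together with the tokenizer.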

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 1
eval_batch_size: 1
seed: 878
gradient_accumulation_steps: 8
total_train_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.02
num_epochs: 2.0
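
To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step from the values listed above. It assumes linear warmup followed by a half-cosine decay to zero, which is the usual behavior of a cosine scheduler with warmup, and takes the total step count (1086) from the training results table. The function names are hypothetical.

```python
import math

LEARNING_RATE = 3e-05
WARMUP_RATIO = 0.02
TOTAL_STEPS = 1086          # final step reported in the training results
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8


def effective_batch_size(num_devices=1):
    """Examples per optimizer step; num_devices=1 is an assumption."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def lr_at(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS,
          warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step` under linear warmup plus cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions the warmup lasts 22 steps, the rate peaks at 3e-05, and it decays to zero by step 1086. A single device reproduces the listed total_train_batch_size of 8.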

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
1.9629	1.0	543	1.7637	46.6926	12.4769	21.4364	44.4329	381.23
1.8555	2.0	1086	1.7518	47.4591	12.7287	21.5549	44.8709	384.39
