
flan-t5-large-samsum

This model is a fine-tuned version of google/flan-t5-large on the samsum dataset.

It achieves the following results on the evaluation set:

  • Loss: 1.1754
  • Rouge1: 54.1595
  • Rouge2: 29.1081
  • Rougel: 45.4989
  • Rougelsum: 49.1026
  • Gen Len: 28.72

Note: the stacked variant of this model is evaluated on a different validation set (the stacked-summaries one), whereas this model is evaluated on samsum only.
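
For reference, scores in this format can be computed with the evaluate library. The sketch below is an illustration only, assuming the samsum test split and default generation settings; the exact generation parameters behind the reported numbers are not documented in this card.

```python
# Sketch of ROUGE evaluation on samsum (assumed settings, small slice for speed).
from datasets import load_dataset
from transformers import pipeline
import evaluate

summarizer = pipeline(
    "summarization",
    model="stacked-summaries/flan-t5-large-samsum",
)
rouge = evaluate.load("rouge")

# Use a small slice of the test split purely for illustration.
data = load_dataset("samsum", split="test[:100]")
preds = [
    out["summary_text"]
    for out in summarizer(data["dialogue"], max_length=64, truncation=True)
]
print(rouge.compute(predictions=preds, references=data["summary"]))
```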

Model description

More information needed

Intended uses & limitations

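The model is intended for abstractive summarization of dialogue, in the style of the samsum dataset. A minimal inference sketch using the standard transformers summarization pipeline follows; the example dialogue and generation settings are illustrative assumptions, not the card's official configuration.

```python
# Minimal inference sketch; generation settings are illustrative.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="stacked-summaries/flan-t5-large-samsum",
)

dialogue = (
    "Anna: Are we still on for lunch tomorrow?\n"
    "Ben: Yes! Noon at the usual place?\n"
    "Anna: Perfect, see you then."
)
print(summarizer(dialogue, max_length=60)[0]["summary_text"])
```
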
Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 4
  • seed: 17868
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.04
  • num_epochs: 5.0
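
For reproducibility, the hyperparameters above map onto Seq2SeqTrainingArguments roughly as sketched below. The output_dir is a placeholder, and any option not listed above is left at its transformers default (the stated Adam betas and epsilon already match those defaults).

```python
# Sketch of training arguments matching the listed hyperparameters.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-large-samsum",  # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=4,
    seed=17868,
    gradient_accumulation_steps=16,       # 8 * 16 = 128 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.04,
    num_train_epochs=5.0,
    predict_with_generate=True,           # generate summaries for ROUGE at eval time
)
```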

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.2106        | 0.43  | 50   | 1.1889          | 52.5898 | 26.9967 | 43.6944 | 47.9656   | 24.5167 |
| 1.213         | 0.87  | 100  | 1.1760          | 52.4279 | 27.4689 | 43.7873 | 48.0581   | 25.0533 |
| 1.0726        | 1.3   | 150  | 1.1731          | 52.8246 | 26.9524 | 43.7429 | 48.0345   | 25.55   |
| 1.0784        | 1.74  | 200  | 1.1708          | 53.1291 | 27.9056 | 44.2609 | 48.6883   | 26.03   |
| 1.0215        | 2.17  | 250  | 1.1755          | 53.1512 | 27.9475 | 44.1442 | 48.4619   | 27.57   |
| 1.0294        | 2.61  | 300  | 1.1711          | 53.4402 | 28.0126 | 44.5454 | 48.6432   | 25.9033 |
| 1.0016        | 3.04  | 350  | 1.1718          | 53.9395 | 28.3087 | 45.191  | 49.2773   | 26.6133 |
| 0.9576        | 3.48  | 400  | 1.1741          | 53.9004 | 28.3243 | 45.0911 | 48.9182   | 26.33   |
| 0.9739        | 3.91  | 450  | 1.1754          | 53.7049 | 28.419  | 44.8946 | 48.8708   | 27.2433 |
| 0.9505        | 4.35  | 500  | 1.1781          | 53.7142 | 28.1758 | 44.8324 | 48.9498   | 26.8667 |
| 0.9993        | 4.78  | 550  | 1.1784          | 53.87   | 28.2211 | 44.893  | 49.1074   | 27.2167 |