metadata

datasets:
  - multi_news
metrics:
  - bleu
  - rouge
pipeline_tag: summarization

Hyperparameters

learning_rate=2e-5
per_device_train_batch_size=14
per_device_eval_batch_size=14
weight_decay=0.01
save_total_limit=3
num_train_epochs=3
predict_with_generate=True
fp16=True
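
The names above match keyword arguments of `Seq2SeqTrainingArguments` in the Hugging Face `transformers` library, although the card does not say which API was used. The following is a minimal sketch under that assumption. The `output_dir` value and the `build_training_args` helper are hypothetical and only there for illustration.

```python
# Hyperparameters copied verbatim from the list above.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 14,
    "per_device_eval_batch_size": 14,
    "weight_decay": 0.01,
    "save_total_limit": 3,          # keep at most 3 checkpoints on disk
    "num_train_epochs": 3,
    "predict_with_generate": True,  # run generate() during evaluation
    "fp16": True,                   # mixed precision; needs a CUDA-capable GPU
}


def build_training_args(output_dir="summarization-output"):
    """Build Seq2SeqTrainingArguments from the values above.

    Assumption: the card names neither the training API nor the output
    directory, so both are illustrative choices.
    """
    # Imported lazily so the dict can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Calling `build_training_args()` on a machine without a GPU is likely to fail because of `fp16=True`; dropping that key from the dict is one way to try the sketch on CPU.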

Training Output

global_step=7710
training_loss=2.1297076629757417
metrics={'train_runtime': 6059.0418,
         'train_samples_per_second': 17.813,
         'train_steps_per_second': 1.272,
         'total_flos': 2.3389776681055027e+17,
         'train_loss': 2.1297076629757417,
         'epoch': 3.0}
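
The reported numbers can be cross-checked against each other with a little arithmetic. The sketch below uses only values from this card; the variable names are illustrative, and `train_runtime` is taken to be in seconds, as the Trainer reports it.

```python
# Values copied from the training output and hyperparameters in this card.
global_step = 7710
train_runtime = 6059.0418        # seconds
samples_per_second = 17.813
epochs = 3.0
per_device_batch = 14

steps_per_epoch = global_step / epochs            # optimizer steps in one epoch
steps_per_second = global_step / train_runtime    # compare with 'train_steps_per_second'
samples_seen = samples_per_second * train_runtime # total samples over all epochs
samples_per_step = samples_seen / global_step     # effective batch size
samples_per_epoch = samples_seen / epochs         # approximate training-set size

print(f"steps per epoch:   {steps_per_epoch:.0f}")
print(f"steps per second:  {steps_per_second:.3f}")
print(f"samples per step:  {samples_per_step:.2f} (per-device batch: {per_device_batch})")
print(f"samples per epoch: {samples_per_epoch:.0f}")
print(f"runtime:           {train_runtime / 3600:.2f} h")
```

The run works out to 2570 steps per epoch and roughly 14 samples per step. That matches the per-device batch size, which would be consistent with training on a single device without gradient accumulation, although the card does not state the hardware setup.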

Training Results

Epoch	Training Loss	Validation Loss	Rouge1	Rouge2	RougeL	RougeLsum	Bleu	Gen Len
1	2.223100	2.038599	0.147400	0.054800	0.113500	0.113500	0.001400	20.000000
2	2.078100	2.009619	0.152900	0.057800	0.117000	0.117000	0.001600	20.000000
3	1.989000	2.006006	0.152900	0.057300	0.116700	0.116700	0.001700	20.000000
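
To compare epochs, the table can be loaded into plain Python records. This is a small sketch with illustrative column names; the numbers are copied from the rows above.

```python
COLUMNS = ("epoch", "train_loss", "val_loss", "rouge1", "rouge2",
           "rougeL", "rougeLsum", "bleu", "gen_len")

ROWS = [
    (1, 2.223100, 2.038599, 0.147400, 0.054800, 0.113500, 0.113500, 0.001400, 20.0),
    (2, 2.078100, 2.009619, 0.152900, 0.057800, 0.117000, 0.117000, 0.001600, 20.0),
    (3, 1.989000, 2.006006, 0.152900, 0.057300, 0.116700, 0.116700, 0.001700, 20.0),
]

records = [dict(zip(COLUMNS, row)) for row in ROWS]

# Lowest validation loss and highest ROUGE-2 do not have to fall on the same epoch.
best_val = min(records, key=lambda r: r["val_loss"])
best_rouge2 = max(records, key=lambda r: r["rouge2"])

# A single value here means every evaluation produced outputs of the same average length.
gen_lens = {r["gen_len"] for r in records}

print("best validation loss: epoch", best_val["epoch"])
print("best ROUGE-2:         epoch", best_rouge2["epoch"])
print("distinct Gen Len values:", sorted(gen_lens))
```

Validation loss is lowest at epoch 3, while Rouge2 peaks at epoch 2, and the gains from epoch 2 to 3 are small. Gen Len is 20.0 at every epoch, which may mean generation was capped at 20 tokens during evaluation (the default `max_length` in `transformers` generation). That is an assumption; if it holds, the ROUGE and BLEU scores were computed on truncated summaries.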