---
datasets:
- multi_news
metrics:
- bleu
- rouge
pipeline_tag: summarization
---

# Hyperparameters

- learning_rate = 2e-5
- per_device_train_batch_size = 14
- per_device_eval_batch_size = 14
- weight_decay = 0.01
- save_total_limit = 3
- num_train_epochs = 3
- predict_with_generate = True
- fp16 = True

# Training Output

- global_step = 7710
- training_loss = 2.436398018566087
- metrics = {'train_runtime': 30287.1254, 'train_samples_per_second': 3.564, 'train_steps_per_second': 0.255, 'total_flos': 3.1186278368988365e+17, 'train_loss': 2.436398018566087, 'epoch': 3.0}

# Training Results

| Epoch | Training Loss | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Bleu | Gen Len |
|:------|:--------------|:----------------|:-------|:-------|:-------|:----------|:---------|:-----------|
| 1 | 2.451200 | 2.291708 | 0.322800 | 0.110100 | 0.194600 | 0.194700 | 0.368400 | 150.224300 |
| 2 | 2.527300 | nan | 0.296400 | 0.100100 | 0.181800 | 0.181900 | 0.317300 | 137.569200 |
| 3 | 2.523800 | nan | 0.296600 | 0.100000 | 0.181800 | 0.181900 | 0.317200 | 137.254000 |
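
# Training Setup (sketch)

A minimal sketch of how the hyperparameters above could be wired into a `Seq2SeqTrainer` run on `multi_news`. The base checkpoint (`facebook/bart-base`), tokenization lengths, and the per-epoch evaluation strategy are assumptions; the card does not state them.

```python
# Hypothetical reconstruction of the fine-tuning run; base model and
# sequence lengths are assumptions, only the training arguments listed
# in the card are taken as given.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "facebook/bart-base"  # assumption: the card does not name the base model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

raw = load_dataset("multi_news")

def preprocess(batch):
    # multi_news provides "document" (source articles) and "summary" columns
    inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

# Hyperparameters taken verbatim from the card
args = Seq2SeqTrainingArguments(
    output_dir="multi_news-summarization",
    learning_rate=2e-5,
    per_device_train_batch_size=14,
    per_device_eval_batch_size=14,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=3,
    predict_with_generate=True,
    fp16=True,
    evaluation_strategy="epoch",  # assumption: matches the per-epoch results table
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)

trainer.train()
```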