---
license: apache-2.0
---

# GPT-2 fine-tuned on the CNN/DailyMail summarization dataset

Training arguments:

```json
{
  "learning_rate": 0.0001,
  "logging_steps": 5000,
  "lr_scheduler_type": "cosine",
  "num_train_epochs": 2,
  "per_device_train_batch_size": 12,
  "weight_decay": 0.1
}
```

The per-device train batch size of 12 corresponds to a total batch size of 36.

Generation arguments:

```json
{
  "generation_kwargs": {
    "do_sample": true,
    "max_new_tokens": 100,
    "min_length": 50
  }
}
```

Pre-processing truncates each article to its first 500 tokens. Post-processing keeps only the first three sentences of the generated text as the summary.

Test split metrics:

| Metric     | Value               |
|------------|---------------------|
| METEOR     | 0.2562237219960531  |
| ROUGE-1    | 0.3754558158439447  |
| ROUGE-2    | 0.15532626375157227 |
| ROUGE-L    | 0.25813023509572597 |
| ROUGE-Lsum | 0.3489472885043494  |
| BLEU       | 0.09285941365815623 |
| BERTScore  | 0.87570951795246    |
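A minimal inference sketch that mirrors the reported pre-processing (500-token truncation), generation kwargs, and post-processing (keeping the first three sentences). `MODEL_ID` is a placeholder for this checkpoint's repository id, and the sentence splitter is an assumption, since the exact post-processing code is not shown in this card.

```python
import re
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

MODEL_ID = "gpt2-cnn-dailymail"  # placeholder: replace with the actual repo id
tokenizer = GPT2TokenizerFast.from_pretrained(MODEL_ID)
model = GPT2LMHeadModel.from_pretrained(MODEL_ID)

article = "(news article text here)"

# Pre-processing: truncate the article to its first 500 tokens.
inputs = tokenizer(article, truncation=True, max_length=500, return_tensors="pt")

# Generate with the kwargs reported above.
output_ids = model.generate(
    **inputs,
    do_sample=True,
    max_new_tokens=100,
    min_length=50,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens (GPT-2 echoes its prompt).
generated = tokenizer.decode(
    output_ids[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)

# Post-processing: keep only the first three sentences as the summary.
sentences = re.split(r"(?<=[.!?])\s+", generated.strip())
summary = " ".join(sentences[:3])
print(summary)
```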
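For reference, the training arguments above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the original training script: `output_dir` is hypothetical, and `gradient_accumulation_steps=3` is an assumption, since a total batch size of 36 with a per-device batch size of 12 implies a factor of 3 from gradient accumulation, multiple devices, or both.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-cnn-dailymail",  # hypothetical output path
    learning_rate=1e-4,
    logging_steps=5000,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=3,  # assumed: 12 * 3 = 36 total batch size
    weight_decay=0.1,
)
```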
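The reported test-split metrics could be recomputed with the Hugging Face `evaluate` library along these lines, assuming `predictions` and `references` are lists of generated and reference summaries for the test split; the original evaluation script is not shown here, so treat this as a sketch.

```python
import evaluate

rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")
bleu = evaluate.load("bleu")
bertscore = evaluate.load("bertscore")

results = {
    **rouge.compute(predictions=predictions, references=references),
    **meteor.compute(predictions=predictions, references=references),
    "bleu": bleu.compute(predictions=predictions, references=references)["bleu"],
}

# BERTScore returns per-example F1 scores; report the mean.
bs = bertscore.compute(predictions=predictions, references=references, lang="en")
results["bert_score"] = sum(bs["f1"]) / len(bs["f1"])
print(results)
```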