bigbird pegasus on the booksum dataset - 40,000 steps

the fully-tuned model can be found here. This checkpoint will stay live because its summaries are almost as good and it is much faster to run.

  • typical datasets for summarization models are PubMed / arXiv; for my use cases I have found these to be not very useful
    • summaries from arXiv-trained models tend to sound so needlessly complicated that you might as well have read the original text in that time.
    • this model is one attempt to help with that
  • this is not a finished checkpoint but WIP:
    • 40K steps or 4 epochs trained on the booksum dataset (have gone through ~60-70% of the training dataset).
    • note that while the checkpoint this model started from can still apply its attention mechanism over 4096 tokens, this model was trained with the dataset tokenized to a max_length of 1024 for GPU memory reasons.
    • will continue to improve it based on results and feedback.
  • the starting checkpoint was google/bigbird-pegasus-large-bigpatent
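
Since training used inputs tokenized to a max_length of 1024, one option for longer documents is to split them into windows of at most that size and summarize each window separately. Below is a minimal sketch of that idea, assuming whitespace-separated words as a rough stand-in for tokenizer tokens (the real tokenizer will give different counts). `chunk_words` and its `overlap` parameter are hypothetical names used for illustration, not part of this model or of transformers.

```python
from typing import List


def chunk_words(text: str, max_len: int = 1024, overlap: int = 0) -> List[str]:
    """Split text into windows of at most max_len whitespace-separated words.

    Assumption: words approximate tokens. Swap in real tokenizer counts
    if exact limits matter.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    words = text.split()
    step = max_len - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_len]))
        if start + max_len >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Whether chunking actually helps here is untested; the model may also cope with inputs longer than 1024 tokens, since the attention mechanism still supports 4096.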

example usage

An extended example, including a demo of batch summarization, is here.

  • create the summarizer object:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from transformers import pipeline

_model = AutoModelForSeq2SeqLM.from_pretrained(

_tokenizer = AutoTokenizer.from_pretrained(

summarizer = pipeline(
  • define text to be summarized, and pass it through the pipeline. Boom done.
wall_of_text = "your text to be summarized goes here."

result = summarizer(
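
The two steps above could be wrapped roughly as follows. This is only a sketch: `MODEL_NAME` is a hypothetical placeholder (substitute the actual checkpoint name), `build_summarizer` and `summarize` are illustrative helper names, and it assumes a transformers version that provides the "summarization" pipeline task.

```python
# Hypothetical placeholder: replace with the actual checkpoint name.
MODEL_NAME = "your-namespace/your-checkpoint"


def build_summarizer(model_name: str = MODEL_NAME):
    """Load the model and tokenizer and wrap them in a summarization pipeline."""
    # Imported lazily so that summarize() below can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return pipeline("summarization", model=model, tokenizer=tokenizer)


def summarize(summarizer, text: str, **gen_kwargs) -> str:
    """Run one text through the pipeline and return the summary string.

    The summarization pipeline returns a list of dicts, each with a
    "summary_text" key. Extra keyword arguments are passed through
    as generation settings.
    """
    result = summarizer(text, truncation=True, **gen_kwargs)
    return result[0]["summary_text"].strip()
```

Usage would then look like `summarize(build_summarizer(), wall_of_text, max_length=64)`, where the `max_length` value is an arbitrary illustration rather than a recommended setting.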



  • below are scores from running evaluation on the entire validation set (~1400 rows).
  • note that the dataset has three subsets (chapter, book, paragraph; see the paper), but the scores below are computed in aggregate.
  • these scores appear to be on par with, or slightly better than, those reported in the paper; more validation and other work remains to be done.
{
    "eval_gen_len": 126.5815,
    "eval_loss": 3.747079610824585,
    "eval_rouge1": 30.4775,
    "eval_rouge2": 4.8919,
    "eval_rougeL": 16.742,
    "eval_rougeLsum": 27.57,
    "eval_runtime": 4246.9369,
    "eval_samples_per_second": 0.349,
    "eval_steps_per_second": 0.349
}
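
For readers unfamiliar with the metric, here is a simplified sketch of what ROUGE-1 measures: the F1 score of unigram overlap between a reference summary and a generated one. It assumes lowercased whitespace tokenization with no stemming, so it will not reproduce the numbers above exactly; the evaluation itself presumably used a standard ROUGE implementation. `rouge1_f1` is a hypothetical name. The reported scores appear to be on a 0-100 scale, so 30.4775 would correspond to roughly 0.30 here.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate summary (0 to 1)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```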
