Open In Colab

This model is a fine-tuned version of allenai/led-large-16384 on the BookSum dataset (kmfoda/booksum). It aims to generalize well and be useful in summarizing lengthy text for both academic and everyday purposes.

  • Handles up to 16,384 tokens input
  • See the Colab demo linked above or try the demo on Spaces

Note: Due to inference API timeout constraints, outputs may be truncated before the full summary is returned (try running the model in Python or use the demo instead).

Basic Usage

To improve summary quality, use encoder_no_repeat_ngram_size=3 when calling the pipeline object. This setting encourages the model to utilize new vocabulary and construct an abstractive summary.
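
To make the effect of this setting concrete, the sketch below lists the n-grams that a candidate summary shares with its source, which is the kind of overlap encoder_no_repeat_ngram_size=3 rules out during generation. It is only an illustration: it splits on whitespace for readability, whereas generation applies the constraint to tokenizer IDs, and the helper names ngrams and shared_ngrams are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of all contiguous n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(source, summary, n=3):
    """N-grams of the summary that also occur verbatim in the source.

    Whitespace splitting is a simplification: generation applies the
    constraint to tokenizer IDs, not to words.
    """
    return ngrams(summary.split(), n) & ngrams(source.split(), n)


source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox sleeps"
rephrased = "a fast fox leaps over a sleepy dog"

print(shared_ngrams(source, copied))     # non-empty: trigrams lifted from the source
print(shared_ngrams(source, rephrased))  # empty set: no copied trigrams
```

A summary like the second one, which shares no trigram with the input, is what the setting pushes the model toward.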

Load the model into a pipeline object:

import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,  # GPU 0 if available, else CPU
)

Feed the text into the pipeline object:

wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    encoder_no_repeat_ngram_size=3,  # setting recommended above
)

Important: For optimal summary quality, use the global attention mask when decoding, as demonstrated in this community notebook (see the definition of generate_answer(batch)).
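
As a rough sketch of what this can look like outside the pipeline: a common pattern with LED models is to place global attention on the first token only and pass the mask to generate(). The code below follows that pattern and is not a copy of the notebook's generate_answer(batch). The function names, max_length=256, and num_beams=4 are assumptions chosen for illustration.

```python
def first_token_global_mask(input_ids):
    """Build a global attention mask with 1 on the first token of each row.

    input_ids: nested list of token IDs, shape (batch, seq_len).
    """
    return [[1 if pos == 0 else 0 for pos in range(len(row))] for row in input_ids]


def summarize_with_global_attention(text, model_name="pszemraj/led-large-book-summary"):
    # Heavy imports are kept local so the mask helper above works without them.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)
    global_mask = torch.tensor(first_token_global_mask(enc["input_ids"].tolist()))

    with torch.no_grad():
        output_ids = model.generate(
            enc["input_ids"],
            attention_mask=enc["attention_mask"],
            global_attention_mask=global_mask,
            encoder_no_repeat_ngram_size=3,
            max_length=256,  # assumed output cap, not taken from the card
            num_beams=4,     # assumed beam width, not taken from the card
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling summarize_with_global_attention("your words here") downloads the model on first use, so expect a delay and significant memory use.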

If you're facing computing constraints, consider using the base version pszemraj/led-base-book-summary.

Training Information


The model was fine-tuned on the BookSum dataset. During training, the chapter column was the input, while the summary_text column was the output.
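
The column mapping can be sketched as a preprocessing function of the kind typically passed to a dataset's map() method. This is not the original training script: the function name preprocess and the max_target_len value are assumptions, and only the chapter and summary_text column names and the 16,384-token input limit come from this card.

```python
def preprocess(batch, tokenizer, max_input_len=16384, max_target_len=1024):
    """Map BookSum columns to model inputs and labels.

    max_target_len is an assumed value; the card does not state one.
    """
    # Source text: the full chapter, truncated to the model's input limit.
    model_inputs = tokenizer(batch["chapter"], max_length=max_input_len, truncation=True)
    # Target text: the reference summary, tokenized with the same tokenizer.
    targets = tokenizer(batch["summary_text"], max_length=max_target_len, truncation=True)
    model_inputs["labels"] = targets["input_ids"]
    return model_inputs
```

With a Hugging Face dataset and tokenizer loaded, this would be applied as dataset.map(lambda b: preprocess(b, tokenizer), batched=True).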


Fine-tuning was run on the BookSum dataset across 13+ epochs. Notably, the final four epochs combined the training and validation sets as 'train' to enhance generalization.


The training process involved different settings across stages:

  • Initial Three Epochs: Low learning rate (5e-05), batch size of 1, 4 gradient accumulation steps, and a linear learning rate scheduler.
  • In-between Epochs: Learning rate reduced to 4e-05, increased batch size to 2, 16 gradient accumulation steps, and switched to a cosine learning rate scheduler with a 0.05 warmup ratio.
  • Final Two Epochs: Further reduced learning rate (2e-05), batch size reverted to 1, maintained gradient accumulation steps at 16, and continued with a cosine learning rate scheduler, albeit with a lower warmup ratio (0.03).
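
The three stages above can be written down as plain configuration records, which also makes the effective batch size (batch size times gradient accumulation steps) easy to compare. This is a sketch for illustration: the Stage class is hypothetical, a single device is assumed, and the warmup ratio of the first stage is set to 0 only because the card gives no value for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    learning_rate: float
    batch_size: int
    grad_accum_steps: int
    scheduler: str
    warmup_ratio: float = 0.0  # assumed 0 where the card gives no value

    def effective_batch_size(self, num_devices=1):
        # num_devices=1 is an assumption; the card does not state the hardware.
        return self.batch_size * self.grad_accum_steps * num_devices


STAGES = [
    Stage("initial", 5e-05, 1, 4, "linear"),
    Stage("in-between", 4e-05, 2, 16, "cosine", 0.05),
    Stage("final", 2e-05, 1, 16, "cosine", 0.03),
]

for s in STAGES:
    print(f"{s.name}: lr={s.learning_rate}, effective batch size={s.effective_batch_size()}")
```

Under the single-device assumption, the effective batch sizes are 4, 32, and 16 for the three stages respectively.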


  • Transformers 4.19.2
  • PyTorch 1.11.0+cu113
  • Datasets 2.2.2
  • Tokenizers 0.12.1

Simplified Usage with TextSum

To streamline the process of using this and other models, I've developed a Python package utility named textsum. This package offers simple interfaces for applying summarization models to text documents of arbitrary length.

Install TextSum:

pip install textsum

Then use it in Python with this model:

from textsum.summarize import Summarizer

model_name = "pszemraj/led-large-book-summary"
summarizer = Summarizer(
    model_name_or_path=model_name,  # you can use any Seq2Seq model on the Hub
    token_batch_length=4096,  # tokens to batch summarize at a time, up to 16384
)
long_string = "This is a long string of text that will be summarized."
out_str = summarizer.summarize_string(long_string)
print(f"summary: {out_str}")
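
The token_batch_length parameter above implies that long inputs are summarized in windows of a fixed token count. The snippet below is a minimal sketch of that general idea, splitting a flat list of token IDs into consecutive windows. It is not textsum's actual implementation, and the function name split_into_batches is hypothetical.

```python
def split_into_batches(token_ids, batch_length=4096):
    """Split a flat list of token IDs into consecutive windows."""
    if batch_length <= 0:
        raise ValueError("batch_length must be positive")
    return [token_ids[i:i + batch_length] for i in range(0, len(token_ids), batch_length)]


ids = list(range(10))
print(split_into_batches(ids, batch_length=4))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each window would then be summarized on its own, so a smaller window lowers memory use at the cost of giving the model less context per call.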

Currently implemented interfaces include a Python API, a Command-Line Interface (CLI), and a demo/web UI.

For detailed explanations and documentation, check the README or the wiki.

Related Models

Check out these other related models, also trained on the BookSum dataset:

Other variants trained on other datasets are available on my HF profile; feel free to try them out :)

Model size: 460M params