
long-t5-tglobal-base-16384 + BookSum


  • Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
  • Generalizes reasonably well to academic and narrative text.
  • A simple example/use case on ASR is here. There's also an example notebook in Colab.

Cheeky Proof-of-Concept

A summary of the infamous Navy SEALs copypasta:

The narrator tells us that he's graduated from the Navy seals and has been involved in many secret raids. He's also one of the best snipers in the entire U.S. military. He promises to "wipe you out with precision" when they meet again.

Model description

A fine-tuned version of google/long-t5-tglobal-base on the kmfoda/booksum dataset:

  • 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
  • All training used a 16384-token input length and a 1024-token maximum output length.
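Because training used a 16384-token input window, text longer than that is beyond anything the model saw during training. One option (an assumption here, not something this card prescribes) is to split the tokenized text into windows of at most 16384 tokens and summarize each one separately. A minimal sketch; `split_into_windows` is a hypothetical helper, not part of transformers:

```python
from typing import List, Sequence

MAX_INPUT_TOKENS = 16384  # input length used in all training runs (from the card)


def split_into_windows(
    token_ids: Sequence[int],
    max_len: int = MAX_INPUT_TOKENS,
    stride: int = 0,
) -> List[List[int]]:
    """Split a token-id sequence into windows of at most max_len tokens.

    Consecutive windows overlap by `stride` tokens (0 means no overlap).
    Hypothetical helper for illustration only.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_len]))
        # stop once this window reaches the end of the sequence
        if start + max_len >= len(token_ids):
            break
    return windows
```

In practice the ids would come from the model's tokenizer, e.g. `tokenizer(long_text).input_ids` with `tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-base-16384-book-summary")`, and each window would be turned back into text with `tokenizer.decode(window, skip_special_tokens=True)` before being summarized. A nonzero `stride` keeps some shared context between neighboring windows, at the cost of summarizing the overlap twice.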

Read the paper by Guo et al. here: LongT5: Efficient Text-To-Text Transformer for Long Sequences

How-To in Python

Install/update transformers:

pip install -U transformers

Summarize text with pipeline:

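import torch
from transformers import pipeline

# load the fine-tuned model into a summarization pipeline (GPU 0 if available, else CPU)
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)  # list with one dict per input text
print(result[0]["summary_text"])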

Pass other beam-search text-generation parameters when calling summarizer to get even higher-quality results.
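A sketch of passing beam-search settings through the pipeline call. The specific values below are illustrative assumptions, not settings published with this model; every key is a standard `generate()` argument that the transformers summarization pipeline forwards to the model. `summarize_with_beams` and `DEFAULT_GEN_KWARGS` are hypothetical names:

```python
from typing import Any, Callable, Dict, List

# Illustrative beam-search settings (assumed values, not tuned by the card's author).
DEFAULT_GEN_KWARGS: Dict[str, Any] = {
    "num_beams": 4,              # beams kept during search
    "no_repeat_ngram_size": 3,   # forbid repeating any 3-gram in the output
    "early_stopping": True,      # stop when all beams have finished
    "min_length": 8,
    "max_length": 1024,          # matches the 1024-token max output used in training
}


def summarize_with_beams(
    summarizer: Callable[..., List[Dict[str, str]]],
    text: str,
    **overrides: Any,
) -> str:
    """Call a summarization pipeline with the defaults above plus any overrides."""
    gen_kwargs = {**DEFAULT_GEN_KWARGS, **overrides}
    result = summarizer(text, **gen_kwargs)
    # the summarization pipeline returns a list with one dict per input text
    return result[0]["summary_text"]
```

With the `summarizer` created in the pipeline snippet, `summarize_with_beams(summarizer, long_text, num_beams=8)` overrides one default and keeps the rest. Larger `num_beams` generally trades speed for quality, and `no_repeat_ngram_size` discourages verbatim repetition in the output.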

Intended uses & limitations

  • The current checkpoint is fairly well converged but will be updated if further improvements can be made.
    • Compare its performance to LED-base trained on the same dataset (the hosted API generation parameters are the same).
  • While this model seems to improve factual consistency, summaries are not foolproof: double-check anything that seems odd.

Training and evaluation data

The kmfoda/booksum dataset on Hugging Face (read the original paper here). Summaries longer than 1024 LongT5 tokens were filtered out so that the model would not learn to generate "partial" (cut-off) summaries.

NOTE: Early checkpoints of this model were trained on a "smaller" subset of the dataset because summaries were filtered at 1024 characters rather than 1024 tokens. This was later caught and corrected to 1024 tokens, after which the model was trained for 10+ more epochs.
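The length filter described above can be sketched as follows, assuming the data is a list of dicts with a `summary` field (a hypothetical field name; check the actual kmfoda/booksum columns). Passing a token counter versus plain `len` shows why the early character-based filter kept fewer examples: 1024 characters is a much tighter limit than 1024 tokens.

```python
from typing import Callable, Dict, List

MAX_SUMMARY_LEN = 1024  # summaries longer than this were dropped (per the card)


def filter_by_summary_length(
    examples: List[Dict[str, str]],
    length_fn: Callable[[str], int],
    max_len: int = MAX_SUMMARY_LEN,
    summary_key: str = "summary",  # hypothetical field name
) -> List[Dict[str, str]]:
    """Keep only examples whose summary length (as measured by length_fn) <= max_len.

    length_fn = a token counter reproduces the intended token-based filter;
    length_fn = len reproduces the mistaken character-based filter.
    """
    return [ex for ex in examples if length_fn(ex[summary_key]) <= max_len]
```

For the real filter, the counter would use the LongT5 tokenizer, e.g. `lambda s: len(tokenizer(s).input_ids)`; with a Hugging Face `datasets.Dataset`, the same predicate could be passed to `Dataset.filter` instead of a list comprehension.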

Training procedure


  • July 22, 2022: Updated to a fairly converged checkpoint.
  • July 3, 2022: Added a new version with several epochs of additional training that performs better overall.

Training hyperparameters

The following hyperparameters were used during the most recent training round*:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2
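A sketch of what two of these settings mean in practice, under stated assumptions: the effective batch per optimizer step is the per-device batch times the gradient accumulation steps (times the number of devices), and the cosine schedule ramps the learning rate up linearly over the first 1% of optimizer steps, then decays it to zero along a half cosine. The function names are hypothetical; the schedule mirrors the shape of transformers' `get_cosine_schedule_with_warmup`, and rounding warmup steps up from `warmup_ratio` is an assumption about how the ratio is converted to steps.

```python
import math

LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 128
WARMUP_RATIO = 0.01


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def cosine_lr_with_warmup(
    step: int,
    total_steps: int,
    base_lr: float = LEARNING_RATE,
    warmup_ratio: float = WARMUP_RATIO,
) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

With these values, `effective_batch_size(1, 128)` gives 128, matching the listed `total_train_batch_size`, and over a hypothetical 1000-step run the learning rate peaks at 0.0005 at step 10 and falls to half of that at step 505.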

*Prior training sessions used roughly similar parameters; multiple sessions were required because this model takes eons to train.

Training results

Framework versions

  • Transformers 4.20.1
  • Pytorch 1.10.0+cu113
  • Datasets 2.3.2
  • Tokenizers 0.12.1

Dataset used to train pszemraj/long-t5-tglobal-base-16384-book-summary: kmfoda/booksum
