Edit model card
  • generalizes reasonably well to academic & narrative text.


Model description

A fine-tuned version of google/long-t5-tglobal-base on the booksum dataset:

  • 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
  • Training used 16384 token input / 1024 max output

Read the paper by Guo et al. here: LongT5: Efficient Text-To-Text Transformer for Long Sequences

How-To in Python

Install/update transformers pip install -U transformers

Summarize text with pipeline:

import torch
from transformers import pipeline

summarizer = pipeline(
    device=0 if torch.cuda.is_available() else -1,
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)

Training hyperparameters

NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of 1024 characters. This was subsequently caught and adjusted to 1024 tokens and then trained further for 10+ epochs.

The following hyperparameters were used during the most recent training round*:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2

* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train

Framework versions

  • Transformers 4.20.1
  • Pytorch 1.10.0+cu113
  • Datasets 2.3.2
  • Tokenizers 0.12.1
Downloads last month

Evaluation results