
Longformer Encoder-Decoder (LED) for Narrative-Esque Long Text Summarization

A fine-tuned version of allenai/led-large-16384 on the BookSum dataset.

Goal: a model that generalizes well and is useful for summarizing long text in academic and everyday use. The resulting model works well on a wide range of text and can handle up to 16,384 tokens per batch (if you have the GPU memory to handle that).

Note: the hosted inference API is set to generate a maximum of 64 tokens for runtime reasons, so the summaries may be truncated (depending on the length of the input text). For best results, use Python as shown below.


Usage - Basic

  • Pass encoder_no_repeat_ngram_size=3 when calling the pipeline object to improve summary quality.
    • This forces the model to use new vocabulary and create an abstractive summary; otherwise it may simply compile the best extractive summary from the input provided.

Load the model into a pipeline object:

import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

# load the checkpoint as a summarization pipeline, on GPU if one is available
summarizer = pipeline(
    "summarization",
    model=hf_name,
    device=0 if torch.cuda.is_available() else -1,
)
  • Feed your text into the pipeline object:
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,  # encourage abstractive rather than extractive output
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)
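
The pipeline returns a list with one dict per input; the generated text is under the summary_text key:

print(result[0]["summary_text"])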

Important: to generate the best-quality summaries, you should use the global attention mask when decoding, as demonstrated in this community notebook (see the definition of generate_answer(batch)).
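
A minimal sketch of that approach (not the notebook's exact code), calling the model and tokenizer directly so a global attention mask can be passed to generate():

import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

hf_name = "pszemraj/led-large-book-summary"
tokenizer = AutoTokenizer.from_pretrained(hf_name)
model = LEDForConditionalGeneration.from_pretrained(hf_name)

wall_of_text = "your words here"
inputs = tokenizer(
    wall_of_text,
    max_length=16384,
    truncation=True,
    return_tensors="pt",
)

# place global attention on the first (<s>) token, as LED expects for summarization
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    min_length=16,
    num_beams=4,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))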

If you have compute constraints, try the base version, pszemraj/led-base-book-summary (see the sketch below).

  • All of the generation parameters on the API here are the same as for the base model, for easy comparison between versions.
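
Swapping checkpoints only requires changing the model name passed to the pipeline; the generation arguments above stay the same:

summarizer = pipeline(
    "summarization",
    model="pszemraj/led-base-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)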

Training and evaluation data

  • The BookSum dataset (this is what adds the bsd-3-clause license)
  • During training, the input text was the text of the chapter, and the target was summary_text (see the sketch below)
  • Eval results can be found here with metrics on the sidebar.
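
A rough sketch of that preprocessing step, assuming the Hugging Face-hosted kmfoda/booksum copy of the dataset with chapter / summary_text columns (the dataset id, input column name, and target length here are assumptions; the exact training script is not included in this card):

from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("kmfoda/booksum")  # assumed dataset id
tokenizer = AutoTokenizer.from_pretrained("allenai/led-large-16384")

def preprocess(batch):
    # chapter text is the encoder input, summary_text is the decoder target
    model_inputs = tokenizer(batch["chapter"], max_length=16384, truncation=True)
    labels = tokenizer(batch["summary_text"], max_length=1024, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)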

Training procedure

  • Training completed on the BookSum dataset for 13 total epochs
  • The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization (see the sketch below).
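
Merging the splits can be done with datasets.concatenate_datasets; a minimal sketch under the same kmfoda/booksum assumption as above:

from datasets import concatenate_datasets, load_dataset

dataset = load_dataset("kmfoda/booksum")  # assumed dataset id

# merge train + validation into a single 'train' split for the later epochs
combined_train = concatenate_datasets([dataset["train"], dataset["validation"]])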

Training hyperparameters

Initial Three Epochs

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
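
Expressed as Seq2SeqTrainingArguments, these settings would look roughly like the following (a reconstruction of the listed values, not the original training script; output_dir is a placeholder):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./led-large-book-summary",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)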

In-between Epochs

Unfortunately, complete records are not on hand for the middle epochs; the following should be representative:

  • learning_rate: 4e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 6 (in addition to prior model)

Final Two Epochs

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2 (in addition to prior model)

Framework versions

  • Transformers 4.19.2
  • Pytorch 1.11.0+cu113
  • Datasets 2.2.2
  • Tokenizers 0.12.1