
Longformer Encoder-Decoder (LED) fine-tuned on Booksum

Note: the hosted inference API is set to generate a maximum of 64 tokens for runtime reasons, so summaries may be truncated depending on the length of the input text. For best results, run the model in Python as shown below.


Usage - Basics

  • it is recommended to use encoder_no_repeat_ngram_size=3 when calling the pipeline object to improve summary quality.
    • this parameter forces the model to use new vocabulary and create an abstractive summary; otherwise, it may simply compile the best extractive summary from the input provided.
  • create the pipeline object:
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

hf_name = "pszemraj/led-large-book-summary"

_model = AutoModelForSeq2SeqLM.from_pretrained(
    hf_name,
    low_cpu_mem_usage=True,
)

_tokenizer = AutoTokenizer.from_pretrained(hf_name)

summarizer = pipeline(
    "summarization",
    model=_model,
    tokenizer=_tokenizer,
)
  • put words into the pipeline object:
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,
    clean_up_tokenization_spaces=True,
    repetition_penalty=3.7,
    num_beams=4,
    early_stopping=True,
)
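The summarization pipeline returns a list of dictionaries; the generated summary is stored under the summary_text key:

print(result[0]["summary_text"])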

  • Important: to generate the best-quality summaries, use the global attention mask when decoding, as demonstrated in this community notebook (see the definition of generate_answer(batch)); a rough sketch is included after this list.
  • If you run into compute constraints, try the base version, pszemraj/led-base-book-summary.
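For reference, a minimal sketch of decoding with a global attention mask, loosely following the generate_answer pattern; the function name, the 16384-token input limit, and the generation settings below are illustrative assumptions, not taken from the notebook itself:

import torch


def summarize_with_global_attention(text: str, max_input_length: int = 16384) -> str:
    # tokenize the long document for LED
    inputs = _tokenizer(
        text,
        max_length=max_input_length,
        truncation=True,
        return_tensors="pt",
    )
    # LED uses a global attention mask; placing global attention on the first
    # token (<s>) is the usual convention for summarization with LED
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    summary_ids = _model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        global_attention_mask=global_attention_mask,
        min_length=16,
        max_length=256,
        num_beams=4,
        no_repeat_ngram_size=3,
        encoder_no_repeat_ngram_size=3,
        repetition_penalty=3.7,
        early_stopping=True,
    )
    return _tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0]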

Training and evaluation data

  • the BookSum dataset (a hedged loading sketch is shown below)
  • During training, the input text was the text of the chapter, and the target output was the summary_text.
  • Eval results can be found here, with metrics listed in the sidebar.
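For illustration, a minimal sketch of inspecting the training data; the kmfoda/booksum dataset identifier and the chapter / summary_text column names are assumptions based on the common BookSum mirror on the Hugging Face Hub:

from datasets import load_dataset

# assumption: BookSum is available on the Hub as "kmfoda/booksum"
booksum = load_dataset("kmfoda/booksum", split="train")

example = booksum[0]
print(example["chapter"][:500])   # model input: the chapter text
print(example["summary_text"])    # training target: the reference summary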

Training procedure

  • Training completed on the BookSum dataset for 13 total epochs
  • The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization.

Training hyperparameters

Initial Three Epochs

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
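For illustration only, a sketch of how these values might map onto Seq2SeqTrainingArguments; the actual training script is not included in this card, so the argument mapping and the output directory are assumptions:

from transformers import Seq2SeqTrainingArguments

# hypothetical mapping of the listed hyperparameters (initial three epochs)
training_args = Seq2SeqTrainingArguments(
    output_dir="./led-large-book-summary",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,  # effective (total) train batch size of 4 on one GPU
    lr_scheduler_type="linear",
    num_train_epochs=3,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's default optimizer settings
)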

In-between Epochs

Unfortunately, complete records are not on hand for the middle epochs; the following should be representative:

  • learning_rate: 4e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 6 (in addition to prior model)

Final Two Epochs

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2 (in addition to prior model)

Framework versions

  • Transformers 4.19.2
  • Pytorch 1.11.0+cu113
  • Datasets 2.2.2
  • Tokenizers 0.12.1