Longformer Encoder-Decoder (LED) for Narrative-Esque Long Text Summarization
A fine-tuned version of allenai/led-large-16384 on the BookSum
dataset.
Goal: a model that generalizes well and is useful for summarizing long text in academic and everyday use. The resulting model works well on long documents and can handle up to 16,384 tokens per batch (if you have the GPU memory for it).
- See the Colab demo linked above or try the demo on Spaces
Note: the hosted inference API is set to generate a maximum of 64 tokens for runtime reasons, so the summaries may be truncated (depending on the length of the input text). For best results, run the model locally in Python as shown below.
Usage - Basic
- Use encoder_no_repeat_ngram_size=3 when calling the pipeline object to improve summary quality.
- This forces the model to use new vocabulary and create an abstractive summary; otherwise, it may compile the best extractive summary from the input provided.
Load the model into a pipeline object:
import torch
from transformers import pipeline

hf_name = 'pszemraj/led-large-book-summary'

# load the checkpoint into a summarization pipeline, on GPU if one is available
summarizer = pipeline(
    "summarization",
    hf_name,
    device=0 if torch.cuda.is_available() else -1,
)
- Feed your text into the pipeline object:
wall_of_text = "your words here"

result = summarizer(
    wall_of_text,
    min_length=16,
    max_length=256,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=3,  # forces abstractive (new-vocabulary) summaries
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)
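The pipeline returns a list of dictionaries; a minimal sketch of reading the generated summary from the result above:

print(result[0]["summary_text"])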
Important: To generate the best-quality summaries, you should use the global attention mask when decoding, as demonstrated in the community notebook linked here (see the definition of generate_answer(batch)).
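A minimal sketch of doing this outside the pipeline, assuming direct use of LEDForConditionalGeneration with global attention on the first token (this is not the notebook's exact code):

import torch
from transformers import AutoTokenizer, LEDForConditionalGeneration

hf_name = 'pszemraj/led-large-book-summary'
tokenizer = AutoTokenizer.from_pretrained(hf_name)
model = LEDForConditionalGeneration.from_pretrained(hf_name)

inputs = tokenizer(
    wall_of_text,
    return_tensors="pt",
    truncation=True,
    max_length=16384,
)
# place global attention on the first token, as LED expects
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))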
If you are constrained by compute, try the base version, pszemraj/led-base-book-summary.
- The generation parameters on the API here are the same as for the base model, for easy comparison between versions.
Training and evaluation data
- The BookSum dataset (this is what adds the bsd-3-clause license).
- During training, the input text was the text of the chapter, and the output was summary_text (see the loading sketch after this list).
- Eval results can be found here with metrics on the sidebar.
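A minimal sketch of pulling the dataset from the Hub and inspecting the fields named above (assuming the kmfoda/booksum dataset used for the verified evaluation below):

from datasets import load_dataset

# chapter-level BookSum splits from the Hub
booksum = load_dataset("kmfoda/booksum")
print(booksum)

sample = booksum["train"][0]
# the model was trained with 'chapter' as input and 'summary_text' as target
print(sample["chapter"][:500])
print(sample["summary_text"][:500])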
Training procedure
- Training completed on the BookSum dataset for 13 total epochs
- The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization.
Training hyperparameters
Initial Three Epochs
The following hyperparameters were used during training (a rough mapping to Seq2SeqTrainingArguments is sketched after this list):
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
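A minimal sketch of how these settings might map onto Seq2SeqTrainingArguments (the original training script is not reproduced here; output_dir is a hypothetical path, and the Adam betas/epsilon are the Trainer defaults):

from transformers import Seq2SeqTrainingArguments

# "initial three epochs" settings; per-device batch size 1 combined with
# 4 gradient-accumulation steps gives the effective train batch size of 4
training_args = Seq2SeqTrainingArguments(
    output_dir="led-large-book-summary",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,
    lr_scheduler_type="linear",
    num_train_epochs=3,
)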
In-between Epochs
Unfortunately, complete records are not on hand for the middle epochs; the following should be representative:
- learning_rate: 4e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 6 (in addition to prior model)
Final Two Epochs
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2 (in addition to prior model)
Framework versions
- Transformers 4.19.2
- Pytorch 1.11.0+cu113
- Datasets 2.2.2
- Tokenizers 0.12.1
Evaluation results
All scores below are verified results on the respective test splits.

| Dataset (test split) | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-LSUM | loss | gen_len |
|---|---|---|---|---|---|---|
| kmfoda/booksum | 31.731 | 5.331 | 16.146 | 29.088 | 4.816 | 154.904 |
| samsum | 33.448 | 10.425 | 24.580 | 29.823 | 4.176 | 65.400 |
| billsum | 40.584 | 17.340 | 25.126 | 34.662 | 4.793 | 163.939 |
| multi_news | 39.083 | 11.404 | 19.181 | 35.158 | 4.655 | 186.249 |
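The scores above come from verified evaluation on the Hub; a minimal sketch of computing comparable ROUGE numbers yourself on a few test examples (assuming the summarizer pipeline from above and the rouge_score package installed) might look like this:

from datasets import load_dataset, load_metric

# a small slice of the test split keeps runtime manageable
test_ds = load_dataset("kmfoda/booksum", split="test[:8]")
rouge = load_metric("rouge")

predictions = [
    summarizer(
        ex["chapter"],
        min_length=16,
        max_length=256,
        no_repeat_ngram_size=3,
        encoder_no_repeat_ngram_size=3,
    )[0]["summary_text"]
    for ex in test_ds
]
references = [ex["summary_text"] for ex in test_ds]

scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v.mid.fmeasure * 100, 3) for k, v in scores.items()})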