- generalizes reasonably well to academic & narrative text.
Contents
- Model description
- How-To in Python
- Intended uses & limitations
- Training and evaluation data
- Inference over long documents in batches
- How to fine-tune further
- Training procedure
- Training hyperparameters
- Framework versions
Model description
A fine-tuned version of google/long-t5-tglobal-base on the kmfoda/booksum dataset:
- 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
- Training used 16384-token inputs and a 1024-token maximum output length
Read the paper by Guo et al.: LongT5: Efficient Text-To-Text Transformer for Long Sequences
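For lower-level control than the pipeline shown below, the checkpoint can also be loaded directly. This is a minimal sketch using standard transformers APIs; the 16384/1024 limits come from the description above, while the beam-search setting is an illustrative assumption, not a documented default of this checkpoint:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.to("cuda" if torch.cuda.is_available() else "cpu")

long_text = "Here is a lot of text I don't want to read. Replace me"
# Truncate to the 16384-token input window used during fine-tuning
inputs = tokenizer(long_text, max_length=16384, truncation=True, return_tensors="pt").to(model.device)
# Cap generation at the 1024-token output length used during fine-tuning
summary_ids = model.generate(**inputs, max_length=1024, num_beams=4)  # num_beams is illustrative
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))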
How-To in Python
Install or update transformers: pip install -U transformers
Summarize text with pipeline:
import torch
from transformers import pipeline

# Load the summarization pipeline on GPU if available, otherwise CPU
summarizer = pipeline(
    "summarization",
    "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me"

# The pipeline returns a list with one dict per input
result = summarizer(long_text)
print(result[0]["summary_text"])
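Generation keyword arguments passed to the pipeline are forwarded to model.generate, so decoding can be tuned per call. The parameter names below are standard transformers generation arguments; the values are illustrative assumptions, not tuned defaults from this card:

result = summarizer(
    long_text,
    max_length=1024,  # matches the 1024-token output cap used in training
    min_length=32,  # illustrative value
    no_repeat_ngram_size=3,  # illustrative value
    num_beams=4,  # illustrative value
)
print(result[0]["summary_text"])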
Training hyperparameters
NOTE: early checkpoints of this model were trained on a smaller subset of the dataset, because the data had been filtered to summaries of at most 1024 characters rather than 1024 tokens. Once this was caught, the filter was corrected to 1024 tokens and the model was trained for a further 10+ epochs.
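For illustration, a sketch of the corrected filtering, counting tokens rather than characters; the summary_text column name is an assumption about the kmfoda/booksum schema:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary")
ds = load_dataset("kmfoda/booksum", split="train")

def within_token_budget(example, max_tokens=1024):
    # len(example["summary_text"]) would count characters and reproduce the original filtering bug
    return len(tokenizer(example["summary_text"]).input_ids) <= max_tokens

ds = ds.filter(within_token_budget)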
The following hyperparameters were used during the most recent training round*:
- learning_rate: 0.0005
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 128
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.01
- num_epochs: 2
* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train
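For readers reproducing this setup, a minimal sketch of how the settings above might map onto Seq2SeqTrainingArguments; the card does not publish its training script, so this mapping and the output_dir are assumptions:

from transformers import Seq2SeqTrainingArguments

# Assumed mapping of the reported hyperparameters; the actual training script is not published
training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-booksum",  # hypothetical path
    learning_rate=5e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=128,  # yields the total train batch size of 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    num_train_epochs=2,
    seed=42,
)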
Framework versions
- Transformers 4.20.1
- PyTorch 1.10.0+cu113
- Datasets 2.3.2
- Tokenizers 0.12.1
Evaluation results
On the kmfoda/booksum test set (self-reported):
- ROUGE-1: 36.408
- ROUGE-2: 6.065
- ROUGE-L: 16.721
- ROUGE-LSUM: 33.340
- loss: NaN
- gen_len: 252.810

On the samsum test set (self-reported):
- ROUGE-1: 30.905
- ROUGE-2: 7.471
- ROUGE-L: 22.396
- ROUGE-LSUM: 26.909
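A sketch of how such scores could be reproduced with the evaluate library; evaluate is not listed in the framework versions above, the small sample size is for illustration only, and the chapter and summary_text column names are assumptions about the kmfoda/booksum schema:

import evaluate
from datasets import load_dataset
from transformers import pipeline

summarizer = pipeline("summarization", "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary")
rouge = evaluate.load("rouge")

# Small sample for illustration; "chapter"/"summary_text" column names are assumptions
ds = load_dataset("kmfoda/booksum", split="test").select(range(8))
predictions = [out["summary_text"] for out in summarizer(ds["chapter"], truncation=True)]
scores = rouge.compute(predictions=predictions, references=ds["summary_text"])
print(scores)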