Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary

Summarizes long text and generalizes reasonably well to academic & narrative text.

Contents

Model description
How-To in Python
Intended uses & limitations
Training and evaluation data
Inference over long documents in batches
How to fine-tune further
Training procedure
Training hyperparameters
Framework versions

Model description

A fine-tuned version of google/long-t5-tglobal-base on the booksum dataset:

30+ epochs of fine-tuning from the base model on V100/A100 GPUs
Training used 16384 token input / 1024 max output
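The 16384-token input length above suggests a simple pre-check for very long documents: count tokens and, if the text exceeds the window, split it into pieces that each fit. The following is a minimal sketch under assumptions, not the author's method: the helper name `split_into_windows` and the `overlap` parameter are hypothetical, and it operates on a plain list of token IDs so it runs without downloading the model.

```python
from typing import List, Sequence

# Input length used during training, as stated in the model description.
MAX_INPUT_TOKENS = 16384


def split_into_windows(
    token_ids: Sequence[int],
    window: int = MAX_INPUT_TOKENS,
    overlap: int = 0,
) -> List[List[int]]:
    """Split token IDs into consecutive windows of at most `window` tokens.

    `overlap` (hypothetical parameter) repeats that many tokens from the end
    of one window at the start of the next, to soften hard cut points.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    step = window - overlap
    windows: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + window]))
        # Stop once this window has reached the end of the sequence.
        if start + window >= len(token_ids):
            break
    return windows
```

In practice the token IDs would likely come from the model's own tokenizer, for example `AutoTokenizer.from_pretrained(...)` followed by `tokenizer(text)["input_ids"]`. Each window could then be decoded and summarized separately. Whether per-window summaries combine well is not something this card evaluates.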

Read the paper by Guo et al. here: LongT5: Efficient Text-To-Text Transformer for Long Sequences

How-To in Python

Install/update transformers:

pip install -U transformers

Summarize text with pipeline:

import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
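The call above relies on default generation settings. A small wrapper can make those settings explicit and easy to override. This is a sketch under assumptions: the helper `summarize` and the dictionary `DEFAULT_GENERATION_KWARGS` are hypothetical names, and every value except `max_length` (which mirrors the 1024-token output cap used in training) is illustrative rather than taken from this card.

```python
from typing import Any, Callable, Dict, List

# Illustrative settings (assumptions, not the author's recommended values).
# Only max_length comes from the card: training used a 1024-token max output.
DEFAULT_GENERATION_KWARGS: Dict[str, Any] = {
    "max_length": 1024,         # cap on generated summary length, in tokens
    "min_length": 8,            # avoid empty or one-word summaries
    "no_repeat_ngram_size": 3,  # discourage verbatim repetition
    "num_beams": 4,             # beam search width
    "early_stopping": True,     # stop once all beams have finished
    "truncation": True,         # truncate inputs longer than the model allows
}


def summarize(
    summarizer: Callable[..., List[Dict[str, str]]],
    text: str,
    **overrides: Any,
) -> str:
    """Run a summarization pipeline with explicit, overridable settings."""
    kwargs = {**DEFAULT_GENERATION_KWARGS, **overrides}
    result = summarizer(text, **kwargs)
    return result[0]["summary_text"]
```

With the `summarizer` object created earlier, `summarize(summarizer, long_text, num_beams=2)` would return the summary string directly. Wider beams and longer outputs increase memory use and latency, so these values are worth tuning per document type.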

Training hyperparameters

NOTE: early checkpoints of this model were trained on a smaller subset of the dataset, because the data had been filtered by summary length in characters (1024) rather than in tokens. This was subsequently caught and corrected to 1024 tokens, and the model was then trained for a further 10+ epochs.

The following hyperparameters were used during the most recent training round*:

learning_rate: 0.0005
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
gradient_accumulation_steps: 128
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.01
num_epochs: 2

* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train
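To make the schedule settings concrete, the sketch below computes the learning rate at a given optimizer step from the listed `learning_rate`, `lr_scheduler_type` and `lr_scheduler_warmup_ratio`. It assumes the usual shape of a cosine schedule with warmup: a linear ramp from zero, followed by a half-cosine decay to zero. The total number of optimizer steps is not stated in this card, so the value used in the usage note is purely hypothetical, as is the function name `lr_at_step`.

```python
import math

BASE_LR = 5e-4          # learning_rate from the list above
WARMUP_RATIO = 0.01     # lr_scheduler_warmup_ratio from the list above

# train_batch_size * gradient_accumulation_steps, matching the listed total.
EFFECTIVE_BATCH_SIZE = 1 * 128


def lr_at_step(
    step: int,
    total_steps: int,
    base_lr: float = BASE_LR,
    warmup_ratio: float = WARMUP_RATIO,
) -> float:
    """Learning rate at `step`, assuming linear warmup then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For a hypothetical run of 1000 optimizer steps, warmup would cover the first 10 steps, the rate would peak at 0.0005 at step 10, and it would decay to zero by step 1000. Each of those steps would process 128 examples, given the accumulation settings listed above.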

Framework versions

Transformers 4.20.1
PyTorch 1.10.0+cu113
Datasets 2.3.2
Tokenizers 0.12.1
