Edit model card
  • generalizes reasonably well to academic & narrative text.

Contents


Model description

A fine-tuned version of google/long-t5-tglobal-base on the booksum dataset:

  • 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
  • Training used 16384 token input / 1024 max output

Read the paper by Guo et al. here: LongT5: Efficient Text-To-Text Transformer for Long Sequences

How-To in Python

Install/update transformers pip install -U transformers

Summarize text with pipeline:

import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "Shobhank-iiitdwd/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])

Training hyperparameters

NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of 1024 characters. This was subsequently caught and adjusted to 1024 tokens and then trained further for 10+ epochs.

The following hyperparameters were used during the most recent training round*:

  • learning_rate: 0.0005
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.01
  • num_epochs: 2

* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train

Framework versions

  • Transformers 4.20.1
  • Pytorch 1.10.0+cu113
  • Datasets 2.3.2
  • Tokenizers 0.12.1
Downloads last month
7
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Evaluation results