---
license:
- apache-2.0
- bsd-3-clause
tags:
- summarization
- summary
- booksum
- long-document
- long-form
- tglobal-xl
- XL
datasets:
- kmfoda/booksum
metrics:
- rouge
inference: false
model-index:
- name: pszemraj/long-t5-tglobal-xl-16384-book-summary
  results:
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: multi_news
      type: multi_news
      config: default
      split: test
    metrics:
    - type: rouge
      value: 36.2043
      name: ROUGE-1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYzRmMmUyOTVjMmJmZTRiZDcyYzY3MTQ1MmUyNDA5NjVhYzEzYzBiNzcxYTRhMDQ3OTlhMGZjYmJlNDM1M2NjYyIsInZlcnNpb24iOjF9._uArOQ1_0znXDPXMq7unA1OHB-XbgqzzKRbFRcVUzTUJdWk26LiSa2pEEVNNmJPg6Uo7CAvONmhpEswLvl9TAg
    - type: rouge
      value: 8.424
      name: ROUGE-2
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzg0MzljYjVjYWQ3MmRkZDBlOGI5M2RiMGU0M2UwZGUzMDg2NTU0NjcwMTNiN2ZmODEzNTQ0MmEwNDA3NDA5MSIsInZlcnNpb24iOjF9.Dzj85ld6TjosQ8KyUdoadzicMLedEFrICC6Q-08O3qx28d9B9Uke1zw-VWabiuesPEDTRGbWuBgPA5vxYWUZAw
    - type: rouge
      value: 17.3721
      name: ROUGE-L
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNDA3ZjZmODAwMTNlM2RlZmJlMDI5MGVkMGRkMTBjMTYzNDk5ZjFiNTY5MWE1MDUwNWI2MDE4ZDA2YWMwMmI2NCIsInZlcnNpb24iOjF9.MOV_nId0XAK1eMQssG5GN9DsitZaTrxl4jdCJnOg9EZ0-vAw227ln599YV5YfZ1OPJnWwek6rneqqyONiHn9AQ
    - type: rouge
      value: 32.3994
      name: ROUGE-LSUM
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZmY3MDMwOTZjNWI0YTk1MDgwMzJkYTFiN2U5YWU0Mzc0MWRiMzc1NzZlMDhjMWUwMmY2ODI2MjI5ODBkYWUxOSIsInZlcnNpb24iOjF9._BwGIZbcA4pUBkEAL0cW-JPPta0KSoGug4Z7vogHacUz-AEhIOI5ICUldZh0pt9OK67MpUSzpShJOu3rSt5YDQ
    - type: loss
      value: 2.0843334197998047
      name: loss
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWFhMmE5ZjA3ODM4YmVjMDMyMjk5YjNlMjA1MGMzOWY0NTRlYzk1YjZiMzQxMDMxOTMwMjFkNTdmNjM1NDcyMyIsInZlcnNpb24iOjF9.3wbXV4CIIgnfXAnnRztdOR12PwsWsEfiglQQ09K-C1EgW4gai4x9l-wTE2OZ7CTWkuk_tr4tL_uqOCXLZRMtCQ
    - type: gen_len
      value: 248.3572
      name: gen_len
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMWZhOGMwMDJjNGU2MzA2YzI1OWU1ZDY5N2NjZmM1YTA5NDg1MzUwNmU1YTBhNjQyNWYwYzA3OGNmODFjMmE2NSIsInZlcnNpb24iOjF9.Rc9u89zCdbFnjsnmq65l_JvCtUwOX_ZWapKJpTZ-rC8HxcUVfi2Ash2QfvvvxHH_YWhwklxxdnNa0HCm46qLAA
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: billsum
      type: billsum
      config: default
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 41.3645
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 16.144
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 24.2981
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 35.3234
      verified: true
    - name: loss
      type: loss
      value: 1.282260775566101
      verified: true
    - name: gen_len
      type: gen_len
      value: 291.8158
      verified: true
  - task:
      type: summarization
      name: Summarization
    dataset:
      name: ccdv/arxiv-summarization
      type: ccdv/arxiv-summarization
      config: document
      split: test
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 36.3225
      verified: true
    - name: ROUGE-2
      type: rouge
      value: 9.3743
      verified: true
    - name: ROUGE-L
      type: rouge
      value: 19.8396
      verified: true
    - name: ROUGE-LSUM
      type: rouge
      value: 32.2532
      verified: true
    - name: loss
      type: loss
      value: 2.146871566772461
      verified: true
    - name: gen_len
      type: gen_len
      value: 186.2966
      verified: true
---

# long-t5-tglobal-xl + BookSum

<a href="https://colab.research.google.com/gist/pszemraj/c19e32baf876deb866c31cd46c86e893/long-t5-xl-accelerate-test.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

-   Generalizes reasonably well to academic & narrative text.
-   This is the XL checkpoint, which **from a human-evaluation perspective, [produces even better summaries](https://long-t5-xl-book-summary-examples.netlify.app/)**.

A simple example/use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on ASR is [here](https://longt5-booksum-example.netlify.app/).

## Cheeky Proof-of-Concept

A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):

> In this chapter, the monster explains how he intends to exact revenge on "the little b\*\*\*\*" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.

While this is a somewhat crude example, try running this copypasta through other summarization models to see the difference in comprehension (_despite it not even being a "long" text!_).

* * *

**Contents**

<!-- TOC -->

-   [Description](#description)
-   [How-To in Python](#how-to-in-python)
    -   [Beyond the basics](#beyond-the-basics)
-   [About](#about)
    -   [Intended uses & limitations](#intended-uses--limitations)
    -   [Training and evaluation data](#training-and-evaluation-data)
    -   [Eval results](#eval-results)
-   [FAQ](#faq)
    -   [How can I run inference with this on CPU?](#how-can-i-run-inference-with-this-on-cpu)
    -   [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
    -   [How to fine-tune further?](#how-to-fine-tune-further)
    -   [Is there an easier way to use this?](#is-there-an-easier-way-to-use-this)
-   [Training procedure](#training-procedure)
    -   [Updates](#updates)
    -   [Training hyperparameters](#training-hyperparameters)
    -   [Framework versions](#framework-versions)

<!-- /TOC -->

* * *

## Description

A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

## How-To in Python

Install or update `transformers` with `pip install -U transformers`.

Summarize text with a pipeline:

```python
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
```

### Beyond the basics

There are two additional points to consider beyond simple inference: adjusting decoding parameters for improved output quality, and quantization for reduced memory use.

#### Adjusting parameters

Pass [other parameters related to beam search textgen](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher quality results.
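As a rough sketch of what that can look like, the helper below forwards a set of beam-search settings to the pipeline call. The parameter names are standard `generate()` arguments, but the specific values and the `summarize_with_params` helper are illustrative assumptions, not tuned settings from this model card.

```python
# Illustrative decoding settings; the values are assumptions, not tuned defaults.
BEAM_SEARCH_PARAMS = {
    "max_length": 512,          # upper bound on summary length, in tokens
    "min_length": 8,            # lower bound on summary length, in tokens
    "num_beams": 4,             # beam width; wider is slower but often better
    "no_repeat_ngram_size": 3,  # never repeat any 3-gram in the output
    "repetition_penalty": 2.5,  # discourage repeated tokens (1.0 disables)
    "early_stopping": True,     # stop once num_beams finished candidates exist
}


def summarize_with_params(summarizer, text, **overrides):
    """Call a summarization pipeline with the settings above (hypothetical helper).

    Keyword arguments in `overrides` replace the matching defaults.
    """
    params = {**BEAM_SEARCH_PARAMS, **overrides}
    result = summarizer(text, **params)
    return result[0]["summary_text"]
```

With the `summarizer` pipeline created earlier, a call would look like `summarize_with_params(summarizer, long_text, num_beams=8)`. Wider beams and longer `max_length` values increase both runtime and memory use, so it is worth changing one setting at a time.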

#### LLM.int8 Quantization

> alternate section title: how to get this monster to run inference on free Colab runtimes

Per [this PR](https://github.com/huggingface/transformers/pull/20341), LLM.int8 is now supported for `long-t5` models. Per **initial testing**, summarization quality appears to hold while requiring _significantly_ less memory! \*

How-to: install `bitsandbytes` along with `transformers` from the **latest GitHub `main` branch**:

```bash
pip install bitsandbytes
pip install git+https://github.com/huggingface/transformers.git
```

Load in 8-bit (_voodoo magic, the good kind, completed by `bitsandbytes` behind the scenes_):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary"
)

model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    load_in_8bit=True,
    device_map="auto",
)
```

The above is already present in the Colab demo linked at the top of the model card.

Do you love to ask questions? Awesome. But first, check out the [how LLM.int8 works blog post](https://huggingface.co/blog/hf-bitsandbytes-integration) by Hugging Face.

\* More rigorous metric-based investigation into comparing beam-search summarization with and without LLM.int8 will take place over time.

* * *

## About

### Intended uses & limitations

While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

Watch in particular for negation errors (e.g., the model says _This thing does not have [ATTRIBUTE]_ when it should have said _This thing has a lot of [ATTRIBUTE]_).

-   I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.

### Training and evaluation data

`kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

-   **Initial fine-tuning** only used rows with 12288 input tokens or fewer and 1024 output tokens or fewer (_i.e. longer rows were dropped before training_) for memory reasons. Per brief analysis, rows in the 12288-16384 token range in this dataset are in the **small** minority.
    -   In addition, this initial training combined the training and validation sets and trained on these in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; primary metrics should be (always) the test set.**
-   **Final phases of fine-tuning** used the standard conventions of 16384 input/1024 output tokens, keeping every row and truncating longer sequences. This did not appear to change the loss/performance much.
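The two length policies above can be sketched as follows. This is a minimal illustration on already-tokenized rows, not the actual preprocessing code used for training; the function names and the `(input_ids, label_ids)` row layout are assumptions.

```python
def drop_long_rows(rows, max_input=12288, max_output=1024):
    """Initial phase: keep only rows that fit both limits (longer rows are dropped)."""
    return [
        (input_ids, label_ids)
        for input_ids, label_ids in rows
        if len(input_ids) <= max_input and len(label_ids) <= max_output
    ]


def truncate_rows(rows, max_input=16384, max_output=1024):
    """Final phase: keep every row, cutting sequences down to the limits."""
    return [
        (input_ids[:max_input], label_ids[:max_output])
        for input_ids, label_ids in rows
    ]
```

Dropping shrinks the dataset but never feeds the model a cut-off document, while truncation keeps every row at the cost of losing the tail of the longest inputs.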

### Eval results

Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

**Please read the note above: because the validation set was included in training, validation set performance looks better than the test set results will be**. The model achieves the following results on the evaluation set:

-   eval_loss: 1.2756
-   eval_rouge1: 41.8013
-   eval_rouge2: 12.0895
-   eval_rougeL: 21.6007
-   eval_rougeLsum: 39.5382
-   eval_gen_len: 387.2945
-   eval_runtime: 13908.4995
-   eval_samples_per_second: 0.107
-   eval_steps_per_second: 0.027


    ***** predict/test metrics (initial) *****
      predict_gen_len            =   506.4368
      predict_loss               =      2.028
      predict_rouge1             =    36.8815
      predict_rouge2             =     8.0625
      predict_rougeL             =    17.6161
      predict_rougeLsum          =    34.9068
      predict_runtime            = 2:04:14.37
      predict_samples            =       1431
      predict_samples_per_second =      0.192
      predict_steps_per_second   =      0.048

\* Evaluating a big model is not as easy as it seems; a bit more investigation is underway.

* * *

## FAQ

### How can I run inference with this on CPU?

lol

### How to run inference over a very long (30k+ tokens) document in batches?

See `summarize.py` in [the code for my hf space Document Summarization](https://huggingface.co/spaces/pszemraj/document-summarization/blob/main/summarize.py) :)

You can also use the same code to split a document into batches of 4096, etc., and run over those with the model. This is useful in situations where CUDA memory is limited.
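For illustration, the splitting step can be sketched as below. This is a simplified stand-in, not the code from `summarize.py`; the `split_into_batches` name and the optional `overlap` parameter are assumptions made for this example.

```python
def split_into_batches(token_ids, batch_length=4096, overlap=0):
    """Split a flat list of token ids into windows of at most `batch_length`.

    Consecutive windows share `overlap` tokens, which can help preserve
    context across batch boundaries.
    """
    if batch_length <= 0:
        raise ValueError("batch_length must be positive")
    if not 0 <= overlap < batch_length:
        raise ValueError("overlap must be in [0, batch_length)")
    step = batch_length - overlap
    batches = []
    for start in range(0, len(token_ids), step):
        batches.append(token_ids[start : start + batch_length])
        if start + batch_length >= len(token_ids):
            break  # this window already reaches the end of the document
    return batches
```

In practice the ids would come from the tokenizer (e.g. `tokenizer(text)["input_ids"]`), and each batch can be turned back into text with `tokenizer.decode(batch, skip_special_tokens=True)` before it is passed to the summarizer. The per-batch summaries are then concatenated or summarized again.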

### How to fine-tune further?

See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization)

### Is there an easier way to use this?

I have created a Python package utility for this purpose. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.

```sh
pip install textsum
```

Use `textsum` in python with this model:

```python
from textsum.summarize import Summarizer

summarizer = Summarizer(
    model_name_or_path="pszemraj/long-t5-tglobal-xl-16384-book-summary"
)

# summarize a long string
out_str = summarizer.summarize_string(
    "This is a long string of text that will be summarized."
)
print(f"summary: {out_str}")

```

This package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo app.

For details, explanations, and docs, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).


* * *

## Training procedure

### Updates

Updates to this model/model card will be posted here as relevant. The model seems fairly converged; if updates/improvements are possible using the `BookSum` dataset, this repo will be updated.

### Training hyperparameters

The following hyperparameters were used during training:

-   learning_rate: 0.0006
-   train_batch_size: 1
-   eval_batch_size: 1
-   seed: 10350
-   distributed_type: multi-GPU
-   num_devices: 4
-   gradient_accumulation_steps: 32
-   total_train_batch_size: 128
-   total_eval_batch_size: 4
-   optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-   lr_scheduler_type: constant
-   num_epochs: 1.0

\*_Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._
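As a quick check on how the totals in the list relate to the per-device values, the arithmetic can be written out as below. The variable names simply mirror the hyperparameter names above.

```python
train_batch_size = 1              # per-device batch size
eval_batch_size = 1               # per-device eval batch size
num_devices = 4                   # GPUs used in the multi-GPU run
gradient_accumulation_steps = 32  # micro-batches accumulated per optimizer step

# Each optimizer step sees one micro-batch per device per accumulation step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count matters.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 128 4
```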

### Framework versions

-   Transformers 4.25.0.dev0
-   PyTorch 1.13.0+cu117
-   Datasets 2.6.1
-   Tokenizers 0.13.1

* * *