pszemraj committed
Commit 25f8f89
1 Parent(s): 1217a66

add 8bit deets

Files changed (1)
  1. README.md +25 -8

README.md CHANGED
@@ -127,6 +127,10 @@ model-index:
 
 # long-t5-tglobal-xl + BookSum
 
+<a href="https://colab.research.google.com/gist/pszemraj/c19e32baf876deb866c31cd46c86e893/long-t5-xl-accelerate-test.ipynb">
+  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+</a>
+
 Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
 - Generalizes reasonably well to academic & narrative text.
 - This is the XL checkpoint, which **from a human-evaluation perspective, [produces even better summaries](https://long-t5-xl-book-summary-examples.netlify.app/)**.
@@ -151,11 +155,9 @@ Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer f
 
 ## How-To in Python
 
-> 🚧 `LLM.int8()` appears to be compatible with summarization and does not degrade the quality of the outputs; this is a crucial enabler for using this model on standard GPUs. A PR for this is in-progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once done :) 🚧
-
-Install/update transformers `pip install -U transformers`
+install/update transformers: `pip install -U transformers`
 
-Summarize text with pipeline:
+summarize text with pipeline:
 
 ```python
 import torch
@@ -179,18 +181,33 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 ### LLM.int8 Quantization
 
-Per a recent PR LLM.int8 is now supported for `long-t5` models. Per **initial testing** summarization quality appears to hold while requiring _significantly_ less memory! \*
+> alternate section title: how to get this monster to run inference on free Colab runtimes
+
+Per [this PR](https://github.com/huggingface/transformers/pull/20341), LLM.int8 is now supported for `long-t5` models. Per **initial testing**, summarization quality appears to hold while requiring _significantly_ less memory! \*
 
-How-to: essentially ensure you have pip installed from the **latest GitHub repo main** version of `transformers`, and:
+How-to: essentially, ensure you have `bitsandbytes` and the **latest GitHub repo `main`** version of `transformers` pip-installed.
 
 
+install the latest `main` branch:
+
+```bash
+pip install bitsandbytes
+pip install git+https://github.com/huggingface/transformers.git
+```
+
+load in 8-bit (_voodoo magic, the good kind, handled by `bitsandbytes` behind the scenes_):
+
 ```python
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
-tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")
+tokenizer = AutoTokenizer.from_pretrained(
+    "pszemraj/long-t5-tglobal-xl-16384-book-summary"
+)
 
 model = AutoModelForSeq2SeqLM.from_pretrained(
-    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
+    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
+    load_in_8bit=True,
+    device_map="auto",
 )
 ```
 
 
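For reference, the pipeline example that the second hunk truncates at `import torch` presumably continues along these lines; a minimal sketch assuming the standard `transformers` summarization pipeline (the input text and device choice here are illustrative, not taken from the diff):

```python
import torch
from transformers import pipeline

# summarization pipeline for the checkpoint; uses GPU 0 if one is available
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me."  # illustrative input
result = summarizer(long_text)
print(result[0]["summary_text"])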
 
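The final hunk stops at loading the 8-bit model; running inference with it presumably looks something like the sketch below. The generation settings (`max_new_tokens=512`, `num_beams=4`) are illustrative, not from the model card, and note that `device_map="auto"` also requires the `accelerate` package:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # dispatch across available devices (needs accelerate)
)

text = "Here is a lot of text I don't want to read. Replace me."  # illustrative input
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
summary_ids = model.generate(input_ids, max_new_tokens=512, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

`model.get_memory_footprint()` can be used to confirm the memory savings of the 8-bit load.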