pszemraj committed
Commit 25f8f89
1 Parent(s): 1217a66

add 8bit deets

Files changed (1)
  1. README.md +25 -8

README.md CHANGED
@@ -127,6 +127,10 @@ model-index:
 
 # long-t5-tglobal-xl + BookSum
 
+<a href="https://colab.research.google.com/gist/pszemraj/c19e32baf876deb866c31cd46c86e893/long-t5-xl-accelerate-test.ipynb">
+  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
+</a>
+
 Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
 - Generalizes reasonably well to academic & narrative text.
 - This is the XL checkpoint, which **from a human-evaluation perspective, [produces even better summaries](https://long-t5-xl-book-summary-examples.netlify.app/)**.
@@ -151,11 +155,9 @@ Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer f
 
 ## How-To in Python
 
-> 🚧 `LLM.int8()` appears to be compatible with summarization and does not degrade the quality of the outputs; this is a crucial enabler for using this model on standard GPUs. A PR for this is in-progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once done :) 🚧
-
-Install/update transformers `pip install -U transformers`
+install/update transformers: `pip install -U transformers`
 
-Summarize text with pipeline:
+summarize text with pipeline:
 
 ```python
 import torch
@@ -179,18 +181,33 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 ### LLM.int8 Quantization
 
-Per a recent PR LLM.int8 is now supported for `long-t5` models. Per **initial testing** summarization quality appears to hold while requiring _significantly_ less memory! \*
+> alternate section title: how to get this monster to run inference on free Colab runtimes
+
+Per [this PR](https://github.com/huggingface/transformers/pull/20341), LLM.int8 is now supported for `long-t5` models. Per **initial testing**, summarization quality appears to hold while requiring _significantly_ less memory! \*
 
-How-to: essentially ensure you have pip installed from the **latest GitHub repo main** version of `transformers`, and:
+How-to: essentially, ensure you have `bitsandbytes` and the **latest GitHub repo `main`** version of `transformers` pip-installed.
 
 
+install the latest `main` branch:
+
+```bash
+pip install bitsandbytes
+pip install git+https://github.com/huggingface/transformers.git
+```
+
+load in 8-bit (_voodoo magic, the good kind, handled by `bitsandbytes` behind the scenes_):
+
 ```python
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
-tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")
+tokenizer = AutoTokenizer.from_pretrained(
+    "pszemraj/long-t5-tglobal-xl-16384-book-summary"
+)
 
 model = AutoModelForSeq2SeqLM.from_pretrained(
-    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
+    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
+    load_in_8bit=True,
+    device_map="auto",
 )
 ```
 
 
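For reference, the pipeline example that the second hunk truncates at `import torch` presumably continues along these lines; a minimal sketch assuming the standard `transformers` summarization pipeline (the input text and device choice here are illustrative, not taken from the diff):

```python
import torch
from transformers import pipeline

# summarization pipeline for the checkpoint; uses GPU 0 if one is available
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me."  # illustrative input
result = summarizer(long_text)
print(result[0]["summary_text"])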
 
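The final hunk stops at loading the 8-bit model; running inference with it presumably looks something like the sketch below. The generation settings (`max_new_tokens=512`, `num_beams=4`) are illustrative, not from the model card, and note that `device_map="auto"` also requires the `accelerate` package:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    load_in_8bit=True,   # int8 weights via bitsandbytes
    device_map="auto",   # dispatch across available devices (needs accelerate)
)

text = "Here is a lot of text I don't want to read. Replace me."  # illustrative input
input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
summary_ids = model.generate(input_ids, max_new_tokens=512, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

`model.get_memory_footprint()` can be used to confirm the memory savings of the 8-bit load.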