pszemraj committed
Commit 1217a66
1 Parent(s): 34f042c

Update README.md

Files changed (1): README.md (+25 -0)
README.md CHANGED
@@ -171,9 +171,34 @@ long_text = "Here is a lot of text I don't want to read. Replace me"
result = summarizer(long_text)
print(result[0]["summary_text"])
```
+ ### beyond the basics
+
+ ### decoding performance

Pass [other parameters related to beam search textgen](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher-quality results (a short example follows the diff).

+ ### LLM.int8 Quantization
+
+ Per a recent PR, LLM.int8 is now supported for `long-t5` models. In **initial testing**, summarization quality appears to hold while requiring _significantly_ less memory! \*
+
+ How-to: pip-install `transformers` from the **latest `main` branch on GitHub**, then load the model with `load_in_8bit=True`:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ tokenizer = AutoTokenizer.from_pretrained("pszemraj/long-t5-tglobal-xl-16384-book-summary")
+
+ model = AutoModelForSeq2SeqLM.from_pretrained(
+     "pszemraj/long-t5-tglobal-xl-16384-book-summary",
+     load_in_8bit=True,  # enables LLM.int8; requires `bitsandbytes`
+ )
+ ```
+
+ Do you love to ask questions? Awesome. But first, check out the [blog post on how LLM.int8 works](https://huggingface.co/blog/hf-bitsandbytes-integration) by Hugging Face.
+
+ \* A more rigorous, metric-based comparison of beam-search summarization with and without LLM.int8 will follow over time.
+
---

## About
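
Beyond the diff itself: the "decoding performance" tip above amounts to forwarding keyword arguments to `model.generate()` through the pipeline call. Here is a minimal sketch, assuming the `pipeline` setup from earlier in the README; the parameter values are illustrative, not tuned recommendations from the commit:

```python
# A sketch only, not part of this commit; parameter values are illustrative.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-xl-16384-book-summary",
)

long_text = "Here is a lot of text I don't want to read. Replace me"

# Keyword arguments given at call time are forwarded to `model.generate()`.
result = summarizer(
    long_text,
    num_beams=4,             # beam search instead of greedy decoding
    early_stopping=True,     # end generation once the beams are finished
    no_repeat_ngram_size=3,  # discourage verbatim repetition
)
print(result[0]["summary_text"])
```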
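
Similarly, the LLM.int8 snippet in the diff stops after loading the model. Below is a hedged end-to-end sketch of 8-bit inference, assuming `bitsandbytes` and `accelerate` are installed alongside a `main`-branch `transformers`; `device_map="auto"` and the generation settings are assumptions, not values from the commit:

```python
# A sketch only, not part of this commit. `device_map="auto"` and the
# generation settings below are assumptions, not values from the README.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,  # LLM.int8 weight quantization via bitsandbytes
    device_map="auto",  # let accelerate place the quantized weights
)

long_text = "Here is a lot of text I don't want to read. Replace me"
inputs = tokenizer(long_text, return_tensors="pt").to(model.device)

with torch.no_grad():
    summary_ids = model.generate(**inputs, max_length=512, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```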