pszemraj committed on
Commit
454168b
1 Parent(s): 1bb84f7

update README to be sexy

Files changed (1)

  1. README.md +83 -57
README.md CHANGED
@@ -132,8 +132,9 @@ model-index:
  </a>

  Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- - Generalizes reasonably well to academic & narrative text.
- - This is the XL checkpoint, which **from a human-evaluation perspective, [produces even better summaries](https://long-t5-xl-book-summary-examples.netlify.app/)**.

  A simple example/use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on ASR is [here](https://longt5-booksum-example.netlify.app/).

@@ -141,17 +142,41 @@ A simple example/use case with [the base model](https://huggingface.co/pszemraj/

  A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):

- > In this chapter, the monster explains how he intends to exact revenge on "the little b****" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.

  While this is a somewhat crude example, try running this copypasta through other summarization models to see the difference in comprehension (_despite it not even being a "long" text!_).

- ---
 
  ## Description

  A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

- Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

  ## How-To in Python

@@ -173,9 +198,10 @@ long_text = "Here is a lot of text I don't want to read. Replace me"
  result = summarizer(long_text)
  print(result[0]["summary_text"])
  ```
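The two lines above are only the tail of the card's quick-start snippet; a self-contained sketch of the same usage might look like the following (the device handling here is an assumption, not necessarily the card's exact code):

```python
import torch
from transformers import pipeline

# build a summarization pipeline around this checkpoint
summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me"
result = summarizer(long_text)
print(result[0]["summary_text"])
```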
 
  ### Beyond the basics

- There are two additional points to consider beyond simple inference: adjusting decoding parameters for improved performance, and quantization for reduced memory usage.

  #### Adjusting parameters
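As a rough illustration of the kind of decoding adjustments this section discusses, generation keyword arguments can be passed straight through the pipeline call; the values below are illustrative starting points to experiment with, not the card's recommended settings:

```python
# reuses `summarizer` and `long_text` from the snippet above;
# tune these decoding parameters for your own documents
result = summarizer(
    long_text,
    max_length=256,
    min_length=8,
    num_beams=4,
    early_stopping=True,
    no_repeat_ngram_size=3,
    encoder_no_repeat_ngram_size=4,
    repetition_penalty=2.5,
)
print(result[0]["summary_text"])
```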
 
@@ -189,7 +215,6 @@ Per [this PR](https://github.com/huggingface/transformers/pull/20341) LLM.int8 i

  How-to: essentially, make sure you have `transformers` pip-installed from the **latest GitHub `main` branch**, along with `bitsandbytes`.

-
  Install the latest `main` branch:

  ```bash
@@ -217,10 +242,9 @@ The above is already present in the Colab demo linked at the top of the model ca

  Do you love to ask questions? Awesome. But first, check out the [how LLM.int8 works blog post](https://huggingface.co/blog/hf-bitsandbytes-integration) by huggingface.

- \* More rigorous metric-based investigation into comparing beam-search summarization with and without LLM.int8 will take place over time.
-
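As a rough sketch of what 8-bit loading looks like in code (this assumes recent `transformers` and `bitsandbytes` builds; the arguments in the card's Colab demo may differ):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# load the XL checkpoint in 8-bit to reduce GPU memory usage
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)

long_text = "Here is a lot of text I don't want to read. Replace me"
inputs = tokenizer(long_text, return_tensors="pt").to(model.device)
summary_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```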
 
- ---

  ## About

@@ -229,47 +253,49 @@ Do you love to ask questions? Awesome. But first, check out the [how LLM.int8 wo
  While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

  Specifically, watch for negation statements (e.g., the model says _This thing does not have [ATTRIBUTE]_ when it should have said _This thing has a lot of [ATTRIBUTE]_).
- - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.

  ### Training and evaluation data

- `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

- - **Initial fine-tuning** only used examples with 12288 input tokens or less and 1024 output tokens or less (_i.e., longer rows were dropped before training; see the filtering sketch after this list_) for memory reasons. Per a brief analysis, summaries in the 12288-16384 token range are a **small** minority of this dataset.
- - In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; the primary metrics should (always) be on the test set.**
- - The **final phases of fine-tuning** used the standard convention of 16384 input / 1024 output tokens and kept everything (longer sequences were truncated). This did not appear to change the loss/performance much.
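A rough sketch of the length filter described in the first bullet above; the thresholds are the ones stated, while the column names (`chapter`, `summary_text`) and the exact preprocessing code are assumptions:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")
ds = load_dataset("kmfoda/booksum", split="train")

def within_limits(example, max_input=12288, max_output=1024):
    # keep rows whose source text and reference summary both fit the limits
    n_in = len(tokenizer(example["chapter"], truncation=False)["input_ids"])
    n_out = len(tokenizer(example["summary_text"], truncation=False)["input_ids"])
    return n_in <= max_input and n_out <= max_output

filtered = ds.filter(within_limits)
print(f"kept {len(filtered)} of {len(ds)} rows")
```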
 
  ### Eval results

  Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

  **Please read the note above: because of the training methods, validation set performance looks better than the test set results will be**. The model achieves the following results on the evaluation set:

- - eval_loss: 1.2756
- - eval_rouge1: 41.8013
- - eval_rouge2: 12.0895
- - eval_rougeL: 21.6007
- - eval_rougeLsum: 39.5382
- - eval_gen_len: 387.2945
- - eval_runtime: 13908.4995
- - eval_samples_per_second: 0.107
- - eval_steps_per_second: 0.027

- ```
- ***** predict/test metrics (initial) *****
- predict_gen_len = 506.4368
- predict_loss = 2.028
- predict_rouge1 = 36.8815
- predict_rouge2 = 8.0625
- predict_rougeL = 17.6161
- predict_rougeLsum = 34.9068
- predict_runtime = 2:04:14.37
- predict_samples = 1431
- predict_samples_per_second = 0.192
- predict_steps_per_second = 0.048
- ```

  \* Evaluating a model this large is not as easy as it seems; a bit more investigating is underway.

- ---

  ## FAQ
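One of the FAQ entries covers running inference over very long (30k+ token) documents in batches; a rough sketch of that idea, splitting on tokens and summarizing each chunk, is below (the chunk size and the helper itself are assumptions, not the card's exact code):

```python
def summarize_in_chunks(text, summarizer, chunk_tokens=8192):
    """Split a long document into token chunks and summarize each chunk."""
    tokenizer = summarizer.tokenizer
    ids = tokenizer(text, truncation=False)["input_ids"]
    chunks = [
        tokenizer.decode(ids[i : i + chunk_tokens], skip_special_tokens=True)
        for i in range(0, len(ids), chunk_tokens)
    ]
    return [summarizer(chunk)[0]["summary_text"] for chunk in chunks]

# `summarizer` is the pipeline from the How-To section above
partial_summaries = summarize_in_chunks(long_text, summarizer)
print("\n\n".join(partial_summaries))
```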
 
@@ -287,8 +313,7 @@ You can also use the same code to split a document into batches of 4096, etc., a

  See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization)

-
- ---

  ## Training procedure

@@ -299,26 +324,27 @@ Updates to this model/model card will be posted here as relevant. The model seem

  ### Training hyperparameters

  The following hyperparameters were used during training:
- - learning_rate: 0.0006
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 10350
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 32
- - total_train_batch_size: 128
- - total_eval_batch_size: 4
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: constant
- - num_epochs: 1.0
 
  \*_Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._

  ### Framework versions

- - Transformers 4.25.0.dev0
- - Pytorch 1.13.0+cu117
- - Datasets 2.6.1
- - Tokenizers 0.13.1

- ---

  </a>

  Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
+
+ - Generalizes reasonably well to academic & narrative text.
+ - This is the XL checkpoint, which **from a human-evaluation perspective, [produces even better summaries](https://long-t5-xl-book-summary-examples.netlify.app/)**.

  A simple example/use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on ASR is [here](https://longt5-booksum-example.netlify.app/).

  A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):

+ > In this chapter, the monster explains how he intends to exact revenge on "the little b\*\*\*\*" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.

  While this is a somewhat crude example, try running this copypasta through other summarization models to see the difference in comprehension (_despite it not even being a "long" text!_).

+ * * *
+
+ **Contents**
+
+ <!-- TOC -->
+
+ - [Description](#description)
+ - [How-To in Python](#how-to-in-python)
+   - [Beyond the basics](#beyond-the-basics)
+ - [About](#about)
+   - [Intended uses & limitations](#intended-uses--limitations)
+   - [Training and evaluation data](#training-and-evaluation-data)
+   - [Eval results](#eval-results)
+ - [FAQ](#faq)
+   - [How can I run inference with this on CPU?](#how-can-i-run-inference-with-this-on-cpu)
+   - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
+   - [How to fine-tune further?](#how-to-fine-tune-further)
+ - [Training procedure](#training-procedure)
+   - [Updates](#updates)
+   - [Training hyperparameters](#training-hyperparameters)
+   - [Framework versions](#framework-versions)
+
+ <!-- /TOC -->
+
+ * * *

  ## Description

  A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

+ Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

  ## How-To in Python

  result = summarizer(long_text)
  print(result[0]["summary_text"])
  ```
+
  ### Beyond the basics

+ There are two additional points to consider beyond simple inference: adjusting decoding parameters for improved performance, and quantization for reduced memory usage.

  #### Adjusting parameters

  How-to: essentially, make sure you have `transformers` pip-installed from the **latest GitHub `main` branch**, along with `bitsandbytes`.

  Install the latest `main` branch:

  ```bash

  Do you love to ask questions? Awesome. But first, check out the [how LLM.int8 works blog post](https://huggingface.co/blog/hf-bitsandbytes-integration) by huggingface.

+ \* More rigorous metric-based investigation into comparing beam-search summarization with and without LLM.int8 will take place over time.

+ * * *

  ## About

  While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

  Specifically, watch for negation statements (e.g., the model says _This thing does not have [ATTRIBUTE]_ when it should have said _This thing has a lot of [ATTRIBUTE]_).
+
+ - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.

  ### Training and evaluation data

+ `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

+ - **Initial fine-tuning** only used examples with 12288 input tokens or less and 1024 output tokens or less (_i.e., longer rows were dropped before training_) for memory reasons. Per a brief analysis, summaries in the 12288-16384 token range are a **small** minority of this dataset.
+ - In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; the primary metrics should (always) be on the test set.**
+ - The **final phases of fine-tuning** used the standard convention of 16384 input / 1024 output tokens and kept everything (longer sequences were truncated). This did not appear to change the loss/performance much.

  ### Eval results

  Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

  **Please read the note above: because of the training methods, validation set performance looks better than the test set results will be**. The model achieves the following results on the evaluation set:
 
+ - eval_loss: 1.2756
+ - eval_rouge1: 41.8013
+ - eval_rouge2: 12.0895
+ - eval_rougeL: 21.6007
+ - eval_rougeLsum: 39.5382
+ - eval_gen_len: 387.2945
+ - eval_runtime: 13908.4995
+ - eval_samples_per_second: 0.107
+ - eval_steps_per_second: 0.027
+
+
+ ***** predict/test metrics (initial) *****
+ predict_gen_len = 506.4368
+ predict_loss = 2.028
+ predict_rouge1 = 36.8815
+ predict_rouge2 = 8.0625
+ predict_rougeL = 17.6161
+ predict_rougeLsum = 34.9068
+ predict_runtime = 2:04:14.37
+ predict_samples = 1431
+ predict_samples_per_second = 0.192
+ predict_steps_per_second = 0.048
+
  \* Evaluating a model this large is not as easy as it seems; a bit more investigating is underway.

+ * * *

  ## FAQ

  See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization)

+ * * *

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
+
+ - learning_rate: 0.0006
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 10350
+ - distributed_type: multi-GPU
+ - num_devices: 4
+ - gradient_accumulation_steps: 32
+ - total_train_batch_size: 128
+ - total_eval_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: constant
+ - num_epochs: 1.0
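For anyone trying to reproduce a comparable run, the hyperparameters above map roughly onto `Seq2SeqTrainingArguments` as sketched below; the output directory and any arguments not listed are assumptions, not the card's actual training script:

```python
from transformers import Seq2SeqTrainingArguments

# per-device batch size 1 on 4 GPUs with 32 gradient-accumulation steps
# gives the listed total train batch size of 128
training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-xl-booksum",  # assumed path
    learning_rate=6e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=10350,
    gradient_accumulation_steps=32,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    num_train_epochs=1.0,
)
```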
 
  \*_Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._

  ### Framework versions

+ - Transformers 4.25.0.dev0
+ - Pytorch 1.13.0+cu117
+ - Datasets 2.6.1
+ - Tokenizers 0.13.1

+ * * *