pszemraj committed on
Commit
4027dd3
1 Parent(s): ddb8588

:memo: formatting

Files changed (1)
  1. README.md +46 -33
README.md CHANGED
@@ -457,27 +457,49 @@ model-index:
457
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
458
  </a>
459
 
460
- - summarize long text and get a SparkNotes-esque summary of arbitrary topics!
461
- - generalizes reasonably well to academic & narrative text.
462
- - A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/). There's also an example notebook in Colab (click on the icon above).
 
 
463
 
464
-
465
  ## Cheeky Proof-of-Concept
466
 
467
  A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):
468
 
469
  > The narrator tells us that he's graduated from the Navy seals and has been involved in many secret raids. He's also one of the best snipers in the entire U.S. military. He promises to "wipe you out with precision" when they meet again.
470
 
471
- ---
472
 
473
  ## Model description
474
 
475
  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:
476
 
477
  - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
478
- - all training used 16384 token input / 1024 max output
479
 
480
- Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)
481
 
482
  ## How-To in Python
483
 
@@ -505,18 +527,15 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
505
  ## Intended uses & limitations
506
 
507
  - The current checkpoint is fairly well converged but will be updated if further improvements can be made.
508
- - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
509
- While this model seems to improve factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.
510
 
511
  ## Training and evaluation data
512
 
513
  `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.
514
 
515
- _NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for 10+ epochs._
516
 
517
- ---
518
-
519
- ---
520
 
521
  ## FAQ
522
 
@@ -530,22 +549,21 @@ You can also use the same code to split a document into batches of 4096, etc., a
530
 
531
  See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).
532
 
533
- This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), key enabler being deepspeed. You can try this as an alternate route to fine-tuning the model without the command line.
534
-
535
-
536
-
537
- ---
538
 
 
539
 
540
  ## Training procedure
541
 
542
  ### Updates:
543
 
544
  - July 22, 2022: updated to a fairly converged checkpoint
545
- - July 3, 2022: Added a new version with several epochs of additional training that is more performant in general.
546
 
547
  ### Training hyperparameters
548
 
 
 
549
  The following hyperparameters were used during the **most recent** training round\*:
550
 
551
  - learning_rate: 0.0005
@@ -560,9 +578,7 @@ The following hyperparameters were used during the **most recent** training roun
560
  - lr_scheduler_warmup_ratio: 0.01
561
  - num_epochs: 2
562
 
563
-
564
- \*_Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train
565
-
566
 
567
  ### Framework versions
568
 
@@ -571,18 +587,15 @@ The following hyperparameters were used during the **most recent** training roun
571
  - Datasets 2.3.2
572
  - Tokenizers 0.12.1
573
 
574
-
575
- ## citation info
576
 
577
  If you find `pszemraj/long-t5-tglobal-base-16384-book-summary` useful in your work, please consider citing this model :)
578
 
579
- ```
580
- @misc {peter_szemraj_2022,
581
- author = { {Peter Szemraj} },
582
- title = { long-t5-tglobal-base-16384-book-summary (Revision 4b12bce) },
583
- year = 2022,
584
- url = { https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary },
585
- doi = { 10.57967/hf/0100 },
586
- publisher = { Hugging Face }
587
- }
588
- ```
 
457
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
458
  </a>
459
 
460
+ Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
461
+
462
+ - Generalizes reasonably well to academic & narrative text.
463
+ - A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).
464
+ - Example notebook in Colab (_click on the icon above_).
465
 
 
466
  ## Cheeky Proof-of-Concept
467
 
468
  A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/navy-seal-copypasta):
469
 
470
  > The narrator tells us that he's graduated from the Navy seals and has been involved in many secret raids. He's also one of the best snipers in the entire U.S. military. He promises to "wipe you out with precision" when they meet again.
471
 
472
+ * * *
473
+
474
+ **Contents**
475
+
476
+ <!-- TOC -->
477
+
478
+ - [Model description](#model-description)
479
+ - [How-To in Python](#how-to-in-python)
480
+ - [Intended uses & limitations](#intended-uses--limitations)
481
+ - [Training and evaluation data](#training-and-evaluation-data)
482
+ - [FAQ](#faq)
483
+ - [Inference over long documents in batches](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
484
+ - [How to fine-tune further](#how-to-fine-tune-further)
485
+ - [Training procedure](#training-procedure)
486
+ - [Updates](#updates)
487
+ - [Training hyperparameters](#training-hyperparameters)
488
+ - [Framework versions](#framework-versions)
489
+ - [Citation info](#citation-info)
490
+
491
+ <!-- /TOC -->
492
+
493
+ * * *
494
 
495
  ## Model description
496
 
497
  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:
498
 
499
  - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
500
+ - Training used 16384 token input / 1024 max output
501
 
502
+ Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)
503
 
504
  ## How-To in Python
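As a quick illustration, here is a minimal sketch of loading this checkpoint with the `transformers` summarization pipeline; the generation parameters shown below are illustrative assumptions, not the card's recommended settings.

```python
# Minimal sketch: summarize a long document with the transformers pipeline.
# The generation parameters below are assumptions, not tuned recommendations.
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me."

result = summarizer(
    long_text,
    min_length=8,
    max_length=256,
    no_repeat_ngram_size=3,
    repetition_penalty=3.5,
    num_beams=4,
    early_stopping=True,
)
print(result[0]["summary_text"])
```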
505
 
 
527
  ## Intended uses & limitations
528
 
529
  - The current checkpoint is fairly well converged but will be updated if further improvements can be made.
530
+ - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
531
- While this model seems to improve factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.
532
 
533
  ## Training and evaluation data
534
 
535
  `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.
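A rough sketch of that length filter, assuming the booksum splits expose a `summary_text` column (the column name and the use of the base tokenizer are assumptions, not the card's own preprocessing code):

```python
# Sketch: drop examples whose reference summary exceeds 1024 LongT5 tokens.
# The "summary_text" column name is an assumption about kmfoda/booksum.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
dataset = load_dataset("kmfoda/booksum")

filtered = dataset.filter(
    lambda ex: len(tokenizer(ex["summary_text"]).input_ids) <= 1024
)
```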
536
 
 
537
 
538
+ * * *
 
 
539
 
540
  ## FAQ
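For the question on running inference over very long (30k+ token) documents in batches, here is a rough sketch of one way to chunk by tokens and summarize each piece; the chunk size, helper structure, and joining step are assumptions, not the card's own batching code.

```python
# Sketch: split a very long document into ~8192-token chunks and summarize
# each chunk separately; the chunk size and the joining step are assumptions.
from transformers import AutoTokenizer, pipeline

model_name = "pszemraj/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
summarizer = pipeline("summarization", model=model_name, tokenizer=tokenizer)

def summarize_long(text: str, chunk_tokens: int = 8192) -> str:
    # Tokenize once, then slice the token ids into fixed-size windows.
    ids = tokenizer(text, truncation=False).input_ids
    chunks = [
        tokenizer.decode(ids[i : i + chunk_tokens], skip_special_tokens=True)
        for i in range(0, len(ids), chunk_tokens)
    ]
    summaries = [
        summarizer(chunk, max_length=512, no_repeat_ngram_size=3)[0]["summary_text"]
        for chunk in chunks
    ]
    return "\n\n".join(summaries)
```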
541
 
 
549
 
550
  See [train with a script](https://huggingface.co/docs/transformers/run_scripts) and [the summarization scripts](https://github.com/huggingface/transformers/tree/main/examples/pytorch/summarization).
551
 
552
+ This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), with DeepSpeed as the key enabler. You can try this as an alternate route to fine-tuning the model without using the command line.
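For reference, a rough Python sketch of continuing fine-tuning from this checkpoint with `Seq2SeqTrainer` (not the card's notebook; the booksum column names, sequence lengths, and hyperparameter values here are assumptions):

```python
# Sketch of further fine-tuning with Seq2SeqTrainer; the "chapter" and
# "summary_text" column names are assumptions about kmfoda/booksum.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "pszemraj/long-t5-tglobal-base-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

dataset = load_dataset("kmfoda/booksum")

def preprocess(batch):
    inputs = tokenizer(batch["chapter"], max_length=16384, truncation=True)
    labels = tokenizer(text_target=batch["summary_text"], max_length=1024, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(
    preprocess, batched=True, remove_columns=dataset["train"].column_names
)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="long-t5-booksum-ft",  # placeholder output path
        learning_rate=5e-4,
        num_train_epochs=2,
        per_device_train_batch_size=1,  # assumption: keep long inputs in memory
        gradient_checkpointing=True,
    ),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```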
553
 
554
+ * * *
555
 
556
  ## Training procedure
557
 
558
  ### Updates:
559
 
560
  - July 22, 2022: updated to a fairly converged checkpoint
561
+ - July 3, 2022: Added a new version with several epochs of additional training that is generally more performant.
562
 
563
  ### Training hyperparameters
564
 
565
+ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of the dataset as it was filtered for summaries of **1024 characters**. This was subsequently caught and adjusted to **1024 tokens** and then trained further for 10+ epochs._
566
+
567
  The following hyperparameters were used during the **most recent** training round\*:
568
 
569
  - learning_rate: 0.0005
 
578
  - lr_scheduler_warmup_ratio: 0.01
579
  - num_epochs: 2
580
 
581
+ \* Prior training sessions used roughly similar parameters; multiple sessions were required, as this model takes eons to train.
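Roughly, those values map onto `transformers` training arguments as in the sketch below; only values shown in this excerpt are filled in, and everything else is a default or an explicitly marked assumption.

```python
# Sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-tglobal-base-booksum",  # placeholder path
    learning_rate=5e-4,          # learning_rate: 0.0005
    warmup_ratio=0.01,           # lr_scheduler_warmup_ratio: 0.01
    num_train_epochs=2,          # num_epochs: 2
    predict_with_generate=True,  # assumption, typical for summarization
)
```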
 
 
582
 
583
  ### Framework versions
584
 
 
587
  - Datasets 2.3.2
588
  - Tokenizers 0.12.1
589
 
590
+ ## Citation info
 
591
 
592
  If you find `pszemraj/long-t5-tglobal-base-16384-book-summary` useful in your work, please consider citing this model :)
593
 
594
+ @misc {peter_szemraj_2022,
595
+ author = { {Peter Szemraj} },
596
+ title = { long-t5-tglobal-base-16384-book-summary (Revision 4b12bce) },
597
+ year = 2022,
598
+ url = { https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary },
599
+ doi = { 10.57967/hf/0100 },
600
+ publisher = { Hugging Face }
601
+ }