pszemraj committed on
Commit
3da8a5b
1 Parent(s): 56a781b

Update README.md

Files changed (1)
  1. README.md +35 -39
README.md CHANGED
@@ -452,7 +452,6 @@ model-index:
  value: 214.9692
  verified: true
  ---
-
  # long-t5-tglobal-base-16384 + BookSum

  <a href="https://colab.research.google.com/gist/pszemraj/d9a0495861776168fd5cdcd7731bc4ee/example-long-t5-tglobal-base-16384-book-summary.ipynb">
@@ -461,9 +460,9 @@ model-index:

  Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

- - generalizes reasonably well to academic & narrative text.
- - A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).
- - Example notebook in Colab (_click on the icon above_).
+ - generalizes reasonably well to academic & narrative text.
+ - A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).
+ - Example notebook in Colab (_click on the icon above_).

  ## Cheeky Proof-of-Concept

@@ -482,10 +481,11 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/
  - [Intended uses & limitations](#intended-uses--limitations)
  - [Training and evaluation data](#training-and-evaluation-data)
  - [FAQ](#faq)
- - [Inference over long documents in batches](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
- - [How to fine-tune further](#how-to-fine-tune-further)
+ - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
+ - [How to fine-tune further?](#how-to-fine-tune-further)
+ - [Are there simpler ways to run this?](#are-there-simpler-ways-to-run-this)
  - [Training procedure](#training-procedure)
- - [Updates](#updates)
+ - [Updates:](#updates)
  - [Training hyperparameters](#training-hyperparameters)
  - [Framework versions](#framework-versions)
  - [Citation info](#citation-info)
@@ -498,8 +498,8 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/

  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
- - Training used 16384 token input / 1024 max output
+ - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
+ - Training used 16384 token input / 1024 max output

  Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

@@ -528,15 +528,14 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl

  ## Intended uses & limitations

- - The current checkpoint is fairly well converged but will be updated if further improvements can be made.
- - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
- - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
+ - The current checkpoint is fairly well converged but will be updated if further improvements can be made.
+ - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
+ - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

  ## Training and evaluation data

  `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

-
  * * *

  ## FAQ
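The summary-length filter described in the hunk above (dropping reference summaries longer than 1024 LongT5 tokens) can be sketched roughly as follows; the `summary_text` column name and the exact threshold handling are assumptions, not taken from this commit:

```python
# Rough sketch of the summary-length filter described above.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum")

def summary_fits(example, max_target_tokens=1024):
    # "summary_text" is an assumed column name for the reference summary
    token_ids = tokenizer(example["summary_text"], truncation=False)["input_ids"]
    return len(token_ids) <= max_target_tokens

# applied to each split (train/validation/test)
booksum = booksum.filter(summary_fits)
```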
@@ -553,9 +552,9 @@ See [train with a script](https://huggingface.co/docs/transformers/run_scripts)

  This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), key enabler being deepspeed. You can try this as an alternate route to fine-tuning the model without using the command line.

- ### Is there an easier way to use this?
+ ### Are there simpler ways to run this?

- I have created a python package utility for this reason. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.
+ For this reason, I created a Python package utility. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.

  ```sh
  pip install textsum
@@ -570,17 +569,14 @@ summarizer = Summarizer(
      model_name_or_path="pszemraj/long-t5-tglobal-base-16384-book-summary"
  )

- # summarize a long string
- out_str = summarizer.summarize_string(
-     "This is a long string of text that will be summarized."
- )
+ long_string = "This is a long string of text that will be summarized."
+ out_str = summarizer.summarize_string(long_string)
  print(f"summary: {out_str}")
-
  ```

- This package provides easy-to-use interfaces for using summarization models on text documents of arbitrary length. Currently implemented interfaces include a python API, CLI, and a shareable demo app.
+ This package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo application.

- For details, explanations, and docs, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).
+ For details, explanations, and documentation, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).

  * * *
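Alongside the `textsum` snippet in the hunk above, a comparable summary can also be produced with the plain `transformers` summarization pipeline; this is a minimal sketch, and the beam-search settings shown are illustrative placeholders rather than values taken from this commit:

```python
# Minimal sketch: summarize the same string with the transformers pipeline
# (the generation settings below are illustrative, not from the diff).
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",
)

long_string = "This is a long string of text that will be summarized."
result = summarizer(
    long_string,
    max_length=256,
    min_length=8,
    num_beams=4,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(f"summary: {result[0]['summary_text']}")
```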
 
@@ -588,8 +584,8 @@ For details, explanations, and docs, see the README (_linked above_) or the [wik

  ### Updates:

- - July 22, 2022: updated to a fairly converged checkpoint
- - July 3, 2022: Added a new version with several epochs of additional general training that is more performant.
+ - July 22, 2022: updated to a fairly converged checkpoint
+ - July 3, 2022: Added a new version with several epochs of additional general training that is more performant.

  ### Training hyperparameters

@@ -597,26 +593,26 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of

  The following hyperparameters were used during the **most recent** training round\*:

- - learning_rate: 0.0005
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - gradient_accumulation_steps: 128
- - total_train_batch_size: 128
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.01
- - num_epochs: 2
+ - learning_rate: 0.0005
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 128
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.01
+ - num_epochs: 2

  \* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train

  ### Framework versions

- - Transformers 4.20.1
- - Pytorch 1.10.0+cu113
- - Datasets 2.3.2
- - Tokenizers 0.12.1
+ - Transformers 4.20.1
+ - Pytorch 1.10.0+cu113
+ - Datasets 2.3.2
+ - Tokenizers 0.12.1

  ## Citation info
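As a rough sketch, the hyperparameter list in the hunk above maps onto the standard `Seq2SeqTrainingArguments` as shown below; the output directory is a hypothetical placeholder, and the model, dataset, and deepspeed/Trainer wiring is omitted:

```python
# Sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments
# (output_dir is a hypothetical placeholder; trainer/dataset setup omitted).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-base-16384-booksum",  # hypothetical path
    learning_rate=5e-4,               # learning_rate: 0.0005
    per_device_train_batch_size=1,    # train_batch_size: 1
    per_device_eval_batch_size=1,     # eval_batch_size: 1
    gradient_accumulation_steps=128,  # effective (total) train batch size: 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,                # lr_scheduler_warmup_ratio: 0.01
    num_train_epochs=2,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 match the transformers defaults,
    # so no explicit optimizer arguments are set here.
)
```

With a per-device batch size of 1, the 128-step gradient accumulation reproduces the listed effective batch size of 128 on a single GPU.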
 
 