pszemraj committed on
Commit
3da8a5b
1 Parent(s): 56a781b

Update README.md

Files changed (1)
  1. README.md +35 -39
README.md CHANGED
@@ -452,7 +452,6 @@ model-index:
  value: 214.9692
  verified: true
  ---
-
  # long-t5-tglobal-base-16384 + BookSum

  <a href="https://colab.research.google.com/gist/pszemraj/d9a0495861776168fd5cdcd7731bc4ee/example-long-t5-tglobal-base-16384-book-summary.ipynb">
@@ -461,9 +460,9 @@ model-index:

  Summarize long text and get a SparkNotes-esque summary of arbitrary topics!

- - generalizes reasonably well to academic & narrative text.
- - A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).
- - Example notebook in Colab (_click on the icon above_).
+ - generalizes reasonably well to academic & narrative text.
+ - A simple example/use case on ASR is [here](https://longt5-booksum-example.netlify.app/).
+ - Example notebook in Colab (_click on the icon above_).

  ## Cheeky Proof-of-Concept

@@ -482,10 +481,11 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/
  - [Intended uses & limitations](#intended-uses--limitations)
  - [Training and evaluation data](#training-and-evaluation-data)
  - [FAQ](#faq)
- - [Inference over long documents in batches](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
- - [How to fine-tune further](#how-to-fine-tune-further)
+ - [How to run inference over a very long (30k+ tokens) document in batches?](#how-to-run-inference-over-a-very-long-30k-tokens-document-in-batches)
+ - [How to fine-tune further?](#how-to-fine-tune-further)
+ - [Are there simpler ways to run this?](#are-there-simpler-ways-to-run-this)
  - [Training procedure](#training-procedure)
- - [Updates](#updates)
+ - [Updates:](#updates)
  - [Training hyperparameters](#training-hyperparameters)
  - [Framework versions](#framework-versions)
  - [Citation info](#citation-info)
@@ -498,8 +498,8 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/

  A fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on the `kmfoda/booksum` dataset:

- - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
- - Training used 16384 token input / 1024 max output
+ - 30+ epochs of fine-tuning from the base model on V100/A100 GPUs
+ - Training used 16384 token input / 1024 max output

  Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

@@ -528,15 +528,14 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl

  ## Intended uses & limitations

- - The current checkpoint is fairly well converged but will be updated if further improvements can be made.
- - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
- - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
+ - The current checkpoint is fairly well converged but will be updated if further improvements can be made.
+ - Compare performance to [LED-base](https://huggingface.co/pszemraj/led-base-book-summary) trained on the same dataset (API gen parameters are the same).
+ - while this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.

  ## Training and evaluation data

  `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209). Summaries longer than 1024 LongT5 tokens were filtered out to prevent the model from learning to generate "partial" summaries.

-
  * * *

  ## FAQ
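The summary-length filter described in the hunk above (dropping reference summaries longer than 1024 LongT5 tokens) can be sketched roughly as follows; the `summary_text` column name and the exact threshold handling are assumptions, not taken from this commit:

```python
# Rough sketch of the summary-length filter described above.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")
booksum = load_dataset("kmfoda/booksum")

def summary_fits(example, max_target_tokens=1024):
    # "summary_text" is an assumed column name for the reference summary
    token_ids = tokenizer(example["summary_text"], truncation=False)["input_ids"]
    return len(token_ids) <= max_target_tokens

# applied to each split (train/validation/test)
booksum = booksum.filter(summary_fits)
```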
@@ -553,9 +552,9 @@ See [train with a script](https://huggingface.co/docs/transformers/run_scripts)

  This model was originally tuned on Google Colab with a heavily modified variant of the [longformer training notebook](https://github.com/patrickvonplaten/notebooks/blob/master/Fine_tune_Longformer_Encoder_Decoder_(LED)_for_Summarization_on_pubmed.ipynb), key enabler being deepspeed. You can try this as an alternate route to fine-tuning the model without using the command line.

- ### Is there an easier way to use this?
+ ### Are there simpler ways to run this?

- I have created a python package utility for this reason. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.
+ For this reason, I created a Python package utility. It's called [textsum](https://github.com/pszemraj/textsum), and you can use it to load models and summarize things in a few lines of code.

  ```sh
  pip install textsum
@@ -570,17 +569,14 @@ summarizer = Summarizer(
      model_name_or_path="pszemraj/long-t5-tglobal-base-16384-book-summary"
  )

- # summarize a long string
- out_str = summarizer.summarize_string(
-     "This is a long string of text that will be summarized."
- )
+ long_string = "This is a long string of text that will be summarized."
+ out_str = summarizer.summarize_string(long_string)
  print(f"summary: {out_str}")
-
  ```

- This package provides easy-to-use interfaces for using summarization models on text documents of arbitrary length. Currently implemented interfaces include a python API, CLI, and a shareable demo app.
+ This package provides easy-to-use interfaces for applying summarization models to text documents of arbitrary length. Currently implemented interfaces include a Python API, a CLI, and a shareable demo application.

- For details, explanations, and docs, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).
+ For details, explanations, and documentation, see the README (_linked above_) or the [wiki](https://github.com/pszemraj/textsum/wiki).

  * * *
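Alongside the `textsum` snippet in the hunk above, a comparable summary can also be produced with the plain `transformers` summarization pipeline; this is a minimal sketch, and the beam-search settings shown are illustrative placeholders rather than values taken from this commit:

```python
# Minimal sketch: summarize the same string with the transformers pipeline
# (the generation settings below are illustrative, not from the diff).
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="pszemraj/long-t5-tglobal-base-16384-book-summary",
)

long_string = "This is a long string of text that will be summarized."
result = summarizer(
    long_string,
    max_length=256,
    min_length=8,
    num_beams=4,
    no_repeat_ngram_size=3,
    early_stopping=True,
)
print(f"summary: {result[0]['summary_text']}")
```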
 
@@ -588,8 +584,8 @@ For details, explanations, and docs, see the README (_linked above_) or the [wik

  ### Updates:

- - July 22, 2022: updated to a fairly converged checkpoint
- - July 3, 2022: Added a new version with several epochs of additional general training that is more performant.
+ - July 22, 2022: updated to a fairly converged checkpoint
+ - July 3, 2022: Added a new version with several epochs of additional general training that is more performant.

  ### Training hyperparameters

@@ -597,26 +593,26 @@ _NOTE: early checkpoints of this model were trained on a "smaller" subsection of

  The following hyperparameters were used during the **most recent** training round\*:

- - learning_rate: 0.0005
- - train_batch_size: 1
- - eval_batch_size: 1
- - seed: 42
- - distributed_type: multi-GPU
- - gradient_accumulation_steps: 128
- - total_train_batch_size: 128
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.01
- - num_epochs: 2
+ - learning_rate: 0.0005
+ - train_batch_size: 1
+ - eval_batch_size: 1
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 128
+ - total_train_batch_size: 128
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.01
+ - num_epochs: 2

  \* Prior training sessions used roughly similar parameters; multiple sessions were required as this takes eons to train

  ### Framework versions

- - Transformers 4.20.1
- - Pytorch 1.10.0+cu113
- - Datasets 2.3.2
- - Tokenizers 0.12.1
+ - Transformers 4.20.1
+ - Pytorch 1.10.0+cu113
+ - Datasets 2.3.2
+ - Tokenizers 0.12.1

  ## Citation info
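As a rough sketch, the hyperparameter list in the hunk above maps onto the standard `Seq2SeqTrainingArguments` as shown below; the output directory is a hypothetical placeholder, and the model, dataset, and deepspeed/Trainer wiring is omitted:

```python
# Sketch: the listed hyperparameters expressed as Seq2SeqTrainingArguments
# (output_dir is a hypothetical placeholder; trainer/dataset setup omitted).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./long-t5-tglobal-base-16384-booksum",  # hypothetical path
    learning_rate=5e-4,               # learning_rate: 0.0005
    per_device_train_batch_size=1,    # train_batch_size: 1
    per_device_eval_batch_size=1,     # eval_batch_size: 1
    gradient_accumulation_steps=128,  # effective (total) train batch size: 128
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,                # lr_scheduler_warmup_ratio: 0.01
    num_train_epochs=2,
    seed=42,
    # Adam betas (0.9, 0.999) and epsilon 1e-08 match the transformers defaults,
    # so no explicit optimizer arguments are set here.
)
```

With a per-device batch size of 1, the 128-step gradient accumulation reproduces the listed effective batch size of 128 on a single GPU.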
 
 