# long-t5-tglobal-xl + BookSum

Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- Generalizes reasonably well to academic & narrative text.
- This is the XL checkpoint, which **from a human-evaluation perspective produces even better summaries**.

A simple example/use case with [the base model](https://huggingface.co/pszemraj/long-t5-tglobal-base-16384-book-summary) on ASR output is [here](https://longt5-booksum-example.netlify.app/).

## Model description

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/abs/2112.07916).

## How-To in Python

> 🚧 `LLM.int8()` appears to be compatible with summarization and does not degrade the quality of the outputs; this is a crucial enabler for using this model on standard GPUs. A PR for this is in progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once it is merged :) 🚧

Install/update transformers: `pip install -U transformers`
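After installing, the model can be used like any other summarization checkpoint via `pipeline`, and other beam-search text-generation parameters can be passed through the same call. A minimal sketch, assuming this checkpoint's repo id is `pszemraj/long-t5-tglobal-xl-16384-book-summary` and with illustrative (not tuned) generation settings:

```python
import torch
from transformers import pipeline

# load the summarization pipeline (repo id assumed; adjust if needed)
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)

long_text = "Here is a lot of text I don't want to read. Replace me."

# beam-search kwargs below are illustrative defaults, not tuned values
result = summarizer(long_text, num_beams=4, no_repeat_ngram_size=3)
print(result[0]["summary_text"])
```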
 
 
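On the `LLM.int8()` note above: the PR is not yet merged, so official instructions would be premature, but as a rough sketch, 8-bit loading in transformers generally follows the standard `bitsandbytes` integration. The flag below is the existing transformers API, not this model's confirmed recipe:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# requires `pip install accelerate bitsandbytes` and a CUDA GPU;
# repo id assumed, and 8-bit support for this model is pending the PR above
model = AutoModelForSeq2SeqLM.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "pszemraj/long-t5-tglobal-xl-16384-book-summary"
)
```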

While this model seems to improve upon factual consistency, **do not take summaries to be foolproof; check anything that seems odd**.

Specifically, watch for negation statements (i.e. the model says _This thing does not have [ATTRIBUTE]_ when it should have said _This thing has a lot of [ATTRIBUTE]_).
- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.
 
## Training and evaluation data

The `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).

- **Initial fine-tuning** used only examples with 12288 input tokens or less and 1024 output tokens or less (_i.e. longer rows were dropped before training_) for memory reasons; a sketch of this filtering follows the list. Per a brief analysis, summaries in the 12288-16384 token range are a **small** minority of this dataset.
- In addition, this initial training combined the training and validation sets and trained on them in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; the primary metrics should (always) be the test set.**
- The **final phases of fine-tuning** used the standard convention of 16384 input/1024 output, keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.
 
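The length filtering described in the first bullet can be reproduced roughly as below, assuming the `chapter`/`summary_text` column names of `kmfoda/booksum`; the exact preprocessing used for training may have differed:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# tokenizer matching the base architecture used for fine-tuning
tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-xl")
dataset = load_dataset("kmfoda/booksum", split="train")

def within_limits(example):
    # keep rows whose chapter fits in 12288 tokens and summary in 1024
    n_input = len(tokenizer(example["chapter"]).input_ids)
    n_output = len(tokenizer(example["summary_text"]).input_ids)
    return n_input <= 12288 and n_output <= 1024

dataset = dataset.filter(within_limits)
print(f"{len(dataset)} rows within the length limits")
```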

## Eval results

Official results with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be computed and posted here.

**Please read the note above: due to the training methods, validation-set performance looks better than the test-set results will be.** The model achieves the following results on the evaluation set:
- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
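Once test-split generations are available, the same ROUGE metrics can be computed locally with the `evaluate` library; a minimal sketch, with placeholder strings standing in for real model outputs and references:

```python
import evaluate

# score generated summaries against reference summaries (placeholders shown)
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the model's generated summary"],
    references=["the reference summary from the test split"],
)
print(scores)  # includes rouge1, rouge2, rougeL, rougeLsum
```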
 
### Framework versions

- Datasets 2.6.1
- Tokenizers 0.13.1

---