Update README.md
README.md CHANGED
@@ -30,6 +30,7 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/
 
 > In this chapter, the monster explains how he intends to exact revenge on "the little b****" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.
 
+While a somewhat crude example, try running this copypasta through other summarization models to see the difference in comprehension (_despite it not even being a "long" text!_)
 
 ---
 
@@ -72,14 +73,14 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
 
-Specifically: negation statements (i.e
+Specifically: negation statements (i.e., model says: _This thing does not have [ATTRIBUTE]_ where instead it should have said _This thing has a lot of [ATTRIBUTE]_).
 - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.
 
 ### Training and evaluation data
 
 `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).
 
-- **Initial fine-tuning** only used input text with 12288 tokens input or less and 1024 tokens output or less (_i.e. rows with longer were dropped
+- **Initial fine-tuning** only used input text with 12288 tokens input or less and 1024 tokens output or less (_i.e. rows with longer were dropped before training_) for memory reasons. Per brief analysis, summaries in the 12288-16384 range in this dataset are in the **small** minority
 - In addition, this initial training combined the training and validation sets and trained on these in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; primary metrics should be (always) the test set.**
 - **final phases of fine-tuning** used the standard conventions of 16384 input/1024 output keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.
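The hunk above points readers to passing beam-search text-generation parameters. As a minimal sketch of what that looks like with the `transformers` summarization pipeline (the checkpoint id is a placeholder, since this excerpt does not name the model, and the parameter values are illustrative rather than recommended settings):

```python
# Minimal sketch: passing beam-search generation parameters to a
# summarization pipeline. "<this-model-checkpoint>" is a placeholder --
# this excerpt does not name the actual checkpoint.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="<this-model-checkpoint>",  # placeholder model id
)

long_text = "Put the chapter or document you want summarized here."

result = summarizer(
    long_text,
    max_length=256,          # upper bound on generated summary tokens
    min_length=8,
    num_beams=4,             # beam search width
    no_repeat_ngram_size=3,  # discourage verbatim repetition
    early_stopping=True,
    repetition_penalty=3.5,
    truncation=True,         # truncate inputs longer than the model's max length
)
print(result[0]["summary_text"])
```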
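For readers who want to reproduce the length filtering described in the fine-tuning notes above, here is a rough sketch using the `datasets` library. The column names (`chapter`, `summary_text`) and the tokenizer checkpoint are assumptions, not confirmed by this diff; check the kmfoda/booksum dataset card before relying on them.

```python
# Rough sketch of the initial fine-tuning data prep described above:
# drop rows longer than 12288 input / 1024 output tokens, then merge
# train + validation. Column names ("chapter", "summary_text") and the
# tokenizer checkpoint are assumptions -- verify against the dataset card.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<base-model-checkpoint>")  # placeholder

ds = load_dataset("kmfoda/booksum")

def within_limits(example, max_in=12288, max_out=1024):
    n_in = len(tokenizer(example["chapter"]).input_ids)
    n_out = len(tokenizer(example["summary_text"]).input_ids)
    return n_in <= max_in and n_out <= max_out

# Combine train + validation, then keep only rows within the length limits.
train = concatenate_datasets([ds["train"], ds["validation"]]).filter(within_limits)
print(len(train), "rows kept for the initial fine-tuning phase")
```

Concatenating the train and validation splits mirrors the note above that the initial phase trained on both in aggregate, which is also why only the test split should be treated as truly held out.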