Update README.md
README.md CHANGED
@@ -30,6 +30,7 @@ A summary of the [infamous navy seals copypasta](https://knowyourmeme.com/memes/
 
 > In this chapter, the monster explains how he intends to exact revenge on "the little b****" who insulted him. He tells the kiddo that he is a highly trained and experienced killer who will use his arsenal of weapons--including his access to the internet--to exact justice on the little brat.
 
+While a somewhat crude example, try running this copypasta through other summarization models to see the difference in comprehension (_despite it not even being a "long" text!_)
 
 ---
 
@@ -72,14 +73,14 @@ Pass [other parameters related to beam search textgen](https://huggingface.co/bl
 
 While this model seems to improve upon factual consistency, **do not take summaries to be foolproof and check things that seem odd**.
 
-Specifically: negation statements (i.e
+Specifically: negation statements (i.e., model says: _This thing does not have [ATTRIBUTE]_ where instead it should have said _This thing has a lot of [ATTRIBUTE]_).
 - I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check this by comparing a specific claim to what the surrounding sentences imply.
 
 ### Training and evaluation data
 
 `kmfoda/booksum` dataset on HuggingFace - read [the original paper here](https://arxiv.org/abs/2105.08209).
 
-- **Initial fine-tuning** only used input text with 12288 tokens input or less and 1024 tokens output or less (_i.e. rows with longer were dropped
+- **Initial fine-tuning** only used input text with 12288 tokens input or less and 1024 tokens output or less (_i.e. rows with longer were dropped before training_) for memory reasons. Per brief analysis, summaries in the 12288-16384 range in this dataset are in the **small** minority
 - In addition, this initial training combined the training and validation sets and trained on these in aggregate to increase the functional dataset size. **Therefore, take the validation set results with a grain of salt; primary metrics should be (always) the test set.**
 - **final phases of fine-tuning** used the standard conventions of 16384 input/1024 output keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.
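The hunk above points readers to passing beam-search text-generation parameters. As a minimal sketch of what that looks like with the `transformers` summarization pipeline (the checkpoint id is a placeholder, since this excerpt does not name the model, and the parameter values are illustrative rather than recommended settings):

```python
# Minimal sketch: passing beam-search generation parameters to a
# summarization pipeline. "<this-model-checkpoint>" is a placeholder --
# this excerpt does not name the actual checkpoint.
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="<this-model-checkpoint>",  # placeholder model id
)

long_text = "Put the chapter or document you want summarized here."

result = summarizer(
    long_text,
    max_length=256,          # upper bound on generated summary tokens
    min_length=8,
    num_beams=4,             # beam search width
    no_repeat_ngram_size=3,  # discourage verbatim repetition
    early_stopping=True,
    repetition_penalty=3.5,
    truncation=True,         # truncate inputs longer than the model's max length
)
print(result[0]["summary_text"])
```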
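For readers who want to reproduce the length filtering described in the fine-tuning notes above, here is a rough sketch using the `datasets` library. The column names (`chapter`, `summary_text`) and the tokenizer checkpoint are assumptions, not confirmed by this diff; check the kmfoda/booksum dataset card before relying on them.

```python
# Rough sketch of the initial fine-tuning data prep described above:
# drop rows longer than 12288 input / 1024 output tokens, then merge
# train + validation. Column names ("chapter", "summary_text") and the
# tokenizer checkpoint are assumptions -- verify against the dataset card.
from datasets import load_dataset, concatenate_datasets
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<base-model-checkpoint>")  # placeholder

ds = load_dataset("kmfoda/booksum")

def within_limits(example, max_in=12288, max_out=1024):
    n_in = len(tokenizer(example["chapter"]).input_ids)
    n_out = len(tokenizer(example["summary_text"]).input_ids)
    return n_in <= max_in and n_out <= max_out

# Combine train + validation, then keep only rows within the length limits.
train = concatenate_datasets([ds["train"], ds["validation"]]).filter(within_limits)
print(len(train), "rows kept for the initial fine-tuning phase")
```

Concatenating the train and validation splits mirrors the note above that the initial phase trained on both in aggregate, which is also why only the test split should be treated as truly held out.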