Pablogps committed on
Commit 17ecec6
1 Parent(s): e1bf88a

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -89,7 +89,7 @@ We adjusted the `factor` parameter of the `Stepwise` function, and the `factor`
 
 <figure>
 
-![](./images/perp-resample-stepwise.png)
+![](./images/perp-resample.png)
 
 <caption>Figure 3. Expected perplexity distributions of the sample mc4-es after applying the Stepwise function.</caption>
 </figure>
@@ -139,7 +139,7 @@ Although this is not a comprehensive analysis, we looked into the distribution o
 
 ### Training details
 
-We then used the same setup and hyperparameters as [Liu et al. (2019)](https://arxiv.org/abs/1907.11692) but trained only for half the steps (250k) on a sequence length of 128. In particular, `Gaussian` trained for the 250k steps, while `Random` was stopped at 230k and `Stepwise` at 180k (this was a decision based on an analysis of training performance and the computational resources available at the time).
+We then used the same setup and hyperparameters as [Liu et al. (2019)](https://arxiv.org/abs/1907.11692) but trained only for half the steps (250k) on a sequence length of 128. In particular, `Gaussian` trained for the full 250k steps, while `Random` was stopped at 230k. `Stepwise` was initially stopped at 180k to allow the downstream tests at sequence length 128, but was later resumed; by the time of the sequence-length-512 tests it had reached 204k steps, which improved performance substantially.
 
 Then, we continued training the most promising model for a few steps (~25k) more on sequence length 512. We tried two strategies for this, since it is not easy to find clear details about this change in the literature. It turns out this decision had a big impact in the final performance.
 
 
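The first hunk refers to the `factor` parameter of the `Stepwise` function used for perplexity sampling of mc4-es (Figure 3), but the function itself is not shown in this diff. The following is only a minimal sketch of a quartile-based stepwise weighting; the quartile scheme, the weight values, and the exact meaning of `factor` are assumptions for illustration, not the project's actual implementation.

```python
import numpy as np

def stepwise_weights(perplexities, quartiles, factor=1.5):
    """Illustrative stepwise sampling weights over document perplexities.

    Documents in lower-perplexity buckets receive progressively larger
    weights, scaled by `factor`. Both the bucketing and the role of
    `factor` are assumptions made for this sketch.
    """
    perplexities = np.asarray(perplexities, dtype=float)
    # Assign each document to a quartile bucket (0 = lowest perplexity).
    buckets = np.digitize(perplexities, quartiles)
    # Higher `factor` -> stronger preference for low-perplexity buckets.
    weights = factor ** (len(quartiles) - buckets)
    return weights / weights.sum()

# Example: sample 2 documents, favouring lower perplexity.
rng = np.random.default_rng(0)
ppl = [23.1, 87.4, 310.0, 45.9, 150.2]
q = np.quantile(ppl, [0.25, 0.5, 0.75])
probs = stepwise_weights(ppl, q, factor=1.5)
picked = rng.choice(len(ppl), size=2, replace=False, p=probs)
```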
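The second hunk describes continuing pre-training of the most promising 128-token checkpoint for roughly 25k more steps at a sequence length of 512. The training script is not part of this diff, so the snippet below is only a hedged sketch of how such a continuation could be set up, written against the PyTorch `Trainer` API rather than the Flax scripts the project actually used; the checkpoint path, data file, batch size, and learning rate are placeholders, not the project's values.

```python
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
)

# Placeholder checkpoint and data paths; the real ones differ.
CHECKPOINT = "path/to/seq128-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = RobertaForMaskedLM.from_pretrained(CHECKPOINT)

# Re-tokenize the corpus at the longer sequence length (512).
raw = load_dataset("text", data_files={"train": "mc4_es_sample.txt"})

def tokenize(batch):
    return tokenizer(
        batch["text"], truncation=True, max_length=512,
        return_special_tokens_mask=True,
    )

train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="seq512-continuation",
    max_steps=25_000,               # "~25k" additional steps from the README
    per_device_train_batch_size=8,  # placeholder value
    learning_rate=1e-4,             # placeholder value
    save_steps=5_000,
)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=train, tokenizer=tokenizer).train()
```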