loubnabnl HF staff committed on
Commit
f73ebb6
·
1 Parent(s): 80ad08e

fix #tokens

Files changed (1)
  1. index.html +1 -1
index.html CHANGED
@@ -704,7 +704,7 @@
704
  </figure>
705
  <div id="plot-edu-8k"></div>
706
  </div>
707
- <p>We then built 📚 FineWeb-Edu by filtering out samples with scores lower than 3. This removed 92% of the dataset, leaving us with 1.2T educational tokens. To evaluate the effectiveness of this filtering at a larger scale, we conducted an ablation using a 1.82B model trained on 350 billion tokens, similar to the FineWeb filtering ablation mentioned above:</p>
708
  <div class="main-plot-container">
709
  <figure>
710
  <img src="plots/edu-100k.png">
 
704
  </figure>
705
  <div id="plot-edu-8k"></div>
706
  </div>
707
+ <p>We then built 📚 FineWeb-Edu by filtering out samples with scores lower than 3. This removed 92% of the dataset, leaving us with 1.3 trillion educational tokens. To evaluate the effectiveness of this filtering at a larger scale, we conducted an ablation using a 1.82B model trained on 350 billion tokens, similar to the FineWeb filtering ablation mentioned above:</p>
708
  <div class="main-plot-container">
709
  <figure>
710
  <img src="plots/edu-100k.png">