fix #tokens
index.html +1 -1
@@ -704,7 +704,7 @@
 </figure>
 <div id="plot-edu-8k"></div>
 </div>
-<p>We then built π FineWeb-Edu by filtering out samples with scores lower than 3. This removed 92% of the dataset, leaving us with 1.
+<p>We then built π FineWeb-Edu by filtering out samples with scores lower than 3. This removed 92% of the dataset, leaving us with 1.3 trillion educational tokens. To evaluate the effectiveness of this filtering at a larger scale, we conducted an ablation using a 1.82B model trained on 350 billion tokens, similar to the FineWeb filtering ablation mentioned above:</p>
 <div class="main-plot-container">
 <figure>
 <img src="plots/edu-100k.png">
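For readers of this change, the thresholding step described in the edited paragraph (drop samples scoring below 3) can be sketched roughly as follows. This is only an illustration, not the project's actual pipeline: the `score` field name, the `filter_by_score` helper, and the toy samples are all assumptions made for the example.

```python
# Minimal sketch of score-threshold filtering.
# Assumptions (not from the source): each sample is a dict with a
# hypothetical "score" key; filter_by_score is an illustrative helper.

def filter_by_score(samples, threshold=3):
    """Keep only samples whose score is at least `threshold`."""
    return [s for s in samples if s["score"] >= threshold]


# Toy data, invented purely for illustration.
samples = [
    {"text": "a", "score": 0},
    {"text": "b", "score": 1},
    {"text": "c", "score": 2},
    {"text": "d", "score": 3},
    {"text": "e", "score": 4},
]

kept = filter_by_score(samples)

# Fraction of samples removed by the filter (the paragraph reports 92%
# for the real dataset; this toy set gives a different value).
removed_fraction = 1 - len(kept) / len(samples)
```

A sample scoring exactly 3 is kept here, matching the wording "scores lower than 3" for what gets filtered out.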