Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ datasets:
 
 # cosmo2-tokenizer
 Tokenizer for the training of cosmo2. This tokenizer was trained on 1M samples from:
-- FineWeb-Edu
+- FineWeb-Edu 70%
 - Cosmopedia v2 15%
 - StarCoderData 8%
 - OpenWebMath 5%