Update README.md
README.md
@@ -76,7 +76,7 @@ Around 100B tokens from a mixture of the following corpora were used for the con
 - [Japanese mc4](https://huggingface.co/datasets/mc4)
 - [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz)
 - [Japanese OSCAR](https://oscar-project.github.io/documentation/)
-- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)
+- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B) without the Books3 subset
 
 
 ## Use and Limitations