---
datasets:
- cerebras/SlimPajama-627B
language:
- en
---

This is the pre-trained 3B model with a 43K vocabulary from the paper "Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies". In this paper, we investigate how vocabulary size impacts language model scaling laws. Based on our approach, we predict that the optimal vocabulary size for a 3B model is about 43K.

We then train a Llama-based 3B model on a sampled version of the SlimPajama dataset. The model with the 43K vocabulary outperforms the model with the common 32K vocabulary size, despite using fewer training tokens.

Notably, the proposed approach can be applied to other model sizes as well.
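
Since this is a Llama-based causal language model, it should load with the standard Hugging Face Transformers API. A minimal usage sketch follows; the repository ID below is a placeholder, so replace it with this repo's actual path:

```python
# Minimal sketch for loading this checkpoint with Hugging Face Transformers.
# NOTE: "your-org/scaling-with-vocab-3B-43K" is a placeholder repo ID;
# substitute the actual path of this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/scaling-with-vocab-3B-43K"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the model.
inputs = tokenizer("Scaling laws suggest that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```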