tcftrees committed
Commit
3ab65c5
1 Parent(s): 13ff54b

Create README.md

Files changed (1)
  1. README.md +11 -0
README.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ datasets:
+ - cerebras/SlimPajama-627B
+ language:
+ - en
+ ---
+
+ The pre-trained 3B model with a 43K vocabulary from the paper "Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies". In this paper, we investigate how vocabulary size impacts language model scaling laws. Based on our approach, we predict that the optimal vocabulary size for a 3B model is about 43K.
+ We then train a Llama-based 3B model on a sampled version of the SlimPajama dataset. The model with the 43K vocabulary outperforms a model with the common vocabulary size of 32K, despite using fewer training tokens.
+ Note that the proposed approach can be applied to models of other sizes.