---
datasets:
- cerebras/SlimPajama-627B
language:
- en
---

This is the pre-trained 3B model with a 43K vocabulary from the paper "Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies". In this paper, we investigate how vocabulary size impacts language model scaling laws. Based on our approach, we predict that the optimal vocabulary size for a 3B model is about 43K.

We then train a Llama-based 3B model on a sampled version of the SlimPajama dataset. The model with the 43K vocabulary outperforms the model with the common 32K vocabulary size, despite using fewer training tokens.

Notably, the proposed approach can be applied to other model sizes as well.
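
Since this is a Llama-based causal language model, it should load with the standard Hugging Face Transformers API. A minimal usage sketch follows; the repository ID below is a placeholder, so replace it with this repo's actual path:

```python
# Minimal sketch for loading this checkpoint with Hugging Face Transformers.
# NOTE: "your-org/scaling-with-vocab-3B-43K" is a placeholder repo ID;
# substitute the actual path of this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/scaling-with-vocab-3B-43K"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation to sanity-check the model.
inputs = tokenizer("Scaling laws suggest that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```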