SivilTaram committed: Update README.md
Commit: 8004929
Parent(s): 8fcd306

README.md CHANGED
@@ -5,7 +5,7 @@ language:
 - en
 ---
 
-The pre-trained 3B model with the vocabulary size 43K in the paper Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies. We investigate how vocabulary size
+The pre-trained 3B model with the vocabulary size 43K in the paper [Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies](https://huggingface.co/papers/2407.13623). We investigate how vocabulary size
 impacts language model scaling law in this paper.
 
 Based on our approach, we predict the optimal vocabulary size for 3B model is about 43K.