Add a note about padded vocab size during training (#43)
Files changed (1)
  1. README.md +2 -0
README.md:

```diff
@@ -193,6 +193,8 @@ The BLOOM tokenizer ([link](https://huggingface.co/bigscience/tokenizer)) is a l
 
 - A vocabulary size of 250,680
 
+The vocabulary size was padded to 250,880 for practical purposes during training, but the effective model vocabulary size is 250,680.
+
 It was trained on a subset of a preliminary version of the corpus using alpha-weighting per language.
 
 </details>
```
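For context on where 250,880 comes from: Megatron-style trainers typically round the embedding table up so it splits evenly across tensor-parallel ranks. Below is a minimal sketch of that rounding rule; the `divisible_by=128` default and `tp_size=4` values are illustrative assumptions, not a config confirmed by this PR, though they do map 250,680 to 250,880.

```python
import math

def pad_vocab_size(vocab_size: int, divisible_by: int = 128, tp_size: int = 1) -> int:
    """Round vocab_size up to a multiple of divisible_by * tp_size,
    so the embedding matrix divides evenly across tensor-parallel ranks.
    (Defaults here are assumptions for illustration, not BLOOM's confirmed setup.)
    """
    multiple = divisible_by * tp_size
    return math.ceil(vocab_size / multiple) * multiple

# Tokenizer vocabulary from the README: 250,680 entries.
# With divisible_by=128 and tp_size=4, this pads to 250,880,
# the padded size mentioned in this PR.
print(pad_vocab_size(250_680, divisible_by=128, tp_size=4))  # 250880
```

The extra 200 rows of the embedding matrix correspond to padding entries the tokenizer never emits, which is why the effective vocabulary size stays 250,680.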