Update config.json
The tokenizer vocabulary for CodeLlama 7B and 13B was expanded with infilling tokens (see [research paper pg 4](https://scontent-ord5-2.xx.fbcdn.net/v/t39.2365-6/369856151_1754812304950972_1159666448927483931_n.pdf?_nc_cat=107&ccb=1-7&_nc_sid=3c67a6&_nc_ohc=BnkB4kcpz5AAX-c3uM0&_nc_ht=scontent-ord5-2.xx&oh=00_AfBP_tb05ucf_93TwMg69_ktriBrsRQ3Dd-UxIadZJYGsA&oe=64ECB20F)). I checked, and the new vocab size is 32,016. Inference works fine with the incorrect count, but PEFT training requires the vocab size to be correct.
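One way to see why the count matters: an embedding table with `vocab_size` rows can only be indexed by token IDs `0 .. vocab_size - 1`, so any token ID at or above that bound has no row. The sketch below is a minimal, hypothetical check (the helper name `out_of_range_ids` and the sample IDs are made up for illustration, and the value 32,000 is only an arbitrary smaller size to compare against, not necessarily the old config value); it only models the indexing constraint, not how PEFT itself builds or validates the model.

```python
from typing import Iterable, List


def out_of_range_ids(token_ids: Iterable[int], vocab_size: int) -> List[int]:
    """Return the token IDs that an embedding table with `vocab_size` rows cannot index."""
    # Valid rows of an embedding matrix are 0 .. vocab_size - 1.
    return sorted({t for t in token_ids if t < 0 or t >= vocab_size})


# Hypothetical IDs: a few ordinary tokens plus one ID above 32,000.
sample_ids = [1, 450, 31999, 32007]
print(out_of_range_ids(sample_ids, 32000))  # [32007]
print(out_of_range_ids(sample_ids, 32016))  # []
```

With a table sized below the tokenizer's real vocabulary, the high IDs fall outside it; at 32,016 every sample ID has a row.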
- config.json +1 -1
config.json

```diff
@@ -20,5 +20,5 @@
   "torch_dtype": "float16",
   "transformers_version": "4.32.0",
   "use_cache": true,
-  "vocab_size":
+  "vocab_size": 32016
 }
```
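Before starting a PEFT run, a quick sanity check on the downloaded config can catch a stale `vocab_size`. This is a hypothetical sketch using only the standard `json` module; the function name `check_vocab_size` and the constant are assumptions for illustration, with 32,016 taken from this commit for the 7B and 13B checkpoints.

```python
import json

# Value set by this commit for CodeLlama 7B/13B (assumed target for the check).
EXPECTED_VOCAB_SIZE = 32016


def check_vocab_size(config_text: str, expected: int = EXPECTED_VOCAB_SIZE) -> int:
    """Parse a config.json string and raise if vocab_size differs from `expected`."""
    config = json.loads(config_text)
    vocab_size = config.get("vocab_size")
    if vocab_size != expected:
        raise ValueError(f"vocab_size is {vocab_size!r}, expected {expected}")
    return vocab_size


# A trimmed copy of the patched keys shown in the diff.
patched = (
    '{"torch_dtype": "float16", "transformers_version": "4.32.0", '
    '"use_cache": true, "vocab_size": 32016}'
)
print(check_vocab_size(patched))  # 32016
```

For a real checkout you would pass the file contents, e.g. `check_vocab_size(open("config.json").read())`.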