Tokenizer?

#20
by LoadingALIAS - opened

Which tokenizer was used for StarCoder2-15B? I know v1 used the GPT-2 (50k) tokenizer, and I'm hoping this one uses the GPT-3.5/GPT-4 (100k) tokenizer. Can anyone answer this for me?

Thank you!

BigCode org

@LoadingALIAS You can find the tokenizer at https://huggingface.co/bigcode/starcoder2-15b/blob/main/tokenizer_config.json. It does not use the GPT4 tokenizer.
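
For anyone who wants to check this locally rather than read the JSON in the browser, here is a minimal sketch. It assumes `huggingface_hub` and `transformers` are installed and that you have network access. The helper names (`summarize_tokenizer_config`, `fetch_tokenizer_config`, `vocab_size`) are made up for illustration, and the keys it picks out are ones commonly found in a `tokenizer_config.json`; any that are missing simply come back as `None`.

```python
import json


def summarize_tokenizer_config(config: dict) -> dict:
    """Pick out the fields of a parsed tokenizer_config.json that identify the tokenizer."""
    # Assumption: these keys are commonly present; absent ones are reported as None.
    keys = ("tokenizer_class", "model_max_length", "bos_token", "eos_token")
    return {key: config.get(key) for key in keys}


def fetch_tokenizer_config(repo_id: str) -> dict:
    """Download and parse tokenizer_config.json from the Hub (needs network access)."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename="tokenizer_config.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def vocab_size(repo_id: str) -> int:
    """Load the tokenizer with transformers and return its size, added tokens included."""
    # Imported lazily for the same reason as above.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    return len(tokenizer)
```

Calling `summarize_tokenizer_config(fetch_tokenizer_config("bigcode/starcoder2-15b"))` shows the tokenizer class, and `vocab_size("bigcode/starcoder2-15b")` gives the vocabulary size, which you can compare against the roughly 50k and 100k vocabularies mentioned in the question.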
