Tokenizer differences from v1

#3
by bartowski - opened

Is there a reason this one is missing the tokens in tokenizer_config.json (FIM, file separator, etc.) that v1 had?

Hey @bartowski, thanks for bringing this to our attention. We merged a fix for it: https://huggingface.co/google/codegemma-1.1-7b-it/discussions/4

TL;DR: the conversion scripts for the Transformers-equivalent models had some issues, and some tokens went missing.

Let us know if you face any issues. 🤗
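
For anyone who wants to check a local copy of tokenizer_config.json, here is a minimal sketch of one way to do it. It is not the conversion script or the actual fix. The token strings in `EXPECTED_TOKENS` are an assumption about what the v1 config declared, and the two config keys it reads (`additional_special_tokens` and `added_tokens_decoder`) are the places such tokens commonly appear, so adjust both to match the real file. The function name `missing_tokens` is hypothetical.

```python
import json

# Assumption: these are the FIM / file-separator tokens that v1 declared.
# Compare against the v1 tokenizer_config.json and edit as needed.
EXPECTED_TOKENS = [
    "<|fim_prefix|>",
    "<|fim_middle|>",
    "<|fim_suffix|>",
    "<|file_separator|>",
]


def collect_declared_tokens(config):
    """Gather token strings declared in a parsed tokenizer_config.json dict."""
    declared = set()
    # Entries here may be plain strings or dicts carrying a "content" field.
    for entry in config.get("additional_special_tokens", []):
        if isinstance(entry, str):
            declared.add(entry)
        elif isinstance(entry, dict) and "content" in entry:
            declared.add(entry["content"])
    # Maps token id (as a string) to a dict describing the token.
    for entry in config.get("added_tokens_decoder", {}).values():
        if isinstance(entry, dict) and "content" in entry:
            declared.add(entry["content"])
    return declared


def missing_tokens(config, expected=EXPECTED_TOKENS):
    """Return the expected tokens the config does not declare, in order."""
    declared = collect_declared_tokens(config)
    return [token for token in expected if token not in declared]


# Inline sample standing in for a config that lost some tokens.
sample = json.loads('{"additional_special_tokens": ["<|fim_prefix|>"]}')
print(missing_tokens(sample))
```

To run it against a real file, load it with `json.load(open("tokenizer_config.json"))` and pass the result to `missing_tokens`. An empty list means every expected token is declared.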
