v3 tokenizer
Hi,
I just wanted to let you know that the mistral repo contains a file called:
tokenizer.model.v3
It is my understanding that this is the new tokenizer that contains the expanded vocabulary.
However, when making the GGUF, I think it needs to be renamed to tokenizer.model first,
or else the convert script might ignore it.
You might already know all of this though, so feel free to ignore :)
More info about v3 here: https://docs.mistral.ai/guides/tokenization/
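In case it helps, here is a minimal sketch of the rename step in Python. It assumes the convert script looks for a file named tokenizer.model in the model directory, which is my understanding rather than something I have confirmed. The function name `use_v3_tokenizer` and the backup name `tokenizer.model.orig` are made up for illustration. It copies instead of renaming so the original v3 file stays in place.

```python
import shutil
from pathlib import Path


def use_v3_tokenizer(model_dir, v3_name="tokenizer.model.v3"):
    """Copy the v3 tokenizer file to tokenizer.model inside model_dir.

    Hypothetical helper: the backup file name below is an assumption,
    not something the convert script knows about.
    """
    model_dir = Path(model_dir)
    source = model_dir / v3_name
    target = model_dir / "tokenizer.model"
    backup = model_dir / "tokenizer.model.orig"

    if not source.is_file():
        raise FileNotFoundError(f"{source} not found")

    # Keep a copy of any existing tokenizer.model, but never clobber
    # a backup made by an earlier run.
    if target.exists() and not backup.exists():
        shutil.copyfile(target, backup)

    # Copy rather than rename, so tokenizer.model.v3 is left untouched.
    shutil.copyfile(source, target)
    return target
```

After calling `use_v3_tokenizer("path/to/model")` (placeholder path), the convert script can be run on the same directory as usual.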
I was concerned about how renaming it to tokenizer.model would interact with the GGUF conversion. I can try redoing the conversion with it included to see if that works, though.
No problem, it worked for me but I only prompted it a few times.
After my current quantization is done, I'll remake this one into a new repo with the v3 tokenizer, just in case there are differences that cause unexpected breakage.
That's great to hear! Thank you @bartowski