Token overrides (added_tokens_decoder)

#1
by dranger003 - opened

Hey there, have you been able to apply the token overrides for tokens 106/107 (i.e. <|im_start|> and <|im_end|>)?
What does it look like when you print the token IDs from tokenizing the template?

EDIT: I was able to get them mapped and properly decoded, but I had to edit convert-hf-to-gguf.py to use a different _set_vocab().
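For anyone wanting to check the same thing, here is a rough sketch of how the overrides could be read out of the added_tokens_decoder section and compared against a list of token IDs. It is not the change made to convert-hf-to-gguf.py. The sample JSON is an illustrative stand-in for a tokenizer_config.json (only IDs 106/107 and their token strings come from this thread), and the helper names load_token_overrides and missing_overrides are hypothetical.

```python
import json

# Illustrative stand-in for tokenizer_config.json (assumption: real files
# carry more fields per entry; only IDs 106/107 come from this thread).
SAMPLE_CONFIG = """
{
  "added_tokens_decoder": {
    "106": {"content": "<|im_start|>", "special": true},
    "107": {"content": "<|im_end|>", "special": true}
  }
}
"""


def load_token_overrides(config_text):
    """Return {token_id: token_text} from the added_tokens_decoder section."""
    config = json.loads(config_text)
    decoder = config.get("added_tokens_decoder", {})
    # JSON object keys are strings, so convert the IDs back to integers.
    return {int(token_id): entry["content"] for token_id, entry in decoder.items()}


def missing_overrides(token_ids, overrides):
    """Return the overrides whose IDs do not appear in token_ids."""
    present = set(token_ids)
    return {tid: text for tid, text in overrides.items() if tid not in present}


overrides = load_token_overrides(SAMPLE_CONFIG)
for token_id, text in sorted(overrides.items()):
    print(token_id, repr(text))
```

If the template is tokenized correctly, passing the resulting ID list to missing_overrides should return an empty dict. Any entry it returns is a token that was probably split into several ordinary pieces instead of mapping to its single override ID.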

If you have a diff or PR I can apply and rerun the quants, I'd appreciate it. I have not followed the token override issue you've mentioned above.

I just opened an issue with the details. I don't think this is hard to fix, and with some guidance I can put together a PR.
