Tokens overrides (added_tokens_decoder)

by dranger003 - opened Mar 2

Mar 2

Hey there, have you been able to apply the token overrides for tokens 106/107 (i.e. <|im_start|> and <|im_end|>)?
What does it look like when you print the token IDs from tokenizing the template?

EDIT: I was able to get them mapped and properly decoded, but I had to edit convert-hf-to-gguf.py to use a different _set_vocab().
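
For anyone following along, here is a rough sketch of what an ID-based token override amounts to. It is not the actual change to convert-hf-to-gguf.py (the thread does not show it), only an illustration that assumes the usual `added_tokens_decoder` layout in tokenizer_config.json, where string token IDs map to entries with a `content` field. The helper name `apply_token_overrides` and the placeholder base vocab are made up for the example.

```python
import json


def apply_token_overrides(vocab, added_tokens_decoder):
    """Return a copy of `vocab` with entries replaced by token ID.

    Hypothetical helper for illustration only; it is not part of
    convert-hf-to-gguf.py or of any tokenizer library.
    """
    patched = list(vocab)
    for token_id, entry in added_tokens_decoder.items():
        idx = int(token_id)  # IDs are stored as string keys in the JSON
        if idx >= len(patched):
            # This sketch only overrides existing slots; it does not grow the vocab.
            raise IndexError(f"token id {idx} is outside the base vocab")
        patched[idx] = entry["content"]
    return patched


# Assumed excerpt of a tokenizer_config.json, trimmed to the two tokens
# discussed in this thread.
config = json.loads("""
{
  "added_tokens_decoder": {
    "106": {"content": "<|im_start|>", "special": true},
    "107": {"content": "<|im_end|>", "special": true}
  }
}
""")

# Placeholder base vocab; a real one would come from the model's tokenizer.
base_vocab = [f"<tok_{i}>" for i in range(110)]

patched = apply_token_overrides(base_vocab, config["added_tokens_decoder"])
print(patched[106], patched[107])
```

If the overrides were applied, printing the IDs for a tokenized chat template should show 106 and 107 as single tokens rather than the markers being split into several pieces.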

LoneStriker

Owner Mar 2

If you have a diff or PR I can apply and rerun the quants, I'd appreciate it. I have not followed the token override issue you've mentioned above.

dranger003

Mar 2

I just opened an issue with the details. I don't think this is hard to fix, and if I get some guidance I can craft a PR.
