EOS / BOS - odd behavior wrt start/end of turn tokens

#1
by wolfspyre - opened

In the original model, the EOS/BOS tokens are set in config.json:
https://huggingface.co/migtissera/Tess-2.0-Yi-34B-200K/blob/9dc20c3e0d2cae5e390980a7a2a34f289bec73b3/config.json#L8-L9
with corresponding tokens added to the added_tokens.json file:
https://huggingface.co/migtissera/Tess-2.0-Yi-34B-200K/blob/main/added_tokens.json
I'm not sure whether the same was used while quantizing, but I'm noticing odd behavior where (at least with the Q6 quant) it emits
<|START_of_TURN_TOKEN|> rather than <|START_OF_TURN_TOKEN|>
or <|END_of_TURN_TOKEN|> rather than <|END_OF_TURN_TOKEN|>

What was peculiar (to me) was it emitting <|START_of_TURN_TOKEN|>... <|END_OF_TURN_TOKEN|>

I.e., in the same inference response, one had 'of' lowercased and the other had it uppercased.

Any ideas as to what might cause this?

I don't know the inner logic of llama.cpp's conversion, but if a model has enough tokens already, it will not use added_tokens.json. And unless the behaviour really doesn't happen with the original model, it might not even be an artifact of quantisation (LLMs can emit lowercase versions of tokens, of course). Having said that, a lot of models have broken tokenizer configs. There is nothing much I can do about it, unless there are multiple vocabulary options and one works better than the other, or this is a bug in llama.cpp with a fix, in which case I could regenerate.
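
To check whether the behaviour really differs between the original model and a quant, one option is to count the case-mismatched turn tokens in text generated by each. The sketch below is only an illustration: `find_case_variants` and `EXPECTED_TOKENS` are hypothetical names, the token spellings are taken from the post above, and it works on plain output strings rather than calling llama.cpp or any tokenizer.

```python
import re
from collections import Counter

# Canonical spellings as quoted in this thread (assumption: adjust to the
# tokens the model actually defines).
EXPECTED_TOKENS = ("<|START_OF_TURN_TOKEN|>", "<|END_OF_TURN_TOKEN|>")

# Anything shaped like <|...|> is treated as a candidate special token.
TOKEN_PATTERN = re.compile(r"<\|[^|<>]+\|>")


def find_case_variants(text, expected=EXPECTED_TOKENS):
    """Count <|...|> strings that match an expected token only when case is ignored."""
    canonical = {tok.lower(): tok for tok in expected}
    variants = Counter()
    for candidate in TOKEN_PATTERN.findall(text):
        target = canonical.get(candidate.lower())
        # Keep only near-misses: same token ignoring case, but not an exact match.
        if target is not None and candidate != target:
            variants[candidate] += 1
    return variants


if __name__ == "__main__":
    sample = "<|START_of_TURN_TOKEN|>Hello there<|END_OF_TURN_TOKEN|>"
    print(find_case_variants(sample))
```

Running the same prompts through both models and comparing the counts would show whether the lowercase 'of' variant also appears without quantisation.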

mradermacher changed discussion status to closed
