Has the eos_token been fixed?
#1
by Starlento - opened
The Hugging Face page still shows eos_token_id as 128001 (not 128009) for these models.
I also downloaded the q8_0 quant and used text-generation-webui to check the tokenizer.
It still looks wrong, though maybe this isn't the right way to test special tokens?
<|eot_id|>
27 - '<'
91 - '|'
68 - 'e'
354 - 'ot'
851 - '_id'
91 - '|'
29 - '>'
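The split output above is what happens when a special token isn't registered in the tokenizer's vocabulary (or special-token parsing is off), so BPE falls back to ordinary sub-tokens. A minimal toy sketch of that difference, with made-up vocab tables mirroring the IDs above (not the real Llama 3 tokenizer):

```python
# Toy illustration: a registered special token maps to one ID; otherwise the
# tokenizer splits the string into ordinary sub-tokens (as in the dump above).
SPECIAL_TOKENS = {"<|eot_id|>": 128009}
SUBWORD_VOCAB = {"<": 27, "|": 91, "e": 68, "ot": 354, "_id": 851, ">": 29}

def tokenize(text, parse_special=True):
    """Greedy longest-match tokenizer over the toy vocab."""
    vocab = dict(SUBWORD_VOCAB)
    if parse_special:
        vocab.update(SPECIAL_TOKENS)
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest piece first
            if text[i:j] in vocab:
                ids.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

print(tokenize("<|eot_id|>", parse_special=True))   # [128009]
print(tokenize("<|eot_id|>", parse_special=False))  # [27, 91, 68, 354, 851, 91, 29]
```

If a UI shows the second output for `<|eot_id|>`, either its bundled tokenizer predates the fix or it isn't parsing special tokens.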
I have a feeling you aren't on the latest TGWUI; using llama.cpp directly, I see this:
128009 -> '<|eot_id|>'
128006 -> '<|start_header_id|>'
882 -> 'user'
128007 -> '<|end_header_id|>'
271 -> '
'
3923 -> 'What'
(or TGWUI isn't on latest llama-cpp, they've been a bit slow to update that one lately)
Yeah, just looked it up: the latest text-generation-webui is on llama-cpp-python 0.2.65, which predates the BPE tokenizer fix. You can try the dev branch, which is updated, or LM Studio.
Starlento changed discussion status to closed