Has the eos_token been fixed?

#1
by Starlento - opened

The Hugging Face page still shows the eos_token_id for the models as 128001 (not 128009).
I also downloaded the q8_0 quant and used text-generation-webui to check the tokenizer:
It still looks wrong, though maybe this isn't the right way to test special tokens?

<|eot_id|>
27     -  '<'
91     -  '|'
68     -  'e'
354    -  'ot'
851    -  '_id'
91     -  '|'
29     -  '>'
LM Studio Community org

I have a feeling you aren't on the latest TGWUI; using llama.cpp, I see this:

128009 -> '<|eot_id|>'
128006 -> '<|start_header_id|>'
882 -> 'user'
128007 -> '<|end_header_id|>'
271 -> '\n\n'
3923 -> 'What'
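The difference between the two dumps above can be illustrated with a toy tokenizer sketch (this is not llama.cpp's actual code; the piece vocabulary below is illustrative, and only the ids are taken from the dumps in this thread). A tokenizer that doesn't recognize `<|eot_id|>` as a special token falls back to splitting it into ordinary pieces, while a fixed one emits the single id 128009:

```python
# Toy sketch: why <|eot_id|> splits into pieces on a broken tokenizer.
# Ids come from the dumps above; the fallback vocab is illustrative only.
SPECIAL_TOKENS = {"<|eot_id|>": 128009}  # matched as one unit when supported
PIECE_VOCAB = {"<": 27, "|": 91, "e": 68, "ot": 354, "_id": 851, ">": 29}

def tokenize(text, handle_specials=True):
    """Return token ids; without special handling, markers split into pieces."""
    if handle_specials and text in SPECIAL_TOKENS:
        return [SPECIAL_TOKENS[text]]
    # Greedy longest-match fallback over the toy piece vocabulary.
    ids, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in PIECE_VOCAB:
                ids.append(PIECE_VOCAB[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no piece for {text[i]!r}")
    return ids

print(tokenize("<|eot_id|>", handle_specials=False))  # [27, 91, 68, 354, 851, 91, 29]
print(tokenize("<|eot_id|>"))                         # [128009]
```

This mirrors what the TGWUI dump shows: the special-token marker is being tokenized character-by-character instead of being matched as a single token.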

LM Studio Community org

(or TGWUI isn't on the latest llama-cpp; they've been a bit slow to update that one lately)

LM Studio Community org

Yeah, just looked it up: the latest text-generation-webui is on 0.2.65, which is from before the BPE tokenizer fix. You can try the dev branch, which is updated, or use LM Studio.

Starlento changed discussion status to closed
