<turn> and <sep> are not tokens

#8
by artek0chumak - opened

Hi! I tried to use the model and found strange tokenizer behaviour.
For the text " <sep> ", the tokenizer returns [3, 2, 7, 15, 102, 3155, 3, 1], which detokenizes to "sep>". The same happens with "<turn>": the tokens are [3, 2, 7535, 3155, 3, 1], which detokenize to "turn>". Is this tokenization expected?
I used the conda environment from https://github.com/skywalker023/sodaverse .
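
To make the symptom easier to see, here is a minimal sketch that replays the two id sequences quoted above without loading the real tokenizer. The `PIECES` table is hypothetical: it is reconstructed only from the ids and the decoded strings in this post, and it assumes the usual T5 SentencePiece layout where id 1 is `</s>`, id 2 is `<unk>`, and id 3 is the word-boundary piece "▁". If that assumption holds, the "<" character is mapped to `<unk>` and dropped on decode, which would explain why only "sep>" and "turn>" come back.

```python
# Hypothetical id-to-piece table, reconstructed from the ids and decoded
# strings quoted in the post. It is NOT read from the real vocabulary file.
# Assumption: ids 1, 2, 3 follow the usual T5 layout (</s>, <unk>, "▁").
PIECES = {
    1: "</s>",
    2: "<unk>",
    3: "\u2581",   # SentencePiece word-boundary marker
    7: "s",
    15: "e",
    102: "p",
    3155: ">",
    7535: "turn",
}
SPECIAL = {"</s>", "<unk>"}


def to_pieces(ids):
    """Map each id to its piece so unknown tokens become visible."""
    return [PIECES[i] for i in ids]


def decode(ids):
    """Join pieces, dropping special tokens, as a typical decode step does."""
    text = "".join(p for p in to_pieces(ids) if p not in SPECIAL)
    return text.replace("\u2581", " ").strip()


sep_ids = [3, 2, 7, 15, 102, 3155, 3, 1]
turn_ids = [3, 2, 7535, 3155, 3, 1]

print(to_pieces(sep_ids))   # the second piece is <unk>, where "<" was
print(decode(sep_ids))
print(decode(turn_ids))
```

With the actual tokenizer loaded through Hugging Face `transformers`, `tokenizer.convert_ids_to_tokens(ids)` gives the same per-id view and would confirm or refute the table assumed here.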
