<|im_start|> missing status of special token

#4
by xxxTEMPESTxxx - opened

Correct me if I am wrong, but:

"50296": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false

From what I have seen with other models that use the OpenAI chat format, <|im_start|> is registered with the tokenizer as a special token. That status is missing on all the phi-2 finetunes that enforce the ChatML format, which is quite strange, as it is basically making the model's life harder.
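For anyone who wants to check their own finetune, here is a minimal sketch that scans a tokenizer config for ChatML markers not flagged as special. It assumes the entry quoted above sits under an `added_tokens_decoder` key, as in a typical `tokenizer_config.json`; that key, the `<|im_end|>` marker, and the helper names `find_non_special` and `mark_special` are my own assumptions for illustration, not something taken from the phi-2 repos.

```python
import json

# Hypothetical excerpt shaped like the entry quoted in this post. The
# enclosing "added_tokens_decoder" key is an assumption about the file layout.
CONFIG_TEXT = """
{
  "added_tokens_decoder": {
    "50296": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    }
  }
}
"""

# Markers a ChatML-style template relies on (assumed set).
CHATML_MARKERS = {"<|im_start|>", "<|im_end|>"}


def find_non_special(config, markers):
    """Return (token_id, content) pairs for markers not flagged special."""
    found = []
    for token_id, entry in config.get("added_tokens_decoder", {}).items():
        if entry.get("content") in markers and not entry.get("special", False):
            found.append((token_id, entry["content"]))
    return found


def mark_special(config, markers):
    """Return a deep copy of config with special=True on matching entries."""
    patched = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    for entry in patched.get("added_tokens_decoder", {}).values():
        if entry.get("content") in markers:
            entry["special"] = True
    return patched


config = json.loads(CONFIG_TEXT)
print(find_non_special(config, CHATML_MARKERS))
print(find_non_special(mark_special(config, CHATML_MARKERS), CHATML_MARKERS))
```

This only inspects and patches the JSON data in memory. On a loaded tokenizer, the usual route in transformers would be `add_special_tokens` with an `additional_special_tokens` list, but I have not verified how the existing finetunes behave after such a change.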
