Added token
#5, opened by zokica
Hi,
How did you add these tokens:
"<|im_end|>": 50295,
"<|im_start|>": 50296,
"<|startoftext|>": 50297
Hey @zokica ,
They are added as part of the first training step. The process is to add any new tokens to the tokenizer, noting whether each one is a special token such as eos_token. If the tokenizer size increases, the embedding matrices have to grow to match, so it's also necessary to save the lm_head and embed_tokens modules.
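As a rough illustration of that step, here is a minimal sketch using the transformers library. To keep it runnable offline it builds a toy tokenizer and a tiny randomly initialised GPT-2, which are stand-ins and not the actual model from this repo; in practice you would load your base model and tokenizer with from_pretrained. The toy vocabulary and the model sizes are assumptions made purely for the example.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace
from transformers import GPT2Config, GPT2LMHeadModel, PreTrainedTokenizerFast

# Toy in-memory tokenizer (assumption: stands in for the base model's tokenizer).
vocab = {"[UNK]": 0, "hello": 1, "world": 2}
backend = Tokenizer(WordLevel(vocab, unk_token="[UNK]"))
backend.pre_tokenizer = Whitespace()
tokenizer = PreTrainedTokenizerFast(tokenizer_object=backend, unk_token="[UNK]")

# Tiny randomly initialised model (assumption: stands in for the base model).
config = GPT2Config(
    vocab_size=len(tokenizer), n_positions=32, n_embd=16, n_layer=1, n_head=2
)
model = GPT2LMHeadModel(config)

old_size = len(tokenizer)
new_tokens = ["<|im_end|>", "<|im_start|>", "<|startoftext|>"]

# Register the new tokens as special tokens; they get the next free ids.
num_added = tokenizer.add_tokens(new_tokens, special_tokens=True)

# Note which of the new tokens plays a special role, e.g. end of sequence.
tokenizer.eos_token = "<|im_end|>"

# The vocabulary grew, so the input embeddings and output head must grow too.
model.resize_token_embeddings(len(tokenizer))

new_ids = tokenizer.convert_tokens_to_ids(new_tokens)
print(num_added, new_ids, model.get_input_embeddings().weight.shape)
```

The new rows in the embedding matrix start out untrained, which is why those modules need to be trained and saved along with the rest. If you fine-tune with a PEFT LoRA adapter, this usually means listing them under `modules_to_save` in `LoraConfig`; the exact module names depend on the architecture (the toy GPT-2 here calls its input embedding `wte`, not `embed_tokens`).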
There is a detailed blog post on this here, and if you're using a framework such as Axolotl, you can simply list the new tokens you're adding in your config file.