Why vocab size is 32032? (Extra 30 tokens)

#3
by theodotus

Is it a typo? Shouldn't it be 32002?

theodotus changed discussion title from Why vocab size is 32032 to Why vocab size is 32032? (Extra 30 tokens)
Weights and Biases org

We use the ChatML format, so we have to add 2 extra special tokens (`<|im_start|>` and `<|im_end|>`) to the tokenizer:
https://huggingface.co/wandb/mistral-7b-zephyr-dpo/blob/main/tokenizer_config.json
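
For reference, here is a minimal sketch of how the two ChatML tokens end up in the tokenizer, using the standard `transformers` API. This is not the repo's actual training code; the base model name and the printed counts are illustrative assumptions based on the stock Mistral tokenizer having 32000 tokens.

```python
# Minimal sketch (not the repo's training code): adding the two ChatML
# special tokens to a base Mistral tokenizer with Hugging Face transformers.
from transformers import AutoTokenizer

# "mistralai/Mistral-7B-v0.1" is used here only as an illustrative base.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
print(len(tokenizer))  # 32000 for the stock Mistral tokenizer

# ChatML wraps every turn in <|im_start|> ... <|im_end|>, so both markers
# must exist as special tokens in the vocabulary.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)
print(len(tokenizer))  # 32002 after the two ChatML tokens are added
```

The gap between 32002 tokenizer entries and the model's 32032-row embedding matrix is commonly just padding of the embedding size to a multiple of 32 for hardware efficiency (e.g. `model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=32)`); that is a guess about this checkpoint, not something stated in the reply above.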

tcapelle changed discussion status to closed
