Question Tokenizer

#7 opened by nebchi

Thank you for creating a great model. For the previous Llama2, you expanded the vocabulary. Is there a reason you didn't do the same for Llama3?

I think it's because the Llama3 tokenizer already has a good number of Korean tokens. I don't know exactly how many Korean tokens are in the tokenizer, but if you write a system prompt telling it to reply in Korean, you can see Llama3 responding in Korean.
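As a rough way to check this yourself, here is a minimal sketch that tokenizes a Korean sentence with the Llama3 tokenizer and looks at how many tokens it produces. It assumes you have the transformers library installed and access to the gated meta-llama/Meta-Llama-3-8B repo; the sample sentence is just an illustration.

```python
from transformers import AutoTokenizer

# Load the Llama 3 tokenizer (requires access to the gated repo).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Arbitrary Korean sample sentence.
text = "안녕하세요. 오늘 날씨가 정말 좋네요."
ids = tokenizer.encode(text, add_special_tokens=False)

# Fewer tokens per character suggests Korean is not being split
# down to raw bytes, i.e. the vocabulary already covers it reasonably well.
print("characters:", len(text))
print("tokens    :", len(ids))
print("pieces    :", [tokenizer.decode([i]) for i in ids])
```

If the token count stays close to the word count rather than exploding to one token per byte, that supports the idea that Korean is already represented well enough that a separate vocabulary expansion wasn't necessary.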

Thank you for the good answer. It seems that, similar to Gemma, it has some level of Korean language training, so no additional training was needed.
