Question about the new tokenizer.

#9
by hyunseoki - opened

Thank you for your great work!!

I'm curious how you produced the new tokenizer that adds the new Korean vocabulary.
I wonder whether the new Korean vocabulary may contain duplicates of tokens already in the original Llama tokenizer.

The new Korean vocab does not duplicate the original Llama tokenizer's, since I used the add_new_vocab method in Tokenizers, which explicitly prevents adding pre-existing vocab.
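A minimal sketch of the deduplication behavior described above. Note that `add_new_vocab` here is a hypothetical stand-in written for illustration (the Hugging Face `transformers` tokenizers expose a similar `add_tokens` method, which likewise skips tokens already in the vocabulary and returns the number actually added):

```python
def add_new_vocab(vocab: dict, new_tokens: list) -> int:
    """Add tokens absent from `vocab`; return how many were added.

    Hypothetical helper illustrating the dedup step: tokens already
    present in the base vocab are skipped, so extending with Korean
    tokens cannot create duplicate entries.
    """
    added = 0
    for tok in new_tokens:
        if tok not in vocab:          # skip pre-existing vocab
            vocab[tok] = len(vocab)   # assign the next free id
            added += 1
    return added

base_vocab = {"<s>": 0, "</s>": 1, "hello": 2}
korean_tokens = ["안녕", "hello", "세계"]   # "hello" already exists

n_added = add_new_vocab(base_vocab, korean_tokens)
print(n_added)      # → 2: the duplicate "hello" was skipped
print(base_vocab)   # → {'<s>': 0, '</s>': 1, 'hello': 2, '안녕': 3, '세계': 4}
```

The same contract holds for `tokenizer.add_tokens([...])` in `transformers`: the return value counts only genuinely new tokens, so the embedding matrix can be resized by exactly that amount.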

beomi changed discussion status to closed
