Invite for discussion - Vocab extension

#1
by Ali-C137 - opened

Hello @smangrul, sorry, I tried to reach out to you on Twitter and LinkedIn but couldn't, so I figured I'd write to you here.
I'm leading a project on Arabic LLMs (similar to what you are doing for Hinglish/Hindi) and would love to discuss tokenizer vocabulary extension with you further. From what I found, the only way to do it seems to be manually, which is fine: I can extract the vocab from "https://huggingface.co/core42/jais-13b", for example, and add it to my target model (Mistral, Llama2, Falcon, etc.).
But before doing that, I'd like to discuss how this could affect training (using LoRA), inference, and so on.
Would love to hear from you
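
To make the question concrete, here is a minimal sketch of the manual extension I have in mind, assuming the Hugging Face `transformers` tokenizer and model APIs (`get_vocab`, `add_tokens`, `resize_token_embeddings`). The helper names `new_tokens` and `extend_vocab` are hypothetical, and the sketch naively assumes the source vocab entries are plain strings that mean the same thing to the target tokenizer, which may not hold when the two tokenizers encode whitespace or bytes differently.

```python
def new_tokens(source_vocab, target_vocab):
    """Return tokens found in source_vocab but missing from target_vocab.

    Both arguments are token -> id mappings, as returned by a tokenizer's
    get_vocab(). The result is ordered by source id so it is deterministic.
    """
    missing = [tok for tok in source_vocab if tok not in target_vocab]
    return sorted(missing, key=lambda tok: source_vocab[tok])


def extend_vocab(target_tokenizer, model, source_tokenizer):
    """Add the source tokenizer's unseen tokens to the target tokenizer.

    Hypothetical helper: it registers the tokens as added tokens (it does not
    retrain or merge the underlying tokenizer model) and then resizes the
    model's embedding matrix to match the new vocabulary size.
    """
    candidates = new_tokens(source_tokenizer.get_vocab(),
                            target_tokenizer.get_vocab())
    num_added = target_tokenizer.add_tokens(candidates)
    # New embedding rows are freshly initialized and still need training.
    model.resize_token_embeddings(len(target_tokenizer))
    return num_added


# Usage (not run here; it downloads weights, and target_id is a placeholder):
# from transformers import AutoModelForCausalLM, AutoTokenizer
# source_tok = AutoTokenizer.from_pretrained("core42/jais-13b")
# target_tok = AutoTokenizer.from_pretrained(target_id)
# model = AutoModelForCausalLM.from_pretrained(target_id)
# print(extend_vocab(target_tok, model, source_tok))
```

The added embedding rows start out untrained, which is part of why I am asking how this interacts with LoRA training and inference.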
