Fine-tune NLLB for a new language

#17
by bojosted - opened

Hello,
I am fine-tuning NLLB-200 for a new low-resource language.
For the tokenizer to recognize the new language, I'm modifying the library source (tokenization_nllb.py) by adding my new language code to the "FAIRSEQ_LANGUAGE_CODES" list. After that, I load the NLLB tokenizer with additional_special_tokens=["new_token"].
Is there a built-in function or attribute that does this directly, without modifying the source code?
Thanks for your help!

Hello,
Did you find an answer to your question? I have a similar project for an African language.

Thanks
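
For anyone landing here with the same question, below is a minimal sketch of one way to register a new language code without touching the library source. It assumes a Hugging Face tokenizer and model that expose `additional_special_tokens`, `add_special_tokens`, `convert_tokens_to_ids` and `resize_token_embeddings`. The helper name `add_language_token` and the code "xxx_Latn" are hypothetical placeholders, not part of the library. The existing language codes are passed back in together with the new one, so the current list is extended rather than replaced.

```python
def add_language_token(tokenizer, model, lang_code):
    """Register lang_code as a special token and grow the model embeddings.

    Hypothetical helper: `tokenizer` and `model` are assumed to behave like
    a Hugging Face tokenizer and model. Returns the id of the token.
    """
    existing = list(tokenizer.additional_special_tokens)
    if lang_code not in existing:
        # Pass the old codes plus the new one, so the existing language
        # codes stay registered as special tokens.
        tokenizer.add_special_tokens(
            {"additional_special_tokens": existing + [lang_code]}
        )
        # Add an embedding row for the new token. That row starts out
        # untrained and only becomes useful after fine-tuning.
        model.resize_token_embeddings(len(tokenizer))
    return tokenizer.convert_tokens_to_ids(lang_code)


# Possible usage (not run here; downloads the checkpoint):
# from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# name = "facebook/nllb-200-distilled-600M"
# tokenizer = AutoTokenizer.from_pretrained(name)
# model = AutoModelForSeq2SeqLM.from_pretrained(name)
# new_id = add_language_token(tokenizer, model, "xxx_Latn")  # placeholder code
```

Whether `tokenizer.src_lang = "xxx_Latn"` then works out of the box seems to depend on the installed transformers version, since older releases look language codes up in a fixed mapping, so it is worth checking on your setup before training.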
