Fine-tune NLLB for a new language

#17

by bojosted - opened Jun 7, 2023

Jun 7, 2023

Hello,
I am fine-tuning NLLB-200 for a new low-resource language.
For the tokenizer to recognize the new language, I am modifying the library's source code (tokenization_nllb.py) by adding my new language's token to the FAIRSEQ_LANGUAGE_CODES list. After that, I load the NLLB tokenizer with additional_special_tokens=["new_token"].
Is there a built-in function or attribute in the library that does this directly, without modifying the source code?
Thanks for your help!

alphai

Sep 20, 2023

Hello,
Did you find an answer to your question? I am working on the same kind of project for an African language.

Thanks
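
One possible way to register a new language code without editing tokenization_nllb.py is sketched below. It is not a confirmed answer from this thread. It relies only on the generic tokenizer and model methods add_special_tokens, convert_tokens_to_ids and resize_token_embeddings. The helper name add_language_token, the language code "xxx_Latn" and the checkpoint "facebook/nllb-200-distilled-600M" are placeholders chosen for illustration. Whether src_lang / tgt_lang then accept the new code may depend on the installed transformers version, so treat that part as an assumption to verify.

```python
def add_language_token(tokenizer, model, lang_code):
    """Register lang_code as a special token and make sure the model has a row for it.

    Hypothetical helper: it only uses generic tokenizer/model methods,
    so the library source does not need to be edited.
    """
    existing = list(tokenizer.additional_special_tokens)
    if lang_code not in existing:
        # Pass the full list so the existing language codes stay registered
        # even if the library replaces the list instead of extending it.
        tokenizer.add_special_tokens(
            {"additional_special_tokens": existing + [lang_code]}
        )
    # Grow the embedding matrix only when needed; never shrink it.
    if len(tokenizer) > model.config.vocab_size:
        model.resize_token_embeddings(len(tokenizer))
    return tokenizer.convert_tokens_to_ids(lang_code)


def main():
    # Imported here so the helper above can be used without downloading anything.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    name = "facebook/nllb-200-distilled-600M"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    new_id = add_language_token(tokenizer, model, "xxx_Latn")  # placeholder code
    print(new_id, len(tokenizer), model.config.vocab_size)
```

Calling main() downloads the checkpoint and prints the id assigned to the new code. Any embedding row created by the resize is freshly initialized, so the new code only becomes meaningful after fine-tuning on data for that language.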
