Use XLM-R to build LM on X-languages (X>=5) from scratch

#2
by bonadossou - opened

How can we train HF's XLM-R on our own languages from scratch? The documentation is not very clear about this. I was mainly working with this (https://github.com/facebookresearch/XLM), but it is too complex for my purposes.

Hey @bonadossou,

In general, I would not recommend training XLM-R from scratch, as it has been pretrained on a wide range of languages and you should be able to simply fine-tune it on your preferred language(s). If you really want to run a full pretraining, though, I'd recommend the following example: https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py
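For reference, a minimal invocation of that script for masked-language-model training starting from the XLM-R checkpoint might look like the sketch below; the file names `train.txt`/`valid.txt` and the output path are placeholders for your own data, and you may want to adjust batch size and epochs to your hardware:

```shell
# Continued MLM training of XLM-R on your own monolingual or multilingual text.
# train.txt / valid.txt are plain-text files with one example per line.
python run_mlm.py \
    --model_name_or_path xlm-roberta-base \
    --train_file train.txt \
    --validation_file valid.txt \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 8 \
    --num_train_epochs 3 \
    --output_dir ./xlmr-mlm-finetuned
```

Starting from the `xlm-roberta-base` checkpoint (rather than a random initialization) is usually the better option for low-resource languages, since the pretrained multilingual representations transfer even to languages not seen in pretraining.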

Which languages are you mostly interested in?

Many languages like Fon, Ghomala, Bambara, etc

OK, did you try just fine-tuning XLM-R on those languages?
