Using XLM-R to build an LM on X languages (X >= 5) from scratch
How can we train Hugging Face's XLM-R on our own languages, from scratch? The documentation is not very clear about this. I was mainly working with this (https://github.com/facebookresearch/XLM), but it is too complex for my purposes.
Hey @bonadossue,
In general, I would not recommend training XLM-R from scratch, since it has already been pretrained on a wide range of languages; you should be able to simply fine-tune it on your preferred languages. If you really want to run a full pretraining, though, I'd recommend the following example script: https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py
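For illustration, here is a minimal sketch of what that script does internally when pretraining from scratch, using the `Trainer` API directly. The file path, hyperparameters, and the choice to reuse the pretrained tokenizer are illustrative assumptions, not values from this thread:

```python
# A minimal sketch of from-scratch MLM pretraining with the Trainer API,
# roughly what run_mlm.py does internally. File path, hyperparameters,
# and tokenizer choice are illustrative assumptions.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    XLMRobertaConfig,
    XLMRobertaForMaskedLM,
    XLMRobertaTokenizerFast,
)

# Reusing the pretrained tokenizer; for languages it covers poorly you
# would train your own SentencePiece vocabulary instead.
tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")

# Randomly initialized weights: this is what "from scratch" means.
# 514 positions = 512 tokens + RoBERTa's 2-position padding offset.
config = XLMRobertaConfig(
    vocab_size=tokenizer.vocab_size,
    max_position_embeddings=514,
)
model = XLMRobertaForMaskedLM(config)

# Plain-text corpus, one sentence per line (placeholder file name).
dataset = load_dataset("text", data_files={"train": "train.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-scratch", num_train_epochs=3),
    train_dataset=tokenized["train"],
    # Dynamic masking of 15% of tokens, as in RoBERTa/XLM-R.
    data_collator=DataCollatorForLanguageModeling(
        tokenizer=tokenizer, mlm_probability=0.15
    ),
)
trainer.train()
```

For a real run I'd still use `run_mlm.py` itself, since it adds argument parsing, grouping of texts into fixed-length blocks, checkpointing, and so on.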
Which languages are you mostly interested in?
Many languages, like Fon, Ghomala, Bambara, etc.
OK, did you try just fine-tuning XLM-R on those languages?
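In case it helps, the fine-tuning route is the same recipe as above, just starting from the pretrained checkpoint instead of a fresh config. Again only a sketch, with a placeholder file name and hyperparameters:

```python
# Fine-tuning sketch: continued MLM training of the pretrained checkpoint
# on your own monolingual text (placeholder file name and hyperparameters).
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
    XLMRobertaForMaskedLM,
    XLMRobertaTokenizerFast,
)

tokenizer = XLMRobertaTokenizerFast.from_pretrained("xlm-roberta-base")
# Pretrained weights instead of a randomly initialized config.
model = XLMRobertaForMaskedLM.from_pretrained("xlm-roberta-base")

dataset = load_dataset("text", data_files={"train": "fon_train.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="xlmr-finetuned", num_train_epochs=3),
    train_dataset=tokenized["train"],
    # Defaults to dynamic masking with mlm_probability=0.15.
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer),
)
trainer.train()
```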