
MorRoBERTa

MorRoBERTa is a Transformer-based language model designed specifically for the Moroccan dialect, developed by Moussaoui Otman and El Younoussi Yacine.

About MorRoBERTa

MorRoBERTa is a scaled-down variant of the RoBERTa-base model. It comprises 6 layers, 12 attention heads, and 768 hidden dimensions. Training took approximately 92 hours, covering 12 epochs over the complete training set. The model was trained on a corpus of six million Moroccan dialect sentences, amounting to 71 billion tokens.

Usage

The model weights can be loaded with the Hugging Face transformers library.

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("otmangi/MorRoBERTa")
model = AutoModel.from_pretrained("otmangi/MorRoBERTa")
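Since MorRoBERTa was pretrained with RoBERTa's masked-language-modeling objective, a natural quick check is to fill in a masked token. The sketch below is a hypothetical example, assuming the hosted checkpoint includes the MLM head (if it does not, loading it with the fill-mask pipeline will fail):

```python
from transformers import pipeline

# Load the checkpoint through the fill-mask pipeline
# (assumes the repo contains the masked-LM head weights).
fill_mask = pipeline("fill-mask", model="otmangi/MorRoBERTa")

# Build an input using the tokenizer's own mask token,
# so this works regardless of the exact mask string.
masked_sentence = f"السلام {fill_mask.tokenizer.mask_token}"
predictions = fill_mask(masked_sentence, top_k=5)

# Each prediction is a dict with the candidate token and its score.
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```

Each entry in `predictions` contains the proposed token (`token_str`), its probability (`score`), and the completed sentence (`sequence`).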

Acknowledgments

This research was supported through the computational resources of HPC-MARWAN (www.marwan.ma/hpc), provided by the National Center for Scientific and Technical Research (CNRST), Rabat, Morocco.

Contact

For any inquiries, feedback, or requests, please feel free to reach out to:

otman.moussaoui@etu.uae.ac.ma

yacine.elyounoussi@uae.ac.ma
