MorRoBERTa
MorRoBERTa is a scaled-down variant of the RoBERTa-base model, designed specifically for the Moroccan Arabic dialect. It has 6 layers, 12 attention heads, and a hidden size of 768. Training ran for 12 epochs over the complete training set and took approximately 92 hours. The training corpus consists of six million Moroccan dialect sentences, amounting to 71 billion tokens.
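To make the architecture figures concrete, the short sketch below derives the per-head dimension from the stated sizes and compares the depth with RoBERTa-base (12 layers, 12 heads, hidden size 768). The `EncoderShape` class and the two constants are hypothetical names introduced only for this illustration; they are not part of the model or of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderShape:
    """Hypothetical helper holding the basic dimensions of a Transformer encoder."""
    num_layers: int
    num_heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads


# Figures stated in this model card.
MORROBERTA = EncoderShape(num_layers=6, num_heads=12, hidden_size=768)
# Standard RoBERTa-base dimensions, used here for comparison.
ROBERTA_BASE = EncoderShape(num_layers=12, num_heads=12, hidden_size=768)

print(MORROBERTA.head_dim)                              # 64
print(MORROBERTA.num_layers / ROBERTA_BASE.num_layers)  # 0.5
```

Under these numbers, the width of the model matches RoBERTa-base and only the depth is halved.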
Usage
The model weights can be loaded with the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("otmangi/MorRoBERTa")
model = AutoModel.from_pretrained("otmangi/MorRoBERTa")
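Once the model is loaded, one common way to use it is to turn sentences into fixed-size vectors. The following is a minimal sketch, not a method prescribed by the model authors: it assumes PyTorch is installed and uses mean pooling over the last hidden states, which is a generic technique chosen here for illustration. The function names `mean_pool` and `embed_sentences` are hypothetical.

```python
import torch
from transformers import AutoTokenizer, AutoModel


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors per sentence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


def embed_sentences(sentences, model_name="otmangi/MorRoBERTa"):
    """Hypothetical helper: download the model and return one vector per sentence."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    # Pad to the longest sentence in the batch and return PyTorch tensors.
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return mean_pool(outputs.last_hidden_state, inputs["attention_mask"])


# Example call (downloads the weights on first use):
# vectors = embed_sentences(["السلام عليكم"])
# print(vectors.shape)
```

Given the hidden size stated above, each returned vector should have 768 components. Mean pooling is only one option; for downstream tasks, fine-tuning with a task-specific head may be more appropriate.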