Edit model card
drawing

DziriBERT

DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect. It handles Algerian text contents written using both Arabic and Latin characters. It sets new state of the art results on Algerian text classification datasets, even if it has been pre-trained on much less data (~1 million tweets).

For more information, please visit our paper: https://arxiv.org/pdf/2109.12346.pdf.

How to use

from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("alger-ia/dziribert")
model = BertForMaskedLM.from_pretrained("alger-ia/dziribert")

You can find a fine-tuning script in our Github repo: https://github.com/alger-ia/dziribert

How to cite

@article{dziribert,
  title={DziriBERT: a Pre-trained Language Model for the Algerian Dialect},
  author={Abdaoui, Amine and Berrimi, Mohamed and Oussalah, Mourad and Moussaoui, Abdelouahab},
  journal={arXiv preprint arXiv:2109.12346},
  year={2021}
}

Contact

Please contact amine.abdaoui@huawei.com for any question, feedback or request.

Downloads last month
894
Hosted inference API
Fill-Mask
Examples
Examples
Mask token: [MASK]
This model can be loaded on the Inference API on-demand.