
darija-ner

This is the first model for Named Entity Recognition (NER) in the Moroccan dialect (Darija). It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley Data: https://data.mendeley.com/datasets/286sss4k9v/4.

Model Description

  • Developed by: Hanane Nour Moussa
  • Model type: Token classification
  • Language(s) (NLP): Arabic, Darija
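Since this is a token classification model, it can presumably be loaded with the Hugging Face `transformers` token-classification pipeline. The sketch below is a minimal example under that assumption. The model ID passed to `load_ner_pipeline` is a placeholder that must be replaced with this model's actual Hub ID, and both helper names are hypothetical, not part of the released code.

```python
from typing import Dict, List


def load_ner_pipeline(model_id: str):
    """Build a token-classification pipeline for the given checkpoint.

    transformers is imported lazily so format_entities works without it.
    """
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entities(predictions: List[Dict]) -> List[str]:
    """Render aggregated pipeline output as 'LABEL: text (score)' strings."""
    return [
        f"{p['entity_group']}: {p['word']} ({p['score']:.2f})"
        for p in predictions
    ]


# Example usage (requires transformers and network access):
#   ner = load_ner_pipeline("<namespace>/darija-ner")  # placeholder ID
#   for line in format_entities(ner("your Darija sentence here")):
#       print(line)
```

The entity labels returned depend on the label set stored in the model configuration, which this card does not list.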

Model Sources

Metrics

The model is evaluated using the F1 score.
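For readers unfamiliar with how F1 is commonly computed for NER, the following is a minimal sketch of entity-level F1 over BIO-tagged sequences. It assumes a BIO tagging scheme and exact-match span scoring. The card does not state which F1 variant produced the reported results, so this is an illustration rather than the evaluation code used here, and the function names are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence.

    Stray I- tags that do not continue an open span are ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching entity spans."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)  # a span counts only if boundaries and type match
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
print(f"F1 = {entity_f1(gold, pred):.2%}")
```

In this toy example the prediction finds one of the two gold entities with no false positives, so precision is 1.0 and recall is 0.5.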

Results

DarNERcorp_test: F1 = 66.06%

MixedNERcorp_test: F1 = 70.06%

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: NVIDIA T4
  • Hours used: 0.7
  • Cloud Provider: Google Cloud
  • Compute Region: europe-west1
  • Carbon Emitted: 0.01 kg
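The reported figure can be sanity-checked with a simplified version of the calculation behind such estimates: energy drawn (power × time) multiplied by the carbon intensity of the grid. In the sketch below, the 70 W value is the NVIDIA T4's rated board power, used as an upper bound on actual draw. The carbon intensity of 0.2 kg CO2eq/kWh is an illustrative assumption, not a value taken from this card or from the calculator.

```python
def estimate_emissions_kg(power_watts: float, hours: float,
                          kg_co2_per_kwh: float) -> float:
    """Estimate emissions as energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh


T4_BOARD_POWER_WATTS = 70.0   # rated board power of the NVIDIA T4
HOURS_USED = 0.7              # from the list above
ASSUMED_INTENSITY = 0.2       # kg CO2eq/kWh; illustrative assumption only

emissions = estimate_emissions_kg(T4_BOARD_POWER_WATTS, HOURS_USED,
                                  ASSUMED_INTENSITY)
print(f"Estimated emissions: {emissions:.4f} kg CO2eq")
```

Under these assumptions the run uses about 0.049 kWh, which gives roughly 0.0098 kg CO2eq, in line with the 0.01 kg reported above.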

Citation

If you use the DarNERcorp dataset to train your models, please cite the following paper:

Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief, Volume 48, 2023, 109234, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.109234. (https://www.sciencedirect.com/science/article/pii/S2352340923003530)

GitHub Repo

Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner
