darija-ner
This is the first model for Named Entity Recognition (NER) in the Moroccan dialect (Darija). It was trained on DarNERcorp, the first NER dataset in Darija, which is available on Mendeley: https://data.mendeley.com/datasets/286sss4k9v/4.
Model Description
- Developed by: Hanane Nour Moussa
- Model type: Token classification
- Language(s) (NLP): Arabic, Darija
Model Sources
- Repository: https://github.com/HananeNourMoussa/darija-ner
- Paper (dataset): Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief
Metrics
F1 score.
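The card does not say whether F1 is computed per token or per entity. As an illustration only, here is a minimal sketch of the entity-level (exact span match) micro F1 commonly used for NER, assuming BIO-style tags. The function names and the `PER`/`LOC` labels are hypothetical and are not taken from the model's actual tag set or evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_label)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        # An I- tag with the same label continues the current span.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        if start is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag != "O":
            # B- tags (and stray I- tags, handled leniently) open a new span.
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> float:
    """Micro-averaged F1 over exact (span, label) matches."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the prediction finds one of two gold entities.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under this definition a predicted entity counts only if both its boundaries and its label match the gold annotation, so the score is stricter than a token-level F1.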
Results
DarNERcorp_test: F1 = 66.06%
MixedNERcorp_test: F1 = 70.06%
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA T4
- Hours used: 0.7
- Cloud Provider: Google Cloud
- Compute Region: europe-west1
- Carbon Emitted: 0.01 kg
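For readers who want to see how such an estimate is typically formed, the sketch below multiplies hardware power draw by runtime and by the carbon intensity of the grid. It is a simplification, not the calculator's actual implementation. The 70 W figure is the NVIDIA T4's published board power. The grid intensity of 0.2 kg CO2eq/kWh is a placeholder assumption for illustration, not a value reported in this card or by the calculator.

```python
def estimate_emissions_kg(hours: float, power_watts: float,
                          intensity_kg_per_kwh: float) -> float:
    """Rough CO2eq estimate: energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * intensity_kg_per_kwh


# 0.7 hours on one T4 at its 70 W board power.
# ASSUMPTION: 0.2 kg CO2eq/kWh is a placeholder grid intensity.
emissions = estimate_emissions_kg(hours=0.7, power_watts=70.0,
                                  intensity_kg_per_kwh=0.2)
print(f"{emissions:.4f} kg CO2eq")
```

The estimate assumes the GPU runs at full board power for the whole job and ignores the rest of the machine and datacenter overhead, so it should be read as an order-of-magnitude figure.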
Citation
If you use the DarNERcorp dataset to train your models, please cite the following paper:
Hanane Nour Moussa, Asmaa Mourhir, DarNERcorp: An annotated named entity recognition dataset in the Moroccan dialect, Data in Brief, Volume 48, 2023, 109234, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2023.109234. (https://www.sciencedirect.com/science/article/pii/S2352340923003530)
GitHub Repo
Our data curation and model training code is openly available on GitHub: https://github.com/HananeNourMoussa/darija-ner