---
license: cc-by-4.0
datasets:
- masakhane/masakhaner2
- dsfsi/PuoData
language:
- tn
metrics:
- f1
library_name: transformers
pipeline_tag: token-classification
tags:
- ner
- african nlp
- africanlp
---

# PuoBERTa-NER: A Setswana Language Model Finetuned on MasakhaNER-2 for Named Entity Recognition

[![Zenodo doi badge](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.8434795-blue.svg)](https://doi.org/10.5281/zenodo.8434795)
[![arXiv](https://img.shields.io/badge/arXiv-2310.09141-b31b1b.svg)](https://arxiv.org/abs/2310.09141)
🤗 [https://huggingface.co/dsfsi/PuoBERTa](https://huggingface.co/dsfsi/PuoBERTa)

A RoBERTa-based language model finetuned on MasakhaNER-2 for Named Entity Recognition.

Based on [https://huggingface.co/dsfsi/PuoBERTa](https://huggingface.co/dsfsi/PuoBERTa)

## Model Details

### Model Description

This is a Named Entity Recognition (NER) model for Setswana, based on PuoBERTa and finetuned on the Setswana portion of MasakhaNER-2.

- **Developed by:** Vukosi Marivate ([@vukosi](https://huggingface.co/@vukosi)), Moseli Mots'Oehli ([@MoseliMotsoehli](https://huggingface.co/@MoseliMotsoehli)), Valencia Wagner, Richard Lastrucci and Isheanesu Dzingirai
- **Model type:** RoBERTa Model
- **Language(s) (NLP):** Setswana
- **License:** CC BY 4.0

### Model Performance

Performance of models on the [MasakhaNER-2](https://github.com/masakhane-io/masakhane-ner/tree/main/MasakhaNER2.0) downstream task.

| Model | Test Performance (F1 score) |
|---|---|
| **Multilingual Models** | |
| AfriBERTa | 83.2 |
| AfroXLMR-base | 87.7 |
| AfroXLMR-large | 89.4 |
| **Monolingual Models** | |
| NCHLT TSN RoBERTa | 74.2 |
| PuoBERTa | **78.2** |
| PuoBERTa+JW300 | **80.2** |

### Usage

Use this model for Named Entity Recognition in Setswana.
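
The usage note above could be sketched with the `transformers` token-classification pipeline. This is a minimal sketch under stated assumptions, not the authors' reference code: the model id `dsfsi/PuoBERTa-NER` is inferred from the card title (the card itself only links the base model `dsfsi/PuoBERTa`), and the helper names `load_ner_pipeline`, `format_entities` and `tag_text` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: the finetuned checkpoint is published under this id.
# The card only links the base model, dsfsi/PuoBERTa.
MODEL_ID = "dsfsi/PuoBERTa-NER"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the pure-Python helpers work without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word pieces into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def format_entities(
    predictions: List[Dict], min_score: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Return (text, label, score) for aggregated predictions >= min_score."""
    return [
        (p["word"].strip(), p["entity_group"], round(float(p["score"]), 3))
        for p in predictions
        if p["score"] >= min_score
    ]


def tag_text(
    text: str,
    ner: Optional[Callable[[str], List[Dict]]] = None,
    min_score: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Tag one sentence; builds the pipeline only if none is supplied."""
    if ner is None:
        ner = load_ner_pipeline()
    return format_entities(ner(text), min_score)
```

With network access and `transformers` installed, `tag_text("Masisi o kwa Gaborone.")` (an illustrative sentence, not taken from the dataset) would download the checkpoint and return a list of `(text, label, score)` tuples. MasakhaNER 2.0 annotates PER, ORG, LOC and DATE entities, so those are the labels to expect if the checkpoint keeps the dataset's tag set. Passing a preloaded pipeline through the `ner` argument avoids reloading the model for every sentence.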

## Citation Information

BibTeX reference:

```
@inproceedings{marivate2023puoberta,
  title = {PuoBERTa: Training and evaluation of a curated language model for Setswana},
  author = {Vukosi Marivate and Moseli Mots'Oehli and Valencia Wagner and Richard Lastrucci and Isheanesu Dzingirai},
  year = {2023},
  booktitle = {Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science},
  url = {https://link.springer.com/chapter/10.1007/978-3-031-49002-6_17},
  keywords = {NLP},
  preprint_url = {https://arxiv.org/abs/2310.09141},
  dataset_url = {https://github.com/dsfsi/PuoBERTa},
  software_url = {https://huggingface.co/dsfsi/PuoBERTa}
}
```

## Contributing

Your contributions are welcome! Feel free to improve the model.

## Model Card Authors

Vukosi Marivate

## Model Card Contact

For more details, reach out or check our [website](https://dsfsi.github.io/).

Email: vukosi.marivate@cs.up.ac.za

**Enjoy exploring Setswana through AI!**