mBERTu
A Maltese multilingual model pre-trained on the Korpus Malti v4.0 using multilingual BERT as the initial checkpoint.
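As a minimal sketch of how the model might be queried, the snippet below uses the Hugging Face `transformers` fill-mask pipeline. It assumes the repository name `MLRS/mBERTu` shown on this page and assumes the published checkpoint includes a masked-language-modelling head. The helper names (`mask_word_at`, `load_fill_mask`, `main`) and the example sentence are illustrative and not part of the original card.

```python
MODEL_ID = "MLRS/mBERTu"  # assumption: repository name as shown on this page


def mask_word_at(words, index, mask_token="[MASK]"):
    """Return the words joined into a sentence, with words[index] replaced by the mask token."""
    masked = list(words)  # copy so the caller's list is left unchanged
    masked[index] = mask_token
    return " ".join(masked)


def load_fill_mask(model_id=MODEL_ID):
    """Build a fill-mask pipeline (hypothetical helper; downloads the model on first use)."""
    # Imported lazily so mask_word_at can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def main():
    fill_mask = load_fill_mask()
    # Illustrative Maltese sentence: "Malta is an island in the Mediterranean".
    words = "Malta hija gżira fil-Mediterran".split()
    text = mask_word_at(words, 2, fill_mask.tokenizer.mask_token)
    # Each prediction is a dict that includes the candidate token and its score.
    for prediction in fill_mask(text):
        print(prediction["token_str"], round(prediction["score"], 3))
```

Calling `main()` requires `transformers`, a backend such as PyTorch, and network access to fetch the checkpoint. For downstream tasks such as tagging or sentiment analysis, the checkpoint would instead be fine-tuned with a task-specific head.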
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese. Cite it as follows:
@inproceedings{BERTu,
    title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author = "Micallef, Kurt and
      Gatt, Albert and
      Tanti, Marc and
      van der Plas, Lonneke and
      Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month = jul,
    year = "2022",
    address = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.deeplo-1.10",
    doi = "10.18653/v1/2022.deeplo-1.10",
    pages = "90--101",
}
Evaluation results
- Unlabelled Attachment Score on Maltese Universal Dependencies Treebank (MUDT): 92.100 (self-reported)
- Labelled Attachment Score on Maltese Universal Dependencies Treebank (MUDT): 87.870 (self-reported)
- UPOS Accuracy on MLRS POS dataset: 98.660 (self-reported)
- XPOS Accuracy on MLRS POS dataset: 98.580 (self-reported)
- Span-based F1 on WikiAnn (Maltese): 86.600 (self-reported)
- Macro-averaged F1 on Maltese Sentiment Analysis Dataset: 76.790 (self-reported)