BERTu
BERTu is a monolingual Maltese model pre-trained from scratch on the Korpus Malti v4.0 using the BERT (base) architecture.
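The model is published on the Hugging Face Hub as MLRS/BERTu and can be loaded with the transformers library. The snippet below is a minimal sketch of masked-token prediction; the Maltese example sentence is purely illustrative, and it assumes the published checkpoint includes the masked-language-modelling head.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load BERTu from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")
model = AutoModelForMaskedLM.from_pretrained("MLRS/BERTu")

# Illustrative example: "Malta hija gżira [MASK]." ("Malta is a [MASK] island.")
text = f"Malta hija gżira {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Take the highest-scoring token at the mask position.
mask_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```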
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese. Cite it as follows:
@inproceedings{BERTu,
    title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author = "Micallef, Kurt and
      Gatt, Albert and
      Tanti, Marc and
      van der Plas, Lonneke and
      Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month = jul,
    year = "2022",
    address = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.deeplo-1.10",
    doi = "10.18653/v1/2022.deeplo-1.10",
    pages = "90--101",
}
Dataset used to train MLRS/BERTu
- Korpus Malti (v4.0)
Evaluation results
All results are self-reported.
- Unlabelled Attachment Score on the Maltese Universal Dependencies Treebank (MUDT): 92.31
- Labelled Attachment Score on the Maltese Universal Dependencies Treebank (MUDT): 88.14
- UPOS Accuracy on the MLRS POS dataset: 98.58
- XPOS Accuracy on the MLRS POS dataset: 98.54
- Span-based F1 on WikiAnn (Maltese): 86.77
- Macro-averaged F1 on the Maltese Sentiment Analysis Dataset: 78.96
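These scores come from fine-tuning the pre-trained model on each downstream task. As a hedged sketch of how such fine-tuning begins, the snippet below attaches a fresh classification head to BERTu; num_labels=2 is a hypothetical choice for a binary sentiment task and does not necessarily reflect the exact setup used in the paper.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch only: initialise BERTu with a new, randomly initialised
# classification head. num_labels=2 is a hypothetical label count for a
# binary sentiment task; the paper's actual fine-tuning setup may differ.
tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")
model = AutoModelForSequenceClassification.from_pretrained("MLRS/BERTu", num_labels=2)
```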