---
language:
- mt
datasets:
- MLRS/korpus_malti
model-index:
- name: BERTu
  results:
  - task:
      type: dependency-parsing
      name: Dependency Parsing
    dataset:
      type: universal_dependencies
      args: mt_mudt
      name: Maltese Universal Dependencies Treebank (MUDT)
    metrics:
      - type: uas
        value: 92.31
        name: Unlabelled Attachment Score
      - type: las
        value: 88.14
        name: Labelled Attachment Score
  - task:
      type: part-of-speech-tagging
      name: Part-of-Speech Tagging
    dataset:
      type: mlrs_pos
      name: MLRS POS dataset
    metrics:
      - type: accuracy
        value: 98.58
        name: UPOS Accuracy
        args: upos
      - type: accuracy
        value: 98.54
        name: XPOS Accuracy
        args: xpos
  - task:
      type: named-entity-recognition
      name: Named Entity Recognition
    dataset:
      type: wikiann
      name: WikiAnn (Maltese)
      args: mt
    metrics:
      - type: f1
        args: span
        value: 86.77
        name: Span-based F1
  - task:
      type: sentiment-analysis
      name: Sentiment Analysis
    dataset:
      type: mt-sentiment-analysis
      name: Maltese Sentiment Analysis Dataset
    metrics:
      - type: f1
        args: macro
        value: 78.96
        name: Macro-averaged F1
license: cc-by-nc-sa-4.0
widget:
- text: "Malta hija gżira fil-[MASK]."
---

# BERTu

A Maltese monolingual model pre-trained from scratch on the Korpus Malti v4.0 using the BERT (base) architecture.

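## Usage

The following is a minimal sketch of masked-token prediction with the Hugging Face `transformers` fill-mask pipeline, using the widget sentence from this card's metadata. The Hub identifier `MLRS/BERTu` is an assumption (this card does not state it), and the helper names `format_predictions` and `predict_mask` are hypothetical, introduced only for illustration.

```python
from typing import Dict, List

# Assumed Hub identifier for this model; adjust if the repository name differs.
MODEL_ID = "MLRS/BERTu"


def format_predictions(results: List[Dict]) -> List[str]:
    """Turn fill-mask pipeline output into 'token (score)' strings."""
    return [f"{r['token_str'].strip()} ({r['score']:.3f})" for r in results]


def predict_mask(text: str, top_k: int = 5) -> List[str]:
    """Return the top_k candidate fillers for the single [MASK] in `text`."""
    # Imported lazily so format_predictions works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=MODEL_ID)
    return format_predictions(fill_mask(text, top_k=top_k))
```

Calling `predict_mask("Malta hija gżira fil-[MASK].")` downloads the model on first use and returns the five highest-scoring candidates for the masked position. The actual predictions depend on the released checkpoint and are not reproduced here.
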
## License

This work is licensed under a
[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License][cc-by-nc-sa].
Permissions beyond the scope of this license may be available at [https://mlrs.research.um.edu.mt/](https://mlrs.research.um.edu.mt/).

[![CC BY-NC-SA 4.0][cc-by-nc-sa-image]][cc-by-nc-sa]

[cc-by-nc-sa]: http://creativecommons.org/licenses/by-nc-sa/4.0/
[cc-by-nc-sa-image]: https://licensebuttons.net/l/by-nc-sa/4.0/88x31.png

## Citation

This work was first presented in [Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese](https://aclanthology.org/2022.deeplo-1.10/).
Cite it as follows:

```bibtex
@inproceedings{BERTu,
    title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
    author = "Micallef, Kurt and
      Gatt, Albert and
      Tanti, Marc and
      van der Plas, Lonneke and
      Borg, Claudia",
    booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
    month = jul,
    year = "2022",
    address = "Hybrid",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.deeplo-1.10",
    doi = "10.18653/v1/2022.deeplo-1.10",
    pages = "90--101",
}
```