---
license: apache-2.0
datasets:
- mbruton/galician_srl
- CoNLL-2012
language:
- gl
- en
metrics:
- seqeval
library_name: transformers
pipeline_tag: token-classification
---
# Model Card for GalBERT-en for Semantic Role Labeling (cased)
This model is fine-tuned from a version of [multilingual BERT](https://huggingface.co/bert-base-multilingual-cased) that was first pre-trained on the SRL task for English. It is one of 24 models introduced as part of [this project](https://github.com/mbruton0426/GalicianSRL). Prior to this work, there were no published Galician datasets or models for SRL.
## Model Details
### Model Description
GalBERT-en for Semantic Role Labeling (SRL) is a transformers model that leverages mBERT's extensive pretraining on 104 languages to achieve better SRL predictions for low-resource Galician. Before fine-tuning, the model was additionally pre-trained on the SRL task for English. The model is cased: it distinguishes between english and English. It was fine-tuned on Galician with the following objectives:
- Identify up to 13 verbal roots within a sentence.
- Identify available arguments for each verbal root. Due to scarcity of data, this model focused solely on the identification of arguments 0, 1, and 2.
Labels are formatted as `r#:tag`, where `r#` links the token to the verbal root with index `#`, and `tag` marks the token either as the verbal root itself (`root`) or as one of its arguments (`arg0`, `arg1`, `arg2`).
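The label scheme above can be split back into its two parts with a small helper. This is a minimal sketch, not code from the project: the names `SrlLabel` and `parse_label` are hypothetical, and accepting labels both with and without the leading `r` is an assumption made because the Results table prints labels as `0:arg0`.

```python
import re
from typing import NamedTuple, Optional


class SrlLabel(NamedTuple):
    """Hypothetical container for one parsed label."""
    root_index: int  # which verbal root the token belongs to
    tag: str         # "root", "arg0", "arg1" or "arg2"


# Assumption: the leading "r" is optional, so both "r0:arg1" and "0:arg1" parse.
_LABEL_RE = re.compile(r"^r?(\d+):(root|arg[0-2])$")


def parse_label(label: str) -> Optional[SrlLabel]:
    """Return the root index and tag, or None if the string is not an SRL label."""
    match = _LABEL_RE.match(label)
    if match is None:
        return None
    return SrlLabel(int(match.group(1)), match.group(2))


print(parse_label("r0:root"))
print(parse_label("10:arg1"))
```

Grouping tokens by `root_index` then yields one predicate-argument structure per verbal root in the sentence.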
- **Developed by:** [Micaella Bruton](mailto:micaellabruton@gmail.com)
- **Model type:** Transformers
- **Language(s) (NLP):** Galician (gl), English (en)
- **License:** Apache 2.0
- **Finetuned from model:** [English pre-trained multilingual BERT](https://huggingface.co/liaad/srl-en_mbert-base)
### Model Sources
- **Repository:** [GalicianSRL](https://github.com/mbruton0426/GalicianSRL)
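- **Paper:** [BERTie Bott's Every Flavor Labels: A Tasty Introduction to Semantic Role Labeling for Galician](https://aclanthology.org/2023.emnlp-main.671)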
## Uses
This model is intended to be used to develop and improve natural language processing tools for Galician.
## Bias, Risks, and Limitations
Galician is a low-resource language which, prior to this project, lacked a semantic role labeling dataset. As such, the dataset used to train this model is extremely limited and could benefit from the inclusion of additional sentences and manual validation by native speakers.
## Training Details
### Training Data
This model was pre-trained on the [OntoNotes 5.0 English SRL corpus](http://catalog.ldc.upenn.edu/LDC2013T19).
This model was fine-tuned on the "train" portion of the [GalicianSRL Dataset](https://huggingface.co/datasets/mbruton/galician_srl) produced as part of this same project.
#### Training Hyperparameters
- **Learning Rate:** 2e-5
- **Batch Size:** 16
- **Weight Decay:** 0.01
- **Early Stopping:** 10 epochs
## Evaluation
#### Testing Data
This model was tested on the "test" portion of the [GalicianSRL Dataset](https://huggingface.co/datasets/mbruton/galician_srl) produced as part of this same project.
#### Metrics
[seqeval](https://huggingface.co/spaces/evaluate-metric/seqeval) is a Python framework for sequence labeling evaluation. It can evaluate the performance of chunking tasks such as named-entity recognition, part-of-speech tagging, and semantic role labeling.
It supplies scoring both overall and per label type.
Overall:
- `accuracy`: the average [accuracy](https://huggingface.co/metrics/accuracy), on a scale between 0.0 and 1.0.
- `precision`: the average [precision](https://huggingface.co/metrics/precision), on a scale between 0.0 and 1.0.
- `recall`: the average [recall](https://huggingface.co/metrics/recall), on a scale between 0.0 and 1.0.
- `f1`: the average [F1 score](https://huggingface.co/metrics/f1), which is the harmonic mean of the precision and recall. It also has a scale of 0.0 to 1.0.
Per label type:
- `precision`: the average [precision](https://huggingface.co/metrics/precision), on a scale between 0.0 and 1.0.
- `recall`: the average [recall](https://huggingface.co/metrics/recall), on a scale between 0.0 and 1.0.
- `f1`: the average [F1 score](https://huggingface.co/metrics/f1), on a scale between 0.0 and 1.0.
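As a worked example of the F1 definition above, the harmonic mean can be computed directly from a precision and recall pair. The sketch below uses plain Python rather than seqeval itself; the function name `f1_score` is illustrative, and returning 0.0 when both inputs are zero is a convention chosen here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on a scale of 0.0 to 1.0."""
    if precision + recall == 0:
        # Convention chosen for this sketch: no correct predictions gives 0.0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the 0:arg0 label, taken from the Results table.
print(round(f1_score(0.74, 0.76), 2))  # 0.75
```

Because the harmonic mean is pulled toward the smaller of the two values, a label with high precision but low recall (or the reverse) still receives a low F1.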
### Results
| Label | Precision | Recall | f1-score | Support |
| :----------: | :-------: | :----: | :------: | :-----: |
| 0:arg0 | 0.74 | 0.76 | 0.75 | 485 |
| 0:arg1 | 0.76 | 0.72 | 0.74 | 483 |
| 0:arg2 | 0.70 | 0.74 | 0.72 | 264 |
| 0:root | 0.93 | 0.91 | 0.92 | 948 |
| 1:arg0 | 0.67 | 0.62 | 0.64 | 348 |
| 1:arg1 | 0.68 | 0.67 | 0.68 | 443 |
| 1:arg2 | 0.63 | 0.57 | 0.60 | 211 |
| 1:root | 0.86 | 0.84 | 0.85 | 802 |
| 2:arg0 | 0.57 | 0.52 | 0.54 | 240 |
| 2:arg1 | 0.64 | 0.56 | 0.60 | 331 |
| 2:arg2 | 0.50 | 0.54 | 0.52 | 156 |
| 2:root | 0.79 | 0.74 | 0.76 | 579 |
| 3:arg0 | 0.45 | 0.39 | 0.42 | 137 |
| 3:arg1 | 0.54 | 0.55 | 0.54 | 216 |
| 3:arg2 | 0.43 | 0.42 | 0.43 | 110 |
| 3:root | 0.63 | 0.71 | 0.67 | 374 |
| 4:arg0 | 0.48 | 0.46 | 0.47 | 70 |
| 4:arg1 | 0.50 | 0.57 | 0.53 | 109 |
| 4:arg2 | 0.44 | 0.61 | 0.51 | 66 |
| 4:root | 0.52 | 0.67 | 0.58 | 206 |
| 5:arg0 | 0.36 | 0.25 | 0.29 | 20 |
| 5:arg1 | 0.33 | 0.42 | 0.37 | 57 |
| 5:arg2 | 0.00 | 0.00 | 0.00 | 28 |
| 5:root | 0.57 | 0.31 | 0.41 | 102 |
| 6:arg0 | 0.00 | 0.00 | 0.00 | 13 |
| 6:arg1 | 0.00 | 0.00 | 0.00 | 25 |
| 6:arg2 | 0.00 | 0.00 | 0.00 | 8 |
| 6:root | 0.32 | 0.29 | 0.30 | 42 |
| 7:arg0 | 0.00 | 0.00 | 0.00 | 3 |
| 7:arg1 | 0.00 | 0.00 | 0.00 | 8 |
| 7:arg2 | 0.00 | 0.00 | 0.00 | 5 |
| 7:root | 0.00 | 0.00 | 0.00 | 16 |
| 8:arg0 | 0.00 | 0.00 | 0.00 | 1 |
| 8:arg1 | 0.00 | 0.00 | 0.00 | 2 |
| 8:arg2 | 0.00 | 0.00 | 0.00 | 1 |
| 8:root | 0.00 | 0.00 | 0.00 | 7 |
| 9:arg0 | 0.00 | 0.00 | 0.00 | 1 |
| 9:arg1 | 0.00 | 0.00 | 0.00 | 2 |
| 9:arg2 | 0.00 | 0.00 | 0.00 | 1 |
| 9:root | 0.00 | 0.00 | 0.00 | 3 |
| 10:arg1 | 0.00 | 0.00 | 0.00 | 1 |
| 10:root | 0.00 | 0.00 | 0.00 | 2 |
| micro avg | 0.70 | 0.68 | 0.69 | 6926 |
| macro avg | 0.33 | 0.33 | 0.33 | 6926 |
| weighted avg | 0.69 | 0.68 | 0.69 | 6926 |
| tot root avg | 0.42 | 0.41 | 0.41 | 3081 |
| tot A0 avg | 0.33 | 0.30 | 0.31 | 1318 |
| tot A1 avg | 0.31 | 0.32 | 0.31 | 1677 |
| tot A2 avg | 0.27 | 0.29 | 0.28 | 850 |
| tot r0 avg | 0.78 | 0.78 | 0.78 | 2180 |
| tot r1 avg | 0.71 | 0.68 | 0.69 | 1804 |
| tot r2 avg | 0.63 | 0.59 | 0.61 | 1306 |
| tot r3 avg | 0.51 | 0.52 | 0.52 | 837 |
| tot r4 avg | 0.49 | 0.58 | 0.52 | 451 |
| tot r5 avg | 0.32 | 0.25 | 0.27 | 207 |
| tot r6 avg | 0.08 | 0.07 | 0.08 | 88 |
| tot r7 avg | 0.00 | 0.00 | 0.00 | 32 |
| tot r8 avg | 0.00 | 0.00 | 0.00 | 11 |
| tot r9 avg | 0.00 | 0.00 | 0.00 | 7 |
| tot r10 avg | 0.00 | 0.00 | 0.00 | 3 |
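The card does not state how the `tot … avg` rows are derived. The numbers are consistent with an unweighted mean of the matching per-label rows, with the supports summed. The sketch below checks that reading for the `tot r0 avg` row; the helper name `aggregate` is hypothetical and the interpretation is an inference from the table, not a documented procedure.

```python
from statistics import mean

# (precision, recall, f1, support) for verbal root 0, copied from the table above.
ROOT0_ROWS = {
    "0:arg0": (0.74, 0.76, 0.75, 485),
    "0:arg1": (0.76, 0.72, 0.74, 483),
    "0:arg2": (0.70, 0.74, 0.72, 264),
    "0:root": (0.93, 0.91, 0.92, 948),
}


def aggregate(rows):
    """Unweighted mean of precision, recall and F1, plus the summed support."""
    precisions, recalls, f1s, supports = zip(*rows.values())
    return mean(precisions), mean(recalls), mean(f1s), sum(supports)


precision, recall, f1, support = aggregate(ROOT0_ROWS)
print(f"{precision:.4f} {recall:.4f} {f1:.4f} {support}")
```

Under this reading, the rare labels for roots 5 through 10, most of which score 0.00, weigh as much as the frequent ones. That would explain why the macro average (0.33) and the `tot root avg` and `tot A# avg` rows sit well below the micro and weighted averages (0.69).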
## Citation
**BibTeX:**
```
@inproceedings{bruton-beloucif-2023-bertie,
title = "{BERT}ie Bott{'}s Every Flavor Labels: A Tasty Introduction to Semantic Role Labeling for {G}alician",
author = "Bruton, Micaella and
Beloucif, Meriem",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.671",
doi = "10.18653/v1/2023.emnlp-main.671",
pages = "10892--10902",
abstract = "In this paper, we leverage existing corpora, WordNet, and dependency parsing to build the first Galician dataset for training semantic role labeling systems in an effort to expand available NLP resources. Additionally, we introduce verb indexing, a new pre-processing method, which helps increase the performance when semantically parsing highly-complex sentences. We use transfer-learning to test both the resource and the verb indexing method. Our results show that the effects of verb indexing were amplified in scenarios where the model was both pre-trained and fine-tuned on datasets utilizing the method, but improvements are also noticeable when only used during fine-tuning. The best-performing Galician SRL model achieved an f1 score of 0.74, introducing a baseline for future Galician SRL systems. We also tested our method on Spanish where we achieved an f1 score of 0.83, outperforming the baseline set by the 2009 CoNLL Shared Task by 0.025 showing the merits of our verb indexing method for pre-processing.",
}
```