|
--- |
|
license: apache-2.0 |
|
language: |
|
- eu |
|
--- |
|
|
|
|
|
# Model Card for xlm-roberta-large-lemma-eu |
|
|
|
This model is a fine-tuned version of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) for the contextual lemmatization task. |
|
The training data is extracted from the [SIGMORPHON 2019 Shared Task](https://aclanthology.org/W19-4211/).
|
The model for Basque was trained on the [BDT corpus]() (Basque Dependency Treebank).
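A contextual lemmatizer of this kind predicts, for each word in context, a transformation from the inflected form to its lemma (in the spirit of the shortest-edit-script labels suggested by the `ses-lemma` repository name; the exact label format below is an illustrative assumption, not taken from this card):

```python
def apply_suffix_edit(form: str, n_delete: int, append: str = "") -> str:
    """Apply a simple suffix edit label: drop n_delete trailing characters, then append."""
    return form[: len(form) - n_delete] + append

# Basque example: "etxean" ("in the house") lemmatizes to "etxe" ("house").
print(apply_suffix_edit("etxean", 2))      # etxe
# "mendietara" ("to the mountains") lemmatizes to "mendi" ("mountain").
print(apply_suffix_edit("mendietara", 5))  # mendi
```

Casting lemmatization as classification over such edit labels lets a token-classification head handle open-vocabulary lemmas with a small, closed label set.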
|
|
|
|
|
# Training Hyperparameters |
|
|
|
- Seed: 42
- Epochs: 20
- Batch size: 8
- Gradient accumulation steps: 2
- Learning rate: 0.00005
- Warmup ratio: 0.06
- Weight decay: 0.01
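For reference, the values above collected under `transformers`-style argument names (the mapping of names is an assumption; the actual training scripts live in the repository linked below):

```python
# Hyperparameters reported in this card, keyed by common transformers argument names.
hparams = {
    "seed": 42,
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 5e-5,
    "warmup_ratio": 0.06,
    "weight_decay": 0.01,
}

# With gradient accumulation, the effective batch size per optimizer step is:
effective_batch = (
    hparams["per_device_train_batch_size"] * hparams["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```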
|
|
|
# Results |
|
|
|
|
|
|
|
|
|
For results and further details, see the paper and the repository:
|
- Paper: [On the Role of Morphological Information for Contextual Lemmatization](https://direct.mit.edu/coli/article/50/1/157/118134/On-the-Role-of-Morphological-Information-for)
|
- Repository: [Datasets and training files](https://github.com/hitz-zentroa/ses-lemma)
|
|
|
|
|
**Contact**: [Olia Toporkov](https://www.ixa.eus/node/13292) and [Rodrigo Agerri](https://ragerri.github.io/), HiTZ Center - Ixa, University of the Basque Country (UPV/EHU)
|
**Funding**: |
|
**Model type**: xlm-roberta-large |
|
**Language(s) (NLP)**: Basque |
|
**License**: apache-2.0 |
|
|
|
|
|
|
|
|
|
|
|
# Citation |
|
|
|
```bibtex
@article{10.1162/coli_a_00497,
  author  = {Toporkov, Olia and Agerri, Rodrigo},
  title   = {On the Role of Morphological Information for Contextual Lemmatization},
  journal = {Computational Linguistics},
  volume  = {50},
  number  = {1},
  pages   = {157--191},
  year    = {2024},
  month   = {03},
  issn    = {0891-2017},
  doi     = {10.1162/coli_a_00497},
  url     = {https://doi.org/10.1162/coli_a_00497},
  eprint  = {https://direct.mit.edu/coli/article-pdf/50/1/157/2367156/coli_a_00497.pdf},
}
```