---
license: apache-2.0
language:
- pl
---
# Model Card for xlm-roberta-large-lemma-pl
This model is a fine-tuned version of [xlm-roberta-large](https://huggingface.co/FacebookAI/xlm-roberta-large) for the contextual lemmatization task.
The training datasets were extracted from the data of the [SIGMORPHON 2019 Shared Task](https://aclanthology.org/W19-4211/).
The Polish model was trained on the [LFG corpus]().
# Training Hyperparameters
- SEED: 42
- EPOCHS: 20
- BATCH SIZE: 4
- GRADIENT ACCUMULATION STEPS: 2
- LEARNING RATE: 0.00002
- WARMUP: 0.06
- WEIGHT DECAY: 0.1
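
The values above can be collected into a small configuration object to see how they interact. The following is a minimal sketch, not the authors' training script: the class name `FineTuneConfig`, its helper methods, and the example training-set size are hypothetical, and it assumes that WARMUP is a fraction of the total optimizer steps and that training runs on a single device.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in this model card (names are illustrative)."""

    seed: int = 42
    epochs: int = 20
    batch_size: int = 4
    gradient_accumulation_steps: int = 2
    learning_rate: float = 2e-5  # 0.00002
    warmup: float = 0.06  # ASSUMPTION: fraction of total optimizer steps
    weight_decay: float = 0.1

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update (single device assumed).
        return self.batch_size * self.gradient_accumulation_steps

    def optimizer_steps(self, num_train_examples: int) -> int:
        # One optimizer update per `effective_batch_size` examples, per epoch.
        steps_per_epoch = math.ceil(num_train_examples / self.effective_batch_size)
        return steps_per_epoch * self.epochs

    def warmup_steps(self, num_train_examples: int) -> int:
        # Number of updates spent ramping the learning rate up.
        return math.ceil(self.warmup * self.optimizer_steps(num_train_examples))


config = FineTuneConfig()

# Hypothetical training-set size, for illustration only.
example_size = 10_000

print("effective batch size:", config.effective_batch_size)
print("optimizer steps:", config.optimizer_steps(example_size))
print("warmup steps:", config.warmup_steps(example_size))
```

With a batch size of 4 and 2 gradient accumulation steps, each optimizer update sees 8 examples, so the learning rate of 0.00002 applies to an effective batch of 8 rather than 4.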
# Results
For more details, see the paper and the repository:
- 📖 Paper: [On the Role of Morphological Information for Contextual Lemmatization](https://direct.mit.edu/coli/article/50/1/157/118134/On-the-Role-of-Morphological-Information-for)
- 🌐 Repository: [Datasets and training files](https://github.com/hitz-zentroa/ses-lemma)
**Contact**: [Olia Toporkov](https://www.ixa.eus/node/13292) and [Rodrigo Agerri](https://ragerri.github.io/), HiTZ Center - Ixa, University of the Basque Country UPV/EHU
**Funding**:
**Model type**: xlm-roberta-large
**Language(s) (NLP)**: Polish
**License**: apache-2.0
# Citation
```bibtex
@article{10.1162/coli_a_00497,
author = {Toporkov, Olia and Agerri, Rodrigo},
title = "{On the Role of Morphological Information for Contextual
Lemmatization}",
journal = {Computational Linguistics},
volume = {50},
number = {1},
pages = {157-191},
year = {2024},
month = {03},
issn = {0891-2017},
doi = {10.1162/coli_a_00497},
url = {https://doi.org/10.1162/coli\_a\_00497},
eprint = {https://direct.mit.edu/coli/article-pdf/50/1/157/2367156/coli\_a\_00497.pdf},
}
```