---
language:
- es
tags:
- twitter
- pos-tagging
---

# POS Tagging model for Spanish/English

## robertuito-pos

Repository: [https://github.com/pysentimiento/pysentimiento/](https://github.com/pysentimiento/pysentimiento/)

Model trained on the Spanish/English split of the [LinCE POS corpus](https://ritual.uh.edu/lince/), a code-switched benchmark. The base model is [RoBERTuito](https://github.com/pysentimiento/robertuito), a RoBERTa model trained on Spanish tweets.

## Usage

If you want to use this model, we suggest using it directly from the `pysentimiento` library, since it does not work properly with the pipeline due to tokenization issues.

```python
from pysentimiento import create_analyzer

# Load the POS tagging analyzer
pos_analyzer = create_analyzer("pos", lang="es")

pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
# Output:
# [{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
#  {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
#  {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
#  {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
#  {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
#  {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
#  {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
```
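
The output shown above can be post-processed with plain Python. The following is a minimal sketch, assuming the prediction is a list of dicts with `type`, `text`, `start` and `end` keys, exactly as printed in the example; the helper names `group_by_tag` and `offsets_match` are hypothetical and not part of `pysentimiento`. The sample data is copied from the example so the snippet runs without loading the model.

```python
from collections import defaultdict

# Input sentence and tagged tokens copied from the usage example above.
# Assumption: the analyzer's prediction has this list-of-dicts shape.
text = "Quiero que esto funcione correctamente! @perezjotaeme"
tokens = [
    {"type": "PROPN", "text": "Quiero", "start": 0, "end": 6},
    {"type": "SCONJ", "text": "que", "start": 7, "end": 10},
    {"type": "PRON", "text": "esto", "start": 11, "end": 15},
    {"type": "VERB", "text": "funcione", "start": 16, "end": 24},
    {"type": "ADV", "text": "correctamente", "start": 25, "end": 38},
    {"type": "PUNCT", "text": "!", "start": 38, "end": 39},
    {"type": "NOUN", "text": "@perezjotaeme", "start": 40, "end": 53},
]


def group_by_tag(tokens):
    """Map each POS tag to the list of token texts that carry it (hypothetical helper)."""
    grouped = defaultdict(list)
    for token in tokens:
        grouped[token["type"]].append(token["text"])
    return dict(grouped)


def offsets_match(text, tokens):
    """Check that each (start, end) span slices the token's text out of the input."""
    return all(text[t["start"]:t["end"]] == t["text"] for t in tokens)


by_tag = group_by_tag(tokens)
print(by_tag["VERB"])               # ['funcione']
print(offsets_match(text, tokens))  # True
```

Because `start` and `end` are character offsets into the input string, the tags can be mapped back onto the original tweet without re-tokenizing it.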

## Results

Results are taken from the LinCE leaderboard.

| Model                  | Sentiment       | NER                | POS     |
|:-----------------------|:----------------|:-------------------|:--------|
| RoBERTuito             | **60.6**        | 68.5               | 97.2    |
| XLM Large              | --              | **69.5**           | **97.2** |
| XLM Base               | --              | 64.9               | 97.0    |
| C2S mBERT              | 59.1            | 64.6               | 96.9    |
| mBERT                  | 56.4            | 64.0               | 97.1    |
| BERT                   | 58.4            | 61.1               | 96.9    |
| BETO                   | 56.5            | --                 | --      |

## Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:

```
@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@inproceedings{ortega2019overview,
  title={Overview of the task on irony detection in Spanish variants},
  author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
  booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS. org},
  volume={2421},
  pages={229--256},
  year={2019}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```