---
language:
- es
tags:
- twitter
- pos-tagging
---
# POS Tagging model for Spanish/English
## robertuito-pos
Repository: [https://github.com/pysentimiento/pysentimiento/](https://github.com/pysentimiento/pysentimiento/)
Model trained on the Spanish/English split of the [LinCE POS corpus](https://ritual.uh.edu/lince/), a code-switched benchmark. The base model is [RoBERTuito](https://github.com/pysentimiento/robertuito), a RoBERTa model trained on Spanish tweets.
## Usage
To use this model, we suggest calling it directly through the `pysentimiento` library, because it does not work properly with the pipeline due to tokenization issues.
```python
from pysentimiento import create_analyzer
# Create the POS tagging analyzer
pos_analyzer = create_analyzer("pos", lang="es")
# Each predicted token carries its tag, its text, and its character offsets
pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
# Output:
# [{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
#  {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
#  {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
#  {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
#  {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
#  {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
#  {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
```
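The prediction shown above is a list of dictionaries with `type`, `text`, `start`, and `end` keys. As a minimal sketch of how that structure could be post-processed, the snippet below hard-codes the example output (so it runs without loading the model) and uses two hypothetical helpers, `to_pairs` and `offsets_match`, which are not part of `pysentimiento`. It assumes `start` and `end` are character offsets into the input string, which is consistent with the example.

```python
from collections import Counter

text = "Quiero que esto funcione correctamente! @perezjotaeme"

# Example output copied from the usage snippet; not recomputed here.
predictions = [
    {"type": "PROPN", "text": "Quiero", "start": 0, "end": 6},
    {"type": "SCONJ", "text": "que", "start": 7, "end": 10},
    {"type": "PRON", "text": "esto", "start": 11, "end": 15},
    {"type": "VERB", "text": "funcione", "start": 16, "end": 24},
    {"type": "ADV", "text": "correctamente", "start": 25, "end": 38},
    {"type": "PUNCT", "text": "!", "start": 38, "end": 39},
    {"type": "NOUN", "text": "@perezjotaeme", "start": 40, "end": 53},
]


def to_pairs(preds):
    """Hypothetical helper: flatten predictions into (token, tag) tuples."""
    return [(p["text"], p["type"]) for p in preds]


def offsets_match(source, preds):
    """Hypothetical helper: check that each span slices back to its token text."""
    return all(source[p["start"]:p["end"]] == p["text"] for p in preds)


pairs = to_pairs(predictions)
tag_counts = Counter(tag for _, tag in pairs)

print(pairs)
print(tag_counts)
print(offsets_match(text, predictions))
```

Checking the offsets against the original string is a cheap way to confirm that tokens can be mapped back onto the input, for example when highlighting tagged words in a tweet.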
## Results
Results are taken from the LinCE leaderboard.
| Model | Sentiment | NER | POS |
|:-----------------------|:----------------|:-------------------|:--------|
| RoBERTuito | **60.6** | 68.5 | 97.2 |
| XLM Large | -- | **69.5** | **97.2** |
| XLM Base | -- | 64.9 | 97.0 |
| C2S mBERT | 59.1 | 64.6 | 96.9 |
| mBERT | 56.4 | 64.0 | 97.1 |
| BERT | 58.4 | 61.1 | 96.9 |
| BETO | 56.5 | -- | -- |
## Citation
If you use this model in your research, please cite the pysentimiento, RoBERTuito, and LinCE papers:
```
@misc{perez2021pysentimiento,
title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
year={2021},
eprint={2106.09462},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@inproceedings{ortega2019overview,
title={Overview of the task on irony detection in Spanish variants},
author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS. org},
volume={2421},
pages={229--256},
year={2019}
}
@inproceedings{aguilar2020lince,
title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
pages={1803--1813},
year={2020}
}
```