---
language:
- es
tags:
- twitter
- pos-tagging
---

# POS Tagging model for Spanish/English

## robertuito-pos

Repository: [https://github.com/pysentimiento/pysentimiento/](https://github.com/finiteautomata/pysentimiento/)

Model trained on the Spanish/English split of the [LinCE POS corpus](https://ritual.uh.edu/lince/), a code-switching benchmark. The base model is [RoBERTuito](https://github.com/pysentimiento/robertuito), a RoBERTa model trained on Spanish tweets.

## Usage

If you want to use this model, we suggest using it directly through the `pysentimiento` library, because it does not work properly with the `transformers` pipeline due to tokenization issues.

```python
from pysentimiento import create_analyzer

pos_analyzer = create_analyzer("pos", lang="es")

pos_analyzer.predict("Quiero que esto funcione correctamente! @perezjotaeme")
# [{'type': 'PROPN', 'text': 'Quiero', 'start': 0, 'end': 6},
#  {'type': 'SCONJ', 'text': 'que', 'start': 7, 'end': 10},
#  {'type': 'PRON', 'text': 'esto', 'start': 11, 'end': 15},
#  {'type': 'VERB', 'text': 'funcione', 'start': 16, 'end': 24},
#  {'type': 'ADV', 'text': 'correctamente', 'start': 25, 'end': 38},
#  {'type': 'PUNCT', 'text': '!', 'start': 38, 'end': 39},
#  {'type': 'NOUN', 'text': '@perezjotaeme', 'start': 40, 'end': 53}]
```

## Results

Results are taken from the LinCE leaderboard.

| Model                  | Sentiment       | NER                | POS      |
|:-----------------------|:----------------|:-------------------|:---------|
| RoBERTuito             | **60.6**        | 68.5               | 97.2     |
| XLM Large              | --              | **69.5**           | **97.2** |
| XLM Base               | --              | 64.9               | 97.0     |
| C2S mBERT              | 59.1            | 64.6               | 96.9     |
| mBERT                  | 56.4            | 64.0               | 97.1     |
| BERT                   | 58.4            | 61.1               | 96.9     |
| BETO                   | 56.5            | --                 | --       |

## Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito and LinCE papers:

```
@misc{perez2021pysentimiento,
  title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
  author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
  year={2021},
  eprint={2106.09462},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@inproceedings{ortega2019overview,
  title={Overview of the task on irony detection in Spanish variants},
  author={Ortega-Bueno, Reynier and Rangel, Francisco and Hern{\'a}ndez Far{\i}as, D and Rosso, Paolo and Montes-y-G{\'o}mez, Manuel and Medina Pagola, Jos{\'e} E},
  booktitle={Proceedings of the Iberian languages evaluation forum (IberLEF 2019), co-located with 34th conference of the Spanish Society for natural language processing (SEPLN 2019). CEUR-WS.org},
  volume={2421},
  pages={229--256},
  year={2019}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```
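## Working with the output

As the Usage example shows, `pos_analyzer.predict` returns a list of dictionaries with `type`, `text`, `start` and `end` keys. The snippet below is a minimal sketch that assumes exactly that format and end-exclusive character offsets (as the sample suggests, e.g. `'Quiero'` spans 0 to 6). It hard-codes the example output so it runs without installing `pysentimiento` or downloading the model, checks that each `start`/`end` pair slices the matching substring out of the input, and converts the result into `(word, tag)` pairs and tag counts. The helpers `check_offsets` and `to_pairs` are hypothetical names for illustration, not part of the `pysentimiento` API.

```python
from collections import Counter

# Input sentence and tagger output copied from the Usage example
# (hard-coded so this sketch runs without pysentimiento).
text = "Quiero que esto funcione correctamente! @perezjotaeme"
tokens = [
    {"type": "PROPN", "text": "Quiero", "start": 0, "end": 6},
    {"type": "SCONJ", "text": "que", "start": 7, "end": 10},
    {"type": "PRON", "text": "esto", "start": 11, "end": 15},
    {"type": "VERB", "text": "funcione", "start": 16, "end": 24},
    {"type": "ADV", "text": "correctamente", "start": 25, "end": 38},
    {"type": "PUNCT", "text": "!", "start": 38, "end": 39},
    {"type": "NOUN", "text": "@perezjotaeme", "start": 40, "end": 53},
]


def check_offsets(text, tokens):
    """Hypothetical helper: True if text[start:end] equals each token's 'text'.

    Assumes 'end' is exclusive, like Python slicing.
    """
    return all(text[t["start"]:t["end"]] == t["text"] for t in tokens)


def to_pairs(tokens):
    """Hypothetical helper: turn the list of dicts into (word, tag) tuples."""
    return [(t["text"], t["type"]) for t in tokens]


# Count how many tokens received each POS tag.
tag_counts = Counter(t["type"] for t in tokens)

print(check_offsets(text, tokens))  # True
print(to_pairs(tokens))
print(tag_counts)
```

Keeping the offsets lets you map tags back onto the original tweet (for example, to highlight spans), while `(word, tag)` pairs are the usual input format for downstream tools that expect tagged tokens.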