---
language:
- es
library_name: pysentimiento
tags:
- twitter
- named-entity-recognition
- ner
datasets:
- lince
---

# Named Entity Recognition model for Spanish/English

## robertuito-ner

Repository: [https://github.com/pysentimiento/pysentimiento/](https://github.com/finiteautomata/pysentimiento/)

Model trained with the Spanish/English split of the [LinCE NER corpus](https://ritual.uh.edu/lince/), a code-switched benchmark. The base model is [RoBERTuito](https://github.com/pysentimiento/robertuito), a RoBERTa model trained on Spanish tweets.

## Usage

If you want to use this model, we suggest using it directly from the `pysentimiento` library, because it does not work properly with the pipeline due to tokenization issues.

```python
from pysentimiento import create_analyzer

ner_analyzer = create_analyzer("ner", lang="es")

ner_analyzer.predict(
    "rindanse ante el mejor, leonel andres messi cuccitini. serresiete no existis, segui en al-nassr"
)
# [{'type': 'PER',
#   'text': 'leonel andres messi cuccitini',
#   'start': 24,
#   'end': 53},
#  {'type': 'PER', 'text': 'serresiete', 'start': 55, 'end': 65},
#  {'type': 'LOC', 'text': 'al-nassr', 'start': 108, 'end': 116}]
```

## Results

Results are taken from the LinCE leaderboard.

| Model                  | Sentiment       | NER                | POS     |
|:-----------------------|:----------------|:-------------------|:--------|
| RoBERTuito             | **60.6**        | 68.5               | 97.2    |
| XLM Large              | --              | **69.5**           | **97.2** |
| XLM Base               | --              | 64.9               | 97.0    |
| C2S mBERT              | 59.1            | 64.6               | 96.9    |
| mBERT                  | 56.4            | 64.0               | 97.1    |
| BERT                   | 58.4            | 61.1               | 96.9    |
| BETO                   | 56.5            | --                 | --      |

## Citation

If you use this model in your research, please cite the pysentimiento, RoBERTuito and LinCE papers:

```
@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@inproceedings{perez2022robertuito,
  title={RoBERTuito: a pre-trained language model for social media text in Spanish},
  author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n Ariel and Alemany, Laura Alonso and Luque, Franco M},
  booktitle={Proceedings of the Thirteenth Language Resources and Evaluation Conference},
  pages={7235--7243},
  year={2022}
}

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}
```
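
## Post-processing sketch

The Usage section shows predictions as a list of dictionaries with `type`, `text`, `start` and `end` keys. As a minimal sketch of how that output might be consumed downstream, the snippet below groups entity texts by their type. The `predictions` list is copied from the example output in the Usage section, and `group_entities_by_type` is a hypothetical helper written for illustration, not part of `pysentimiento`.

```python
from collections import defaultdict

# Example predictions, copied from the output shown in the Usage section.
predictions = [
    {"type": "PER", "text": "leonel andres messi cuccitini", "start": 24, "end": 53},
    {"type": "PER", "text": "serresiete", "start": 55, "end": 65},
    {"type": "LOC", "text": "al-nassr", "start": 108, "end": 116},
]


def group_entities_by_type(entities):
    """Hypothetical helper (not a pysentimiento API): map entity type to entity texts."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["text"])
    return dict(grouped)


grouped = group_entities_by_type(predictions)
print(grouped)
```

The helper only relies on the dictionary keys visible in the example output, so it makes no assumption about how the analyzer produces them.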