arXiv:2104.12250

XLM-T: Multilingual Language Models in Twitter for Sentiment Analysis and Beyond

Published on Apr 25, 2021
Authors: Francesco Barbieri, Luis Espinosa Anke, Jose Camacho-Collados

Abstract

Language models are ubiquitous in current NLP, and their multilingual capacity has recently attracted considerable attention. However, current analyses have almost exclusively focused on (multilingual variants of) standard benchmarks, and have relied on clean pre-training and task-specific corpora as multilingual signals. In this paper, we introduce XLM-T, a framework to train and evaluate multilingual language models in Twitter. We provide: (1) a new strong multilingual baseline consisting of an XLM-R (Conneau et al., 2020) model pre-trained on millions of tweets in over thirty languages, alongside starter code to subsequently fine-tune on a target task; and (2) a set of unified sentiment analysis Twitter datasets in eight different languages and an XLM-T model fine-tuned on them.
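
As a quick illustration of how the fine-tuned model from point (2) could be used for inference, below is a minimal sketch using the Hugging Face transformers pipeline. The model ID cardiffnlp/twitter-xlm-roberta-base-sentiment is an assumption based on the checkpoints released alongside the paper; it is not stated in this abstract.

```python
# Minimal sketch: scoring multilingual tweets with the fine-tuned XLM-T
# sentiment model via the transformers pipeline. The model ID below is an
# assumption (it is not given in the abstract above).
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
)

# A single multilingual model handles tweets in different languages.
tweets = [
    "Huggingface es lo mejor! Awesome library",  # Spanish + English
    "Quel match horrible hier soir...",          # French
]
for tweet in tweets:
    print(tweet, "->", sentiment(tweet))
```

Because the underlying XLM-R encoder was pre-trained on tweets in over thirty languages, the same pipeline can score tweets across those languages without switching to language-specific models.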

Models citing this paper: 14

Datasets citing this paper: 0

Spaces citing this paper: 28

Collections including this paper: 0