---
license: mit
language:
- en
- es
- it
- pt
- fr
- zh
- hi
- ar
- nl
- ko
pipeline_tag: fill-mask
tags:
- twitter
---

# XLM-RoBERTa-large-twitter

This is an XLM-RoBERTa-large model fine-tuned on a corpus of over 156 million tweets in ten languages: English, Spanish, Italian, Portuguese, French, Chinese, Hindi, Arabic, Dutch, and Korean. The model was trained from the original XLM-RoBERTa-large checkpoint for 2 epochs with a batch size of 1024.

For best results, normalize tweets with the following function before passing them to the model:

```python
def preprocess(text):
    # Replace user mentions with a generic '@user' token
    # and URLs with a generic 'http' token.
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
```
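As a quick usage illustration, here is a minimal sketch of masked-token prediction with the `transformers` fill-mask pipeline, applying the `preprocess` helper above first. The model ID shown is a placeholder assumption; substitute this repository's actual Hub path.

```python
from transformers import pipeline

# Placeholder model ID; replace with this repository's actual Hub path.
fill_mask = pipeline("fill-mask", model="xlm-roberta-large-twitter")

# XLM-RoBERTa models use <mask> as the mask token.
tweet = "@StarWars fans are going to <mask> the new trailer https://t.co/abc123"

# Normalize mentions and URLs before inference, as recommended above.
for pred in fill_mask(preprocess(tweet)):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```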