# Twitter-roBERTa-base

This is a roBERTa-base model trained on ~58M tweets, described and evaluated in the [_TweetEval_ benchmark (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). To evaluate this and other LMs on Twitter-specific data, please refer to the [TweetEval official repository](https://github.com/cardiffnlp/tweeteval).

## Example Masked Language Model

```python
from transformers import pipeline, AutoTokenizer
import numpy as np

MODEL = "cardiffnlp/twitter-roberta-base"
fill_mask = pipeline("fill-mask", model=MODEL, tokenizer=MODEL)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

def print_candidates(candidates):
    # Print the top five predicted fillers for the masked token with their scores.
    for i in range(5):
        token = tokenizer.decode(candidates[i]['token'])
        score = np.round(candidates[i]['score'], 4)
        print(f"{i+1}) {token} {score}")

texts = [
    "I am so <mask> 😊",
    "I am so <mask> 😢"
]
for text in texts:
    print(f"{'-'*30}\n{text}")
    candidates = fill_mask(text)
    print_candidates(candidates)
```

```
------------------------------
I am so <mask> 😊
1) happy 0.402
2) excited 0.1441
3) proud 0.143
4) grateful 0.0669
5) blessed 0.0334
------------------------------
I am so <mask> 😢
1) sad 0.2641
2) sorry 0.1605
3) tired 0.138
4) sick 0.0278
5) hungry 0.0232
```
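## Example Feature Extraction

Beyond masked-token prediction, the same checkpoint can be loaded as a plain encoder to obtain a vector representation of a tweet. Below is a minimal sketch using the standard `transformers` `AutoModel` API; the example text and the mean-pooling step are illustrative choices, not a prescription from the benchmark.

```python
from transformers import AutoModel, AutoTokenizer
import torch

MODEL = "cardiffnlp/twitter-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

text = "Good night 😊"

# Tokenize and run a forward pass without gradient tracking.
encoded = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    output = model(**encoded)

# Mean-pool the last hidden state over the token dimension to get one
# fixed-size vector per tweet (one possible pooling choice among several).
embedding = output.last_hidden_state.mean(dim=1).squeeze(0)
print(embedding.shape)  # torch.Size([768]) for a roBERTa-base encoder
```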