| datasets: | |
| - tweet_eval | |
| language: | |
| - en | |
| # Twitter-roBERTa-base for Sentiment Analysis | |
| This is a RoBERTa-base model trained on ~58M tweets and fine-tuned for sentiment analysis with the TweetEval benchmark. It is intended for English text (for a similar multilingual model, see [XLM-T](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment)). | |
| - Reference Paper: [_TweetEval_ (Findings of EMNLP 2020)](https://arxiv.org/pdf/2010.12421.pdf). | |
| - Git Repo: [TweetEval official repository](https://github.com/cardiffnlp/tweeteval). | |
| <b>Labels</b>: | |
| 0 -> Negative; | |
| 1 -> Neutral; | |
| 2 -> Positive | |
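The label indices above can be applied to raw model scores as in the following minimal sketch. It uses only the standard library and hypothetical logits (`example_logits` is made up for illustration and is not real model output); `ID2LABEL`, `softmax`, and `predict_label` are illustrative names, not part of the model's API.

```python
import math

# Index-to-label mapping as listed in this model card.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits for one tweet, ordered as [negative, neutral, positive].
example_logits = [-2.0, 0.5, 2.5]
label, prob = predict_label(example_logits)
print(label, round(prob, 4))
```

In practice the logits would come from the model's output for a tokenized tweet; the mapping step stays the same.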
| <b>New!</b> We just released a new sentiment analysis model trained on a larger and more recent set of tweets. | |
| See [twitter-roberta-base-sentiment-latest](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment-latest) and [TweetNLP](https://tweetnlp.org) for more details. | |
| ## Example of classification | |
| ```python | |
| from transformers import AutoModelForSequenceClassification | |
| from transformers import TFAutoModelForSequenceClassification | |
| from transformers import AutoTokenizer | |
| import numpy as np | |
| from scipy.special import softmax | |
| import csv | |
| import urllib.request | |
| # Preprocess text (username and link placeholders) | |
| def preprocess(text): | |
|     new_text = [] | |
|     for t in text.split(" "): | |
|         t = '@user' if t.startswith('@') and len(t) > 1 else t | |
|         t = 'http' if t.startswith('http') else t | |
|         new_text.append(t) | |
|     return " ".join(new_text) | |
| # Tasks: | |
| # emoji, emotion, hate, irony, offensive, sentiment | |
| # stance/abortion, stance/atheism, stance/climate, stance/feminist, stance/hillary | |
| task='sentiment' | |
| MODEL = f"cardiffnlp/twitter-roberta-base-{task}" | |
| tokenizer = AutoTokenizer.from_pretrained(MODEL) | |
| # download label mapping | |
| labels=[] | |
| mapping_link = f"https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/{task}/mapping.txt" | |
| with urllib.request.urlopen(mapping_link) as f: | |
|     html = f.read().decode('utf-8').split("\n") | |
|     csvreader = csv.reader(html, delimiter='\t') | |
| labels = [row[1] for row in csvreader if len(row) > 1] | |
| # PT | |
| model = AutoModelForSequenceClassification.from_pretrained(MODEL) | |
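| # Save model and tokenizer together so the local copy at MODEL is complete on reload | |
| model.save_pretrained(MODEL) | |
| tokenizer.save_pretrained(MODEL) | |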
| text = "Good night 😊" | |
| text = preprocess(text) | |
| encoded_input = tokenizer(text, return_tensors='pt') | |
| output = model(**encoded_input) | |
| scores = output[0][0].detach().numpy() | |
| scores = softmax(scores) | |
| # # TF | |
| # model = TFAutoModelForSequenceClassification.from_pretrained(MODEL) | |
| # model.save_pretrained(MODEL) | |
| # tokenizer.save_pretrained(MODEL) | |
| # text = "Good night 😊" | |
| # encoded_input = tokenizer(text, return_tensors='tf') | |
| # output = model(encoded_input) | |
| # scores = output[0][0].numpy() | |
| # scores = softmax(scores) | |
| ranking = np.argsort(scores) | |
| ranking = ranking[::-1] | |
| for i in range(scores.shape[0]): | |
|     l = labels[ranking[i]] | |
|     s = scores[ranking[i]] | |
|     print(f"{i+1}) {l} {np.round(float(s), 4)}") | |
| ``` | |
| Output: | |
| ``` | |
| 1) positive 0.8466 | |
| 2) neutral 0.1458 | |
| 3) negative 0.0076 | |
| ``` | |
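To illustrate what the preprocessing step in the example above does on its own, here is a small offline sketch. It repeats the same `preprocess` logic and applies it to a hypothetical tweet (the text and URL are made up for illustration); no model download is needed.

```python
def preprocess(text):
    """Replace user mentions and links with the placeholders used in the example above."""
    new_text = []
    for t in text.split(" "):
        # Any token starting with '@' (and longer than one character) becomes '@user'.
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        # Any token starting with 'http' becomes the bare placeholder 'http'.
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)


# Hypothetical tweet for illustration.
tweet = "@alice loved the talk, slides at https://example.com/slides"
cleaned = preprocess(tweet)
print(cleaned)  # -> @user loved the talk, slides at http
```

Because the function splits on single spaces only, a mention followed directly by punctuation (for example `@alice,`) is replaced as a whole token, punctuation included.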
| ### BibTeX entry and citation info | |
| Please cite the [reference paper](https://aclanthology.org/2020.findings-emnlp.148/) if you use this model. | |
| ```bibtex | |
| @inproceedings{barbieri-etal-2020-tweeteval, | |
| title = "{T}weet{E}val: Unified Benchmark and Comparative Evaluation for Tweet Classification", | |
| author = "Barbieri, Francesco and | |
| Camacho-Collados, Jose and | |
| Espinosa Anke, Luis and | |
| Neves, Leonardo", | |
| booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", | |
| month = nov, | |
| year = "2020", | |
| address = "Online", | |
| publisher = "Association for Computational Linguistics", | |
| url = "https://aclanthology.org/2020.findings-emnlp.148", | |
| doi = "10.18653/v1/2020.findings-emnlp.148", | |
| pages = "1644--1650" | |
| } | |
| ``` |