# Sentiment Analysis of English Tweets with BERTsent
**BERTsent**: A fine-tuned **BERT**-based **sent**iment classifier for English-language tweets.
BERTsent is fine-tuned on the SemEval 2017 corpus (39k+ tweets) and is based on [bertweet-base](https://github.com/VinAIResearch/BERTweet), which was pre-trained on 850M English tweets (cased) and an additional 23M COVID-19 English tweets (cased). The base model follows the [RoBERTa](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.md) pre-training procedure.
Output labels:
- 0 represents "negative" sentiment
- 1 represents "neutral" sentiment
- 2 represents "positive" sentiment
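When post-processing predictions, it can help to map these integer IDs back to label names. Below is a minimal sketch that assumes you keep the mapping in your own code. `ID2LABEL` and `label_name` are hypothetical names, not part of the model's or the transformers library's API:

```python
# Hypothetical helper: maps BERTsent's integer output labels to names,
# following the label scheme listed above.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}


def label_name(label_id) -> str:
    """Return the sentiment name for a BERTsent label ID (0, 1, or 2)."""
    try:
        # int() also accepts NumPy integers such as the result of np.argmax
        return ID2LABEL[int(label_id)]
    except KeyError:
        raise ValueError(f"unknown label id: {label_id!r}") from None


print(label_name(0))  # negative
print(label_name(2))  # positive
```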
## COVID-19 tweet-specific task
For example, the model distinguishes between "covid" (neutral sentiment) and "I have covid" (negative sentiment).
## Cite
If you use BERTsent in your project/research, please cite the following article:
Lamsal, R., Harwood, A., & Read, M. R. (2022). [Twitter conversations predict the daily confirmed COVID-19 cases](https://arxiv.org/abs/2206.10471). arXiv preprint arXiv:2206.10471.
```bibtex
@article{lamsal2022twitter,
  title={Twitter conversations predict the daily confirmed COVID-19 cases},
  author={Lamsal, Rabindra and Harwood, Aaron and Read, Maria Rodriguez},
  journal={Applied Soft Computing},
  volume={129},
  pages={109603},
  year={2022},
  publisher={Elsevier}
}
```
## Using the model
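Install `transformers` and `emoji` if they are not already installed. BERTsent is loaded as a TensorFlow model, so TensorFlow must be installed as well.

In a terminal:
```bash
pip install transformers
pip install emoji  # for converting emoticons or emojis into text
```
In notebooks (Colab, Kaggle):
```
!pip install transformers
!pip install emoji
```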
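A quick check like the following can confirm that the required packages are available before you load the model. This is a sketch only: `missing_packages` is a hypothetical helper. The package list is an assumption you can adjust; it includes `tensorflow` and `numpy` because the example code imports them.

```python
import importlib.util

# Packages used in this README's example code (adjust as needed).
REQUIRED = ("transformers", "emoji", "tensorflow", "numpy")


def missing_packages(names=REQUIRED):
    """Hypothetical helper: return the names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```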
Load BERTsent with the transformers library:
```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("rabindralamsal/BERTsent")
model = TFAutoModelForSequenceClassification.from_pretrained("rabindralamsal/BERTsent")
```
Import TensorFlow and NumPy:
```python
import tensorflow as tf
import numpy as np
```
We have now installed and imported everything needed for sentiment analysis. Let's predict the sentiment of an example tweet:
```python
example_tweet = "The NEET exams show our Govt in a poor light: unresponsiveness to genuine concerns; admit cards not delivered to aspirants in time; failure to provide centres in towns they reside, thus requiring unnecessary & risky travels. What a disgrace to treat our #Covid warriors like this!"
# This tweet is on Twitter with the ID 1435793872588738560

inputs = tokenizer(example_tweet, return_tensors="tf")  # token IDs + attention mask
logits = model(**inputs).logits                          # raw class scores, shape (1, 3)
prediction = tf.nn.softmax(logits, axis=1).numpy()       # class probabilities
sentiment = np.argmax(prediction, axis=1)[0]             # index of the most likely class
print(prediction)
print(sentiment)
```
Output:
```
[[0.972672164440155 0.023684727028012276 0.003643065458163619]]
0
```
Label 0 has the highest probability (about 0.97), so the tweet is classified as negative.
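To score several tweets at once, the tokenizer can pad a list of texts into a single batch. The sketch below assumes that `tokenizer` and `model` were loaded as shown above. `predict_sentiments` is a hypothetical helper, not part of the BERTsent or transformers API. The `padding=True, truncation=True` settings are also assumptions that you may need to tune.

```python
import numpy as np

LABELS = ["negative", "neutral", "positive"]  # BERTsent label order (0, 1, 2)


def predict_sentiments(texts, tokenizer, model):
    """Hypothetical helper: return a (label, probabilities) pair for each text.

    Assumes `tokenizer` and `model` are the BERTsent objects loaded above.
    """
    # Pad to the longest text in the batch and truncate overly long inputs.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="tf")
    logits = np.asarray(model(**inputs).logits)  # shape: (batch, 3)
    # Row-wise softmax, shifted by each row's max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exps = np.exp(shifted)
    probs = exps / exps.sum(axis=-1, keepdims=True)
    # Pick the most probable class per row and map it to its name.
    return [(LABELS[i], p) for i, p in zip(probs.argmax(axis=-1), probs)]
```

For instance, `predict_sentiments(["covid", "I have covid"], tokenizer, model)` returns one `(label, probabilities)` pair per tweet. Computing the softmax in NumPy keeps the helper independent of `tf.nn.softmax`, but either approach gives the same probabilities.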