rabindralamsal committed
Commit 1d8f63b
1 Parent(s): 385b8eb

Create README.md

Files changed (1): README.md (+48, -0)
README.md ADDED
# Sentiment Analysis of English Tweets

**BERTsent**: A fine-tuned **BERT**-based **sent**iment classifier for English-language tweets.

BERTsent is trained on the SemEval-2017 corpus (39k+ tweets) and is based on [bertweet-base](https://github.com/VinAIResearch/BERTweet), which was pre-trained on 850M English Tweets (cased) and an additional 23M COVID-19 English Tweets (cased). The base model uses the [RoBERTa](https://github.com/pytorch/fairseq/blob/master/examples/roberta/README.md) pre-training procedure.

Output labels (a small lookup sketch for these indices follows the list):

- 0 represents "negative" sentiment
- 1 represents "neutral" sentiment
- 2 represents "positive" sentiment
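
If label names are needed rather than raw indices, the integers can be mapped back with a simple lookup. The snippet below is an illustrative sketch only; the hard-coded list is ours and is not read from the model's configuration:

```python
# Illustrative: map BERTsent's output index to a human-readable label
ID2LABEL = ["negative", "neutral", "positive"]

predicted_index = 2                  # e.g., the argmax from the prediction code further below
print(ID2LABEL[predicted_index])     # -> positive
```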

## Using the model

Install transformers, if not already installed:

```shell
# terminal
pip install transformers

# notebooks (Colab, Kaggle)
!pip install transformers
```

Import BERTsent from the transformers library:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("rabindralamsal/finetuned-bertweet-sentiment-analysis")
model = TFAutoModelForSequenceClassification.from_pretrained("rabindralamsal/finetuned-bertweet-sentiment-analysis")
```

Import TensorFlow and NumPy (NumPy is needed for `np.argmax` below):

```python
import tensorflow as tf
import numpy as np
```

Everything needed for the sentiment analysis is now installed and imported. Let's predict the sentiment of an example tweet:

```python
example_tweet = "The NEET exams show our Govt in a poor light: unresponsiveness to genuine concerns; admit cards not delivered to aspirants in time; failure to provide centres in towns they reside, thus requiring unnecessary & risky travels. What a disgrace to treat our #Covid warriors like this!"
# this tweet resides on Twitter with the identifier 1435793872588738560

input_ids = tokenizer.encode(example_tweet, return_tensors="tf")  # tokenize the tweet
output = model.predict(input_ids)[0]                              # raw logits, shape (1, 3)
prediction = tf.nn.softmax(output, axis=1).numpy()                # probabilities for [negative, neutral, positive]
sentiment = np.argmax(prediction)                                 # index of the most probable class

print(prediction)
print(sentiment)
```

Output:

```text
[[0.9862386 0.01050556 0.00325586]]
0
```

The first line shows the softmax probabilities for the [negative, neutral, positive] classes; index 0 ("negative") has the highest probability, so the example tweet is classified as negative.
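
For repeated use, the steps above can be wrapped in a small helper. This is a minimal sketch that reuses the `tokenizer`, `model`, `tf`, and `np` objects from the snippets above; the function name `predict_sentiment` and the sample text are ours and are not part of the model or the transformers API:

```python
def predict_sentiment(text):
    """Return (label, probabilities) for a single English tweet."""
    input_ids = tokenizer.encode(text, return_tensors="tf")    # tokenize
    logits = model.predict(input_ids)[0]                       # raw logits, shape (1, 3)
    probabilities = tf.nn.softmax(logits, axis=1).numpy()[0]   # [negative, neutral, positive]
    label = ["negative", "neutral", "positive"][int(np.argmax(probabilities))]
    return label, probabilities

label, probs = predict_sentiment("Loving the new update, everything feels faster!")
print(label, probs)
```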