---
license: afl-3.0
datasets:
- tweet_eval
- sentiment140
- mteb/tweet_sentiment_extraction
- yelp_review_full
- amazon_polarity
language:
- en
metrics:
- accuracy
- sparse_val accuracy
- sparse_val categorical accuracy
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- roberta
- roberta-base
- sentiment-analysis
- nlp
- tweet-analysis
- tweet
- analysis
- sentiment
- positive
- news-analysis
---

# BYRD'S I - RoBERTa-Based Tweet/Review/Text Analysis

This is a roBERTa-base model fine-tuned on 8 datasets with ~20M tweets. The model is intended for English, but it also does a fine job on other languages.

Git Repo: SENTIMENTANALYSIS-PROJECT

Demo: BYRD'S I

Labels:
- 0 -> Negative
- 1 -> Neutral
- 2 -> Positive
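For a quick sanity check, the model can be wrapped in a standard `transformers` pipeline. This is a minimal sketch, assuming the hosted weights are TensorFlow checkpoints (hence `from_tf=True`, as in the full example below); note the pipeline reports raw ids as `LABEL_0`/`LABEL_1`/`LABEL_2` unless `id2label` is set in the model config.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification

# from_tf=True converts the TensorFlow checkpoint to PyTorch on the fly
# (requires TensorFlow to be installed).
tokenizer = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-18")
model = AutoModelForSequenceClassification.from_pretrained(
    "AK776161/birdseye_roberta-base-18", from_tf=True
)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("I LOVE YOU"))  # e.g. [{'label': 'LABEL_2', 'score': ...}] -> Positive
```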
# Model Metrics

- Accuracy: ~96%
- Sparse categorical accuracy: 0.9597
- Loss: 0.1144
- Validation loss (last training run): 0.1482
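For reference, sparse categorical accuracy is simply the fraction of examples whose integer label matches the argmax of the model's logits. A minimal NumPy sketch (the label and logit values here are illustrative only):

```python
import numpy as np

def sparse_categorical_accuracy(y_true, logits):
    """Fraction of rows where argmax(logits) equals the integer label."""
    return float(np.mean(np.argmax(logits, axis=-1) == np.asarray(y_true)))

# Illustrative values only:
labels = [2, 0, 1]
logits = np.array([[0.1, 0.2, 3.0],   # argmax 2 -> correct
                   [2.5, 0.3, 0.1],   # argmax 0 -> correct
                   [0.2, 0.1, 1.4]])  # argmax 2 -> wrong
print(sparse_categorical_accuracy(labels, logits))  # 0.666...
```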
Note: Due to discrepancies in the neutral-class data, we published a second model (a positive/negative-only Byrd's I model) and combine the two with the "AdaBoot" method, averaging the models' logits as shown below, to get more accurate output.

# Example of Classification:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import numpy as np

# Model 0: three-class (negative/neutral/positive) model.
tokenizer = AutoTokenizer.from_pretrained("AK776161/birdseye_roberta-base-18", use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "AK776161/birdseye_roberta-base-18", from_tf=True
)

# Model 1: tweet-eval fine-tune; both models share the same RoBERTa tokenizer,
# so a single tokenizer is enough.
model1 = AutoModelForSequenceClassification.from_pretrained(
    "AK776161/birdseye_roberta-base-tweet-eval", from_tf=True
)

# ----------------------- "AdaBoot" technique ---------------------------
def nparraymeancalc(arr1, arr2):
    """Average the two models' logits row by row."""
    returner = []
    for i in range(len(arr1)):
        # Clamp an extremely negative neutral logit (index 1) before averaging.
        if arr1[i][1] < -7:
            arr1[i][1] = 0
        returner.append(np.mean([arr1[i], arr2[i]], axis=0))
    return np.array(returner)

def predictions(tokenizedtext):
    """Run both models on the same tokenized input and average their logits."""
    logits1 = model(**tokenizedtext).logits.detach().numpy()
    logits2 = model1(**tokenizedtext).logits.detach().numpy()
    return nparraymeancalc(logits1, logits2)

def labelassign(predictionresult):
    """Map each row of averaged logits to its argmax label id."""
    return [row.argmax() for row in predictionresult]

tokenizeddata = tokenizer("----YOUR_TEXT---", return_tensors='pt', padding=True, truncation=True)
result = predictions(tokenizeddata)
print(labelassign(result))
```

Output for "I LOVE YOU":

```
1) Positive: 0.994
2) Negative: 0.000
3) Neutral: 0.006
```
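The probabilities shown above can be obtained by normalizing the averaged logits. A minimal sketch of that conversion, reusing the `predictions` helper and `tokenizeddata` from the example (the `to_probs` name is ours, and the assumption that a softmax produced these scores is ours as well):

```python
import numpy as np

def to_probs(logits):
    # Numerically stable softmax over the class axis.
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

probs = to_probs(predictions(tokenizeddata))
for name, p in zip(["Negative", "Neutral", "Positive"], probs[0]):
    print(f"{name}: {p:.3f}")
```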