from classifier import classify
from PIL import Image
import streamlit as st
st.title("Twitter Sentiment Analysis using BERT model")
st.subheader("Motivation")
st.markdown("""
Social media has made the digital world much smaller, making it easy for fake news to spread like wildfire.
According to reports [6], 36.7 percent of the population have felt that they were cyberbullied at some point in their lives.
Since the level of offensiveness is subjective, conventional sentiment analysis may not classify such content reliably.
One way around this is to train deep learning models on large, diverse datasets so that they generalize well.
Hugging Face Spaces provides an easy interface for testing models before using them and for sharing them with others.
""")
st.subheader("Play with the model")
text = st.text_input("Enter a tweet to classify it as either Normal or Abusive. (Press Enter to submit)",
                     value="I love DCNM course", max_chars=512)
# Only call the classifier when the input box actually contains text.
if text.strip():
    st.markdown(f"The tweet is classified as: **{classify(text)}**")
st.markdown("Try an abusive example: _Avatar is a crappy movie_")
st.subheader("About the model")
st.markdown("""
The model was trained on the Twitter dataset ENCASEH2020 from Founta, A.M. et al. (2018) [3]. The BERT-Tiny model [1][2][5] was chosen for this project because, empirically,
it gave better results with the fewest parameters. The model was trained for 10 epochs with a batch size of 32, using the AdamW optimizer with a learning rate of 1e-2 and cross-entropy loss.
""")
st.image("./images/train_val_accuracy.png", caption="Train and validation accuracy: about 96 percent on average", use_column_width=True)
st.image("./images/train_test_scores.png", caption="Classification report: an F1 score of 0.96 for both classes", use_column_width=True)
st.image("./images/confusion_matrix.png", caption="Confusion matrix: only 217 of the 5,430 data points in the test dataset are misclassified", use_column_width=True)
st.subheader("References")
st.markdown("1. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)")
st.markdown("2. [TinyBERT: Distilling BERT for Natural Language Understanding](https://arxiv.org/abs/1909.10351)")
st.markdown("3. [Founta, A.M., Djouvas, C., Chatzakou, D., Leontiadis, I., Blackburn, J., Stringhini, G., Vakali, A., Sirivianos, M., & Kourtellis, N. (2018). Large Scale Crowdsourcing and Characterization of Twitter Abusive Behavior. In 11th International Conference on Web and Social Media, ICWSM 2018.](https://arxiv.org/abs/1802.00393)")
st.markdown("4. [Nandagopan D, Kowsik & Dinesh, Navaneeth & S Ram, Ajay. & C N, Amarnath. (2022). End-to-End Messaging System Enhancement using Federated Learning for Cyberbullying Detection. 10.13140/RG.2.2.35686.70722.](https://github.com/Cubemet/bert-models)")
st.markdown("5. [Base Model from nreimers](https://huggingface.co/nreimers/BERT-Tiny_L-2_H-128_A-2)")
st.markdown("6. [IHPL, Cyberbullying, a Growing Public Health Concern (Aug 2018)](https://ihpl.llu.edu/blog/cyberbullying-growing-public-health-concern)")