metadata
language: en
license: apache-2.0
datasets:
  - tweets
widget:
  - text: COVID-19 vaccines are safe and effective.

Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.

Vaccinating COVID tweets

A model fine-tuned for fact classification of English tweets about COVID-19 and COVID-19 vaccines.

Intended uses & limitations

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("ans/vaccinating-covid-tweets")
model = AutoModelForSequenceClassification.from_pretrained("ans/vaccinating-covid-tweets")
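
Once the tokenizer and model are loaded, a prediction could be obtained roughly as follows. This is a minimal sketch, not the authors' documented pipeline: the helper names `softmax`, `pick_label`, and `classify` are hypothetical, PyTorch is assumed to be installed, and the label names are read from `model.config.id2label` rather than hard-coded, since this card does not document the class order.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def classify(text, model_name="ans/vaccinating-covid-tweets"):
    """Classify one tweet; downloads the checkpoint on first use."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    # id2label is read from the checkpoint's config (integer keys).
    return pick_label(logits, model.config.id2label)
```

A call such as `label, prob = classify("COVID-19 vaccines are safe and effective.")` would then return the predicted class name and its probability. Given the disclaimer above, treat any output as experimental.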

Limitations and bias

Provide examples of latent issues and potential remediations.

Training data & Procedure

Pre-trained baseline model

  • Pre-trained model: BERTweet
    • Trained following the RoBERTa pre-training procedure
    • 850M General English Tweets (Jan 2012 to Aug 2019)
    • 23M COVID-19 English Tweets
    • Size of the model: >134M parameters
  • Further training
    • Continued pre-training on recent COVID-19/vaccine tweets, followed by fine-tuning for fact classification

1) Pre-training language model

  • Starting from BERTweet, the model was further pre-trained on COVID-19/vaccine-related tweets using a masked language modeling (MLM) objective
  • The following English-tweet datasets were used:
    • Tweets with the trending #CovidVaccine hashtag: 207,000 tweets uploaded from Aug 2020 to Apr 2021 (Kaggle)
    • Tweets about all COVID-19 vaccines: 78,000 tweets uploaded from Dec 2020 to May 2021 (Kaggle)
    • COVID-19 Twitter chatter dataset: 590,000 tweets uploaded from Mar 2021 to May 2021 (GitHub)
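
To illustrate what the MLM objective asks of the model, the sketch below corrupts a tokenized tweet and records which tokens must be recovered. It is an illustration only: the 15% masking rate and the 80/10/10 split between mask token, random token, and unchanged token follow the commonly used BERT/RoBERTa recipe and are assumptions, since this card does not state the settings actually used. The function name `mask_tokens` and the toy vocabulary are hypothetical.

```python
import random


def mask_tokens(tokens, vocab, mask_token="<mask>", mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels) for one MLM training example.

    labels[i] holds the original token where a prediction is required
    and None elsewhere.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Not selected: keep the token, no loss at this position.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)  # the model must recover this token
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(mask_token)         # 80%: replace with mask token
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with random token
        else:
            corrupted.append(token)              # 10%: leave unchanged
    return corrupted, labels


# Toy example: whitespace tokens stand in for real subword tokens.
tweet = "got my second vaccine dose today".split()
corrupted, labels = mask_tokens(tweet, vocab=tweet, rng=random.Random(0))
```

In practice a library data collator would typically handle this step on token IDs; the plain-Python version is shown only to make the objective concrete.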

2) Fine-tuning for fact classification

  • The language model pre-trained on English tweets with the MLM objective (starting from BERTweet) was then fine-tuned for fact classification of COVID-19/vaccine statements.

  • Fine-tuning data: 14,000 fact-checked statements from Jan 2020 to May 2021, collected from Poynter and Snopes with Selenium

  • The original labels were grouped into 3 categories:

    • False: false, no evidence, manipulated, fake, not true, unproven, unverified
    • Misleading: misleading, exaggerated, out of context, needs context
    • True: true, correct
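
The grouping above can be written as a simple lookup table. This is a sketch under stated assumptions: the names `LABEL_GROUPS`, `LABEL_TO_CATEGORY`, and `to_category` are hypothetical, matching is done on the exact lower-cased label, and returning `None` for a label not listed here is a choice of this example, since the card does not say how other labels were handled.

```python
# Grouping copied from the list above; values are lower-cased source labels.
LABEL_GROUPS = {
    "False": ["false", "no evidence", "manipulated", "fake",
              "not true", "unproven", "unverified"],
    "Misleading": ["misleading", "exaggerated", "out of context",
                   "needs context"],
    "True": ["true", "correct"],
}

# Invert to a flat lookup: original label -> category.
LABEL_TO_CATEGORY = {
    original: category
    for category, originals in LABEL_GROUPS.items()
    for original in originals
}


def to_category(original_label):
    """Map a fact-checker label to one of the 3 categories, or None if unlisted."""
    return LABEL_TO_CATEGORY.get(original_label.strip().lower())
```

For example, `to_category("Needs Context")` resolves to the Misleading category, while a label absent from the list yields `None` and can be reviewed by hand.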

Eval results

Contributors

  • This page is part of the final team project for the MLDL for DS class at SNU
    • Team BIBI - Vaccinating COVID-NineTweets
    • Team members: Ahn, Hyunju; An, Jiyong; An, Seungchan; Jeong, Seokho; Kim, Jungmin; Kim, Sangbeom
    • Advisor: Prof. Wen-Syan Li

GSDS