---
language: en
license: apache-2.0
datasets:
- tweets
widget:
- text: "COVID-19 vaccines are safe and effective."
---

# Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.

# Vaccinating COVID tweets

Part of the MDLD for DS class at SNU.

A model for English tweets, further trained from BERTweet ([this repository](https://github.com/VinAIResearch/BERTweet)) with a masked language modeling (MLM) objective and fine-tuned to classify false or misleading information about COVID-19 vaccines.

## Model description

- Baseline model: BERTweet [14, 15]
  - Trained following the RoBERTa pre-training procedure
  - 850M general English tweets (Jan 2012 ~ Aug 2019)
  - 23M COVID-19 English tweets
  - Size of the model: >134M parameters
- Further training
  - Training with recent COVID-19 and vaccine tweets

## Intended uses & limitations

#### How to use

```python
# You can include sample code which will be formatted
```

#### Limitations and bias

Provide examples of latent issues and potential remediations.

## Training data

#### 1) Pre-training language model

- Tweets with trending #CovidVaccine hashtag
  - 207,000 tweets uploaded from 2020-08-18 to 2021-04-20 [3]
- Tweets about all COVID-19 vaccines
  - 78,000 tweets uploaded from 2020-12-20 to 2021-05-13 [4]
- Covid-19 Twitter chatter dataset
  - 590,000 tweets uploaded from 2021-03-01 to 2021-05-20 [5]

#### 2) Fine-tuning for fact classification

- Statements collected from Poynter and Snopes with Selenium
  - 14,000 fact-checked statements from 2020-01-14 to 2021-05-13
- Original labels grouped into 3 categories
  - False: False, no evidence, manipulated, fake, not true, unproven, unverified
  - Misleading: Misleading, exaggerated, out of context, needs context
  - True: True, correct

## Training procedure

Preprocessing, hardware used, hyperparameters...

## Eval results

### BibTeX entry and citation info

```bibtex
@inproceedings{...,
  year={2020}
}
```

# Contributors

- Ahn, Hyunju
- An, Jiyong
- An, Seungchan
- Jeong, Seokho
- Kim, Jungmin
- Kim, Sangbeom
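
# Example: grouping fact-check labels

The label grouping listed under "Fine-tuning for fact classification" can be written as a small lookup table. The following is a minimal sketch, not the authors' preprocessing code. The names `LABEL_GROUPS`, `RAW_TO_CATEGORY`, and `to_category` are hypothetical, and the case-insensitive matching, whitespace normalization, and `None` return for unlisted labels are assumptions made for illustration.

```python
from typing import Optional

# Grouping copied from the "Fine-tuning for fact classification" list.
# Labels are stored lowercase here; that normalization is an assumption.
LABEL_GROUPS = {
    "False": [
        "false", "no evidence", "manipulated", "fake",
        "not true", "unproven", "unverified",
    ],
    "Misleading": ["misleading", "exaggerated", "out of context", "needs context"],
    "True": ["true", "correct"],
}

# Invert the table so each original label points to its category.
RAW_TO_CATEGORY = {
    raw: category
    for category, raw_labels in LABEL_GROUPS.items()
    for raw in raw_labels
}


def to_category(raw_label: str) -> Optional[str]:
    """Map an original fact-checker label to one of the three categories.

    Returns None for labels that are not in the list (an assumption;
    the source does not say how such labels were handled).
    """
    # Lowercase and collapse repeated whitespace before the lookup.
    key = " ".join(raw_label.lower().split())
    return RAW_TO_CATEGORY.get(key)


if __name__ == "__main__":
    for label in ["No Evidence", "Out of context", "Correct", "Satire"]:
        print(label, "->", to_category(label))
```

Keeping the grouping in one dictionary makes the mapping easy to audit against the list and to adjust if the label set changes.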