---

language: en
license: apache-2.0
datasets:
- tweets
widget:
- text: "COVID-19 vaccines are safe and effective."
---

# Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.

# Vaccinating COVID tweets
- Part of the MDLD for DS class at SNU

This model starts from BERTweet (see [this repository](https://github.com/VinAIResearch/BERTweet)), is further trained on English tweets with a masked language modeling (MLM) objective, and is then fine-tuned to classify false or misleading information about COVID-19 vaccines.

## Model description

- Baseline model: BERTweet [14, 15]
  - Pre-trained following the RoBERTa pre-training procedure
  - 850M general English tweets (Jan 2012 to Aug 2019)
  - 23M COVID-19 English tweets
  - Size of the model: >134M parameters
- Further training
  - Training with recent COVID-19 and vaccine tweets


## Intended uses & limitations

#### How to use


#### Limitations and bias


## Training data

#### 1) Pre-training language model
- Tweets with the trending #CovidVaccine hashtag: 207,000 tweets posted from 2020-08-18 to 2021-04-20 [3]
- Tweets about all COVID-19 vaccines: 78,000 tweets posted from 2020-12-20 to 2021-05-13 [4]
- Covid-19 Twitter chatter dataset: 590,000 tweets posted from 2021-03-01 to 2021-05-20 [5]

#### 2) Fine-tuning for fact classification
- Fact-checked statements collected from Poynter and Snopes with Selenium: 14,000 statements from 2020-01-14 to 2021-05-13
- The original labels are grouped into 3 categories:
  - False: False, no evidence, manipulated, fake, not true, unproven, unverified
  - Misleading: Misleading, exaggerated, out of context, needs context
  - True: True, correct
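
The label grouping above can be expressed as a small lookup table. The following is a minimal sketch, not the project's actual preprocessing code: the names `LABEL_GROUPS`, `RAW_TO_CATEGORY`, and `normalize_label` are hypothetical, and matching on the lowercased, whitespace-collapsed label is an assumption. How labels outside the listed groups were handled is not stated here, so the sketch returns `None` for them.

```python
# Grouping copied from the list above; raw labels are stored lowercased.
LABEL_GROUPS = {
    "False": ["false", "no evidence", "manipulated", "fake",
              "not true", "unproven", "unverified"],
    "Misleading": ["misleading", "exaggerated", "out of context",
                   "needs context"],
    "True": ["true", "correct"],
}

# Invert the grouping into a raw-label -> category lookup.
RAW_TO_CATEGORY = {
    raw: category
    for category, raw_labels in LABEL_GROUPS.items()
    for raw in raw_labels
}


def normalize_label(raw_label):
    """Map an original fact-check label to one of the 3 categories.

    Assumption: labels match after trimming, lowercasing, and collapsing
    internal whitespace. Unlisted labels return None.
    """
    key = " ".join(raw_label.strip().lower().split())
    return RAW_TO_CATEGORY.get(key)


if __name__ == "__main__":
    for label in ["Out of Context", "NOT TRUE", "Correct", "satire"]:
        print(label, "->", normalize_label(label))
```

Returning `None` for unlisted labels keeps the decision about such statements (drop them or review them manually) explicit in the calling code.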


## Training procedure


## Eval results

### BibTeX entry and citation info

```bibtex
@inproceedings{...,
  year={2020}
}
```
# Contributors
- Ahn, Hyunju
- An, Jiyong
- An, Seungchan
- Jeong, Seokho
- Kim, Jungmin
- Kim, Sangbeom