Update README.md
README.md
CHANGED
@@ -10,14 +10,16 @@ widget:
|
|
10 |
# Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.
|
11 |
|
12 |
# Vaccinating COVID tweets
|
13 |
-
|
14 |
|
15 |
## Intended uses & limitations
|
16 |
|
17 |
#### How to use
|
18 |
|
19 |
```python
|
20 |
-
|
|
|
|
|
21 |
```
|
22 |
|
23 |
#### Limitations and bias
|
@@ -36,11 +38,15 @@ Provide examples of latent issues and potential remediations.
|
|
36 |
- Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification
|
37 |
|
38 |
#### 1) Pre-training language model
|
39 |
-
-
|
40 |
-
-
|
41 |
-
-
|
42 |
-
|
|
|
|
|
43 |
#### 2) Fine-tuning for fact classification
|
|
|
|
|
44 |
- 14,000 fact-checked statements from Jan 2020 to May 2021, collected from Poynter and Snopes with Selenium
|
45 |
- Original labels were grouped into 3 categories
|
46 |
- False: false, no evidence, manipulated, fake, not true, unproven, unverified
|
@@ -56,4 +62,4 @@ Provide examples of latent issues and potential remediations.
|
|
56 |
- Advisor: Prof. Wen-Syan Li
|
57 |
|
58 |
# ![GSDS](https://gsds.snu.ac.kr/sites/gsds.snu.ac.kr/files/GSDS_logo.png)
|
59 |
-
<img src="https://gsds.snu.ac.kr/sites/gsds.snu.ac.kr/files/GSDS_logo.png" width="
|
|
|
10 |
# Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.
|
11 |
|
12 |
# Vaccinating COVID tweets
|
13 |
+
A model fine-tuned for fact classification of English tweets about COVID-19 and its vaccines.
|
14 |
|
15 |
## Intended uses & limitations
|
16 |
|
17 |
#### How to use
|
18 |
|
19 |
```python
|
20 |
+
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
21 |
+
tokenizer = AutoTokenizer.from_pretrained("ans/vaccinating-covid-tweets")
|
22 |
+
model = AutoModelForSequenceClassification.from_pretrained("ans/vaccinating-covid-tweets")
|
23 |
```
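Once the tokenizer and model above are loaded, classifying a single tweet could look roughly like the sketch below. The helper names `softmax` and `classify` are hypothetical and not part of this repository, and the label strings come from whatever `id2label` mapping the model's config defines, which this README does not list.

```python
import math


def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text, tokenizer, model):
    """Return (label, probability) for one tweet.

    `tokenizer` and `model` are the objects loaded in the snippet above.
    Hypothetical helper: the README does not prescribe an inference routine.
    """
    import torch  # imported lazily so softmax() works without PyTorch

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits[0].tolist()
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Label names depend on the model config (may be generic, e.g. LABEL_0).
    return model.config.id2label[best], probs[best]
```

Calling `classify("some tweet text", tokenizer, model)` would return the highest-scoring label and its probability.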
|
24 |
|
25 |
#### Limitations and bias
|
|
|
38 |
- Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification
|
39 |
|
40 |
#### 1) Pre-training language model
|
41 |
+
- The model was pre-trained on COVID-19/vaccine-related tweets using a masked language modeling (MLM) objective, starting from BERTweet
|
42 |
+
- The following datasets of English tweets were used:
|
43 |
+
- Tweets with the trending #CovidVaccine hashtag: 207,000 tweets uploaded between Aug 2020 and Apr 2021 ([Kaggle](https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets))
|
44 |
+
- Tweets about all COVID-19 vaccines: 78,000 tweets uploaded between Dec 2020 and May 2021 ([Kaggle](https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets))
|
45 |
+
- COVID-19 Twitter chatter dataset: 590,000 tweets uploaded between Mar 2021 and May 2021 ([GitHub](https://github.com/thepanacealab/covid19_twitter))
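To make the MLM objective mentioned above concrete, here is a deliberately simplified sketch of the masking step. The function name `mask_tokens`, the 15% default rate, and the `<mask>` placeholder are all assumptions for illustration; this README does not state the masking settings used. Real MLM training also sometimes swaps in a random token or leaves a selected token unchanged, which this sketch omits.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="<mask>", seed=None):
    """Randomly hide tokens and record what the model should predict.

    Returns (masked, targets): `masked` is the corrupted input and
    `targets` holds the original token at masked positions, else None.
    mask_prob and mask_token are illustrative assumptions.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets.append(tok)   # the model must recover this token
        else:
            masked.append(tok)
            targets.append(None)  # position is ignored by the loss
    return masked, targets
```

During pre-training, the model sees `masked` and is scored only on the positions where `targets` is not `None`.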
|
46 |
+
|
47 |
#### 2) Fine-tuning for fact classification
|
48 |
+
- The model, initialized from [BERTweet](https://github.com/VinAIResearch/BERTweet) and further pre-trained on English tweets with the masked language modeling (MLM) objective, was fine-tuned for the COVID-19/vaccine fact-classification task.
|
49 |
+
|
50 |
- 14,000 fact-checked statements from Jan 2020 to May 2021, collected from Poynter and Snopes with Selenium
|
51 |
- Original labels were grouped into 3 categories
|
52 |
- False: false, no evidence, manipulated, fake, not true, unproven, unverified
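The label grouping could be implemented along these lines. This is a minimal sketch covering only the "False" group, since the other two categories are not shown in this excerpt; `FALSE_VERDICTS` and `to_category` are hypothetical names, not code from the repository.

```python
# Raw verdicts that this README lists under the "False" category.
FALSE_VERDICTS = {
    "false", "no evidence", "manipulated", "fake",
    "not true", "unproven", "unverified",
}


def to_category(raw_label):
    """Map a raw fact-checker verdict to a coarse category.

    Returns "False" for verdicts in the group above, and None for any
    other verdict, because the remaining groups are not listed here.
    """
    normalized = raw_label.strip().lower()
    return "False" if normalized in FALSE_VERDICTS else None
```

Normalizing case and whitespace before the lookup keeps scraped verdicts such as `" Fake "` from slipping through unmatched.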
|
|
|
62 |
- Advisor: Prof. Wen-Syan Li
|
63 |
|
64 |
# ![GSDS](https://gsds.snu.ac.kr/sites/gsds.snu.ac.kr/files/GSDS_logo.png)
|
65 |
+
<img src="https://gsds.snu.ac.kr/sites/gsds.snu.ac.kr/files/GSDS_logo.png" width="300" height="100">
|