Update README.md
README.md
CHANGED
@@ -72,20 +72,19 @@ To conservatively classify whether an input sequence is true or not, the model m
|
|
72 |
- Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification
|
73 |
|
74 |
#### 1) Pre-training language model
|
75 |
-
- The model was pre-trained on COVID-19/vaccine-related tweets using a masked language modeling (MLM) objective, starting from BERTweet
|
76 |
- The following datasets of English tweets were used:
|
77 |
- Tweets with trending #CovidVaccine hashtag, 207,000 tweets uploaded across Aug 2020 to Apr 2021 ([kaggle](https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets))
|
78 |
- Tweets about all COVID-19 vaccines, 78,000 tweets uploaded across Dec 2020 to May 2021 ([kaggle](https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets))
|
79 |
- COVID-19 Twitter chatter dataset, 590,000 tweets uploaded across Mar 2021 to May 2021 ([github](https://github.com/thepanacealab/covid19_twitter))
|
80 |
|
81 |
#### 2) Fine-tuning for fact classification
|
82 |
-
- A fine-tuned model
|
83 |
-
|
84 |
-
-
|
85 |
-
-
|
86 |
-
-
|
87 |
-
-
|
88 |
-
- True: true, correct
|
89 |
|
90 |
## Evaluation results
|
91 |
| Training loss | Validation loss | Training accuracy | Validation accuracy |
|
@@ -93,7 +92,7 @@ To conservatively classify whether an input sequence is true or not, the model m
|
|
93 |
| 0.1062 | 0.1006 | 96.3% | 94.5% |
|
94 |
|
95 |
# Contributors
|
96 |
-
- This model is part of the final team project for the MLDL for DS class at SNU
|
97 |
- Team BIBI - Vaccinating COVID-NineTweets
|
98 |
- Team members: Ahn, Hyunju; An, Jiyong; An, Seungchan; Jeong, Seokho; Kim, Jungmin; Kim, Sangbeom
|
99 |
- Advisor: Prof. Wen-Syan Li
|
|
|
72 |
- Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification
|
73 |
|
74 |
#### 1) Pre-training language model
|
75 |
+
- The model was pre-trained on COVID-19/vaccine-related tweets using a masked language modeling (MLM) objective, starting from BERTweet.
|
76 |
- The following datasets of English tweets were used:
|
77 |
- Tweets with trending #CovidVaccine hashtag, 207,000 tweets uploaded across Aug 2020 to Apr 2021 ([kaggle](https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets))
|
78 |
- Tweets about all COVID-19 vaccines, 78,000 tweets uploaded across Dec 2020 to May 2021 ([kaggle](https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets))
|
79 |
- COVID-19 Twitter chatter dataset, 590,000 tweets uploaded across Mar 2021 to May 2021 ([github](https://github.com/thepanacealab/covid19_twitter))
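
As a rough illustration of the masked language modeling objective mentioned above, the sketch below hides a random subset of tokens and records the originals as prediction targets. It is a simplified stand-in and not the training code used for this model: the `<mask>` symbol, the 15% masking rate, and the `mask_tokens` helper are assumptions made for the example, and real MLM pipelines typically operate on subword IDs and sometimes substitute random or unchanged tokens instead of the mask symbol.

```python
import random

MASK_TOKEN = "<mask>"  # assumed mask symbol; the real tokenizer defines its own


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Simplified MLM masking: hide some tokens and keep them as targets.

    Returns (masked, labels). labels[i] holds the original token where a
    mask was applied and None elsewhere (positions ignored by the loss).
    """
    rng = random.Random(seed)
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)  # the model would be trained to recover this
        else:
            masked.append(token)
            labels.append(None)  # unmasked position, no prediction target
    return masked, labels


tweet = "got my second vaccine dose today and feeling fine".split()
masked_tweet, targets = mask_tokens(tweet, mask_prob=0.15, seed=0)
```

During pre-training, the model only receives `masked_tweet` and is scored on how well it predicts the non-`None` entries of `targets`.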
|
80 |
|
81 |
#### 2) Fine-tuning for fact classification
|
82 |
+
- The pre-trained language model from (1) was fine-tuned for a fact-classification task on COVID-19/vaccine statements.
|
83 |
+
- COVID-19/vaccine-related statements were collected from [Poynter](https://www.poynter.org/ifcn-covid-19-misinformation/) and [Snopes](https://www.snopes.com/) using Selenium, resulting in over 14,000 fact-checked statements from Jan 2020 to May 2021.
|
84 |
+
- The original labels were grouped into the following three categories:
|
85 |
+
- `False`: includes false, no evidence, manipulated, fake, not true, unproven, unverified
|
86 |
+
- `Misleading`: includes misleading, exaggerated, out of context, needs context
|
87 |
+
- `True`: includes true, correct
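
The label grouping listed above can be sketched as a small lookup table. This is a minimal illustration built only from the label names in this README; the names `LABEL_GROUPS`, `ORIGINAL_TO_CATEGORY`, and `to_category` are hypothetical, and the actual preprocessing used by the team may have handled spelling variants or other labels differently.

```python
# Hypothetical grouping table, copied from the three categories in the README.
LABEL_GROUPS = {
    "False": ["false", "no evidence", "manipulated", "fake",
              "not true", "unproven", "unverified"],
    "Misleading": ["misleading", "exaggerated", "out of context",
                   "needs context"],
    "True": ["true", "correct"],
}

# Invert the table so each original label points at its category.
ORIGINAL_TO_CATEGORY = {
    original: category
    for category, originals in LABEL_GROUPS.items()
    for original in originals
}


def to_category(original_label):
    """Return the category for a fact-checker label, or None if unlisted."""
    # Normalize case and collapse whitespace before the lookup.
    key = " ".join(original_label.lower().split())
    return ORIGINAL_TO_CATEGORY.get(key)
```

For example, `to_category("No Evidence")` falls into the `False` category, while a label outside the lists above returns `None` so that it can be reviewed by hand instead of being silently assigned.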
|
|
|
88 |
|
89 |
## Evaluation results
|
90 |
| Training loss | Validation loss | Training accuracy | Validation accuracy |
|
|
|
92 |
| 0.1062 | 0.1006 | 96.3% | 94.5% |
|
93 |
|
94 |
# Contributors
|
95 |
+
- This model is part of the final team project for the MLDL for DS class at SNU.
|
96 |
- Team BIBI - Vaccinating COVID-NineTweets
|
97 |
- Team members: Ahn, Hyunju; An, Jiyong; An, Seungchan; Jeong, Seokho; Kim, Jungmin; Kim, Sangbeom
|
98 |
- Advisor: Prof. Wen-Syan Li
|