ans committed on
Commit
86dac4a
1 Parent(s): 3661728

Update README.md

  1. README.md +8 -9
README.md CHANGED
@@ -72,20 +72,19 @@ To conservatively classify whether an input sequence is true or not, the model m
  - Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification

  #### 1) Pre-training language model
- - The model was pre-trained on COVID-19/vaccined related tweets using a masked language modeling (MLM) objective starting from BERTweet
  - Following datasets on English tweets were used:
  - Tweets with trending #CovidVaccine hashtag, 207,000 tweets uploaded across Aug 2020 to Apr 2021 ([kaggle](https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets))
  - Tweets about all COVID-19 vaccines, 78,000 tweets uploaded across Dec 2020 to May 2021 ([kaggle](https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets))
  - COVID-19 Twitter chatter dataset, 590,000 tweets uploaded across Mar 2021 to May 2021 ([github](https://github.com/thepanacealab/covid19_twitter))

  #### 2) Fine-tuning for fact classification
- - A fine-tuned model on English tweets using a masked language modeling (MLM) objective from [BERTweet](https://github.com/VinAIResearch/BERTweet) for fact-classification task on COVID-19/vaccine.
-
- - Statements from Poynter and Snopes with Selenium 14,000 fact-checked statements from Jan 2020 to May 2021
- - Divide original labels within 3 categories
- - False: false, no evidence, manipulated, fake, not true, unproven, unverified
- - Misleading: misleading, exaggerated, out of context, needs context
- - True: true, correct

  ## Evaluation results
  | Training loss | Validation loss | Training accuracy | Validation accuracy |
@@ -93,7 +92,7 @@ To conservatively classify whether an input sequence is true or not, the model m
  | 0.1062 | 0.1006 | 96.3% | 94.5% |

  # Contributors
- - This model is a part of final team project from MLDL for DS class at SNU
  - Team BIBI - Vaccinating COVID-NineTweets
  - Team members: Ahn, Hyunju; An, Jiyong; An, Seungchan; Jeong, Seokho; Kim, Jungmin; Kim, Sangbeom
  - Advisor: Prof. Wen-Syan Li
 
  - Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification

  #### 1) Pre-training language model
+ - The model was pre-trained on COVID-19/vaccined related tweets using a masked language modeling (MLM) objective starting from BERTweet.
  - Following datasets on English tweets were used:
  - Tweets with trending #CovidVaccine hashtag, 207,000 tweets uploaded across Aug 2020 to Apr 2021 ([kaggle](https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets))
  - Tweets about all COVID-19 vaccines, 78,000 tweets uploaded across Dec 2020 to May 2021 ([kaggle](https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets))
  - COVID-19 Twitter chatter dataset, 590,000 tweets uploaded across Mar 2021 to May 2021 ([github](https://github.com/thepanacealab/covid19_twitter))

  #### 2) Fine-tuning for fact classification
+ - A fine-tuned model from pre-trained language model (1) for fact-classification task on COVID-19/vaccine.
+ - COVID-19/vaccine-related statements were collected from [Poynter](https://www.poynter.org/ifcn-covid-19-misinformation/) and [Snopes](https://www.snopes.com/) using Selenium resulting in over 14,000 fact-checked statements from Jan 2020 to May 2021.
+ - Original labels were divided within following three categories:
+ - `False`: includes false, no evidence, manipulated, fake, not true, unproven, unverified
+ - `Misleading`: includes misleading, exaggerated, out of context, needs context
+ - `True`: includes true, correct
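
Note on the label grouping added in this commit: the three-way split of the original fact-checker ratings can be sketched as a simple lookup. This is only an illustration of the grouping listed in the README; the names `LABEL_GROUPS`, `RAW_TO_CATEGORY`, and `map_label`, as well as the lowercase/whitespace normalization, are assumptions and are not taken from the project's code.

```python
# Hypothetical sketch of the label grouping described in the README.
# The category names and raw ratings come from the README; the helper
# names and the normalization step are assumptions for illustration.
LABEL_GROUPS = {
    "False": ["false", "no evidence", "manipulated", "fake",
              "not true", "unproven", "unverified"],
    "Misleading": ["misleading", "exaggerated", "out of context",
                   "needs context"],
    "True": ["true", "correct"],
}

# Invert the grouping so each raw rating points at its category.
RAW_TO_CATEGORY = {
    raw: category
    for category, raws in LABEL_GROUPS.items()
    for raw in raws
}


def map_label(raw_label):
    """Return the category for a raw rating, or None if it is not listed."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(raw_label.strip().lower().split())
    return RAW_TO_CATEGORY.get(key)
```

Ratings that are not in the README's list return `None` here, so they can be inspected or dropped explicitly. How the project actually handled unlisted ratings is not stated in the source.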
 

  ## Evaluation results
  | Training loss | Validation loss | Training accuracy | Validation accuracy |

  | 0.1062 | 0.1006 | 96.3% | 94.5% |

  # Contributors
+ - This model is a part of final team project from MLDL for DS class at SNU.
  - Team BIBI - Vaccinating COVID-NineTweets
  - Team members: Ahn, Hyunju; An, Jiyong; An, Seungchan; Jeong, Seokho; Kim, Jungmin; Kim, Sangbeom
  - Advisor: Prof. Wen-Syan Li