@@ -1,5 +1,4 @@
 ---
 language: en
 license: apache-2.0
 datasets:
@@ -11,8 +10,7 @@ widget:
 # Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.
 # Vaccinating COVID tweets
-Fine-tuned model on English language using a masked language modeling (MLM) objective from BERTweet in [this repository](https://github.com/VinAIResearch/BERTweet) for the classification task for false/misleading information about COVID-19 vaccines.
 ## Intended uses & limitations
@@ -26,32 +24,28 @@ Fine-tuned model on English language using a masked language modeling (MLM) obje
 Provide examples of latent issues and potential remediations.
-## Training data
-#### 1) Pre-training language model
-- Tweets with trending #CovidVaccine hashtag 207,000 tweets uploaded across 2020-08-18 ~ 2021-04-20 [3]
-- Tweets about all COVID-19 vaccines 78,000 tweets uploaded across 2020-12-20 ~ 2021-05-13 [4]
-- Covid-19 Twitter chatter dataset 590,000 tweets uploaded across 2021-03-01 ~ 2021-05-20 [5]
-#### 2) Fine-tuning for fact classification
-- Statements from Poynter and Snopes with Selenium 14,000 fact-checked statements from 2020-01-14 to 2021-05-13
-- Divide original labels within 3 categories
-False: False, no evidence, manipulated, fake, not true, unproven, unverified
-Misleading: Misleading, exaggerated, out of context, needs context
-True: True, correct
-Describe the data you used to train the model.
-If you initialized it with pre-trained weights, add a link to the pre-trained model card or repository with description of the pre-training data.
-## Training procedure
-- Baseline model: [BERTweet](https://github.com/VinAIResearch/BERTweet)
   - trained based on the RoBERTa pre-training procedure
-  - 850M General English Tweets (Jan 2012 ~ Aug 2019)
   - 23M COVID-19 English Tweets
   - Size of the model: >134M parameters
 - Further training
-  - Training with recent COVID-19 and vaccine tweets
 ## Eval results

 ---
 language: en
 license: apache-2.0
 datasets:
 # Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.
 # Vaccinating COVID tweets
+Model based on BERTweet (from [this repository](https://github.com/VinAIResearch/BERTweet)), further pre-trained on English tweets with a masked language modeling (MLM) objective and fine-tuned to classify the factuality of statements about COVID-19 and vaccines.
 ## Intended uses & limitations
 Provide examples of latent issues and potential remediations.
+## Training data & Procedure
+#### Pre-trained baseline model
+- Pre-trained model: [BERTweet](https://github.com/VinAIResearch/BERTweet)
   - trained based on the RoBERTa pre-training procedure
+  - 850M General English Tweets (Jan 2012 to Aug 2019)
   - 23M COVID-19 English Tweets
   - Size of the model: >134M parameters
 - Further training
+  - Pre-training with recent COVID-19/vaccine tweets and fine-tuning for fact classification
+#### 1) Pre-training language model
+- Tweets with the trending #CovidVaccine hashtag: 207,000 tweets uploaded from Aug 2020 to Apr 2021 ([kaggle](https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets))
+- Tweets about all COVID-19 vaccines: 78,000 tweets uploaded from Dec 2020 to May 2021 ([kaggle](https://www.kaggle.com/gpreda/all-covid19-vaccines-tweets))
+- COVID-19 Twitter chatter dataset: 590,000 tweets uploaded from Mar 2021 to May 2021 ([github](https://github.com/thepanacealab/covid19_twitter))
+#### 2) Fine-tuning for fact classification
+- Statements collected from Poynter and Snopes with Selenium: 14,000 fact-checked statements from Jan 2020 to May 2021
+- Original labels are grouped into 3 categories
+  - False: false, no evidence, manipulated, fake, not true, unproven, unverified
+  - Misleading: misleading, exaggerated, out of context, needs context
+  - True: true, correct
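
The label grouping listed above could be expressed as a simple lookup. The following is a minimal sketch, not the authors' actual preprocessing code: the names `LABEL_GROUPS`, `RAW_TO_CATEGORY`, and `normalize_label` are hypothetical, and it assumes raw labels are matched exactly after lowercasing and whitespace cleanup, which the model card does not specify.

```python
from typing import Optional

# Hypothetical mapping built from the three category lists in the model card.
LABEL_GROUPS = {
    "False": ["false", "no evidence", "manipulated", "fake",
              "not true", "unproven", "unverified"],
    "Misleading": ["misleading", "exaggerated", "out of context",
                   "needs context"],
    "True": ["true", "correct"],
}

# Invert the grouping so each raw label points at its category.
RAW_TO_CATEGORY = {
    raw: category
    for category, raw_labels in LABEL_GROUPS.items()
    for raw in raw_labels
}


def normalize_label(raw_label: str) -> Optional[str]:
    """Map a fact-checker's raw label to one of the 3 categories.

    Assumption: exact match after lowercasing and collapsing whitespace.
    Returns None for labels that are not in any of the lists.
    """
    key = " ".join(raw_label.strip().lower().split())
    return RAW_TO_CATEGORY.get(key)
```

Labels outside the three lists return `None` here, so they can be inspected or dropped explicitly instead of being silently assigned to a category.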
 ## Eval results