ans/vaccinating-covid-tweets

Text Classification

ans committed on Jun 10, 2021

Commit d0fd8b9 • 1 Parent(s): 91d9f45

Update README.md

Files changed (1)

README.md +21 -1

README.md CHANGED

@@ -5,7 +5,7 @@ license: apache-2.0
 datasets:
 - tweets
 widget:
-- text: "COVID-19 vaccine is ineffective to prevent from infection."
+- text: "COVID-19 vaccines are safe and effective."
 ---
 # Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.
@@ -17,6 +17,14 @@ Fine-tuned model on English language using a masked language modeling (MLM) obje
 ## Model description
+- Baseline model: BERTweet14,15
+  - trained based on the RoBERTa pre-training procedure
+  - 850M General English Tweets (Jan 2012 ~ Aug 2019)
+  - 23M COVID-19 English Tweets
+  - Size of the model: >134M parameters
+- Further training
+  - Training with recent COVID-19 and vaccine tweets
 You can embed local or remote images using `![](...)`
 ## Intended uses & limitations
@@ -33,6 +41,18 @@ Provide examples of latent issues and potential remediations.
 ## Training data
+#### 1) Pre-training language model
+- Tweets with trending #CovidVaccine hashtag 207,000 tweets uploaded across 2020-08-18 ~ 2021-04-20 [3]
+- Tweets about all COVID-19 vaccines 78,000 tweets uploaded across 2020-12-20 ~ 2021-05-13 [4]
+- Covid-19 Twitter chatter dataset 590,000 tweets uploaded across 2021-03-01 ~ 2021-05-20 [5]
+#### 2) Fine-tuning for fact classification
+- Statements from Poynter and Snopes with Selenium 14,000 fact-checked statements from 2020-01-14 to 2021-05-13
+- Divide original labels within 3 categories
+False: 		False, no evidence, manipulated, fake, not true, unproven, unverified
+Misleading: 	Misleading, exaggerated, out of context, needs context
+True:		True, correct
 Describe the data you used to train the model.
 If you initialized it with pre-trained weights, add a link to the pre-trained model card or repository with description of the pre-training data.
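The fine-tuning section of this commit groups the original fact-checker verdicts into three classes: False, Misleading and True. The Python sketch below illustrates that grouping. It is an illustration, not code from this repository. The helper `map_verdict` and the lookup tables are hypothetical names. Two behaviours are also assumptions: verdicts are matched case-insensitively after trimming whitespace, and any verdict not listed in the README maps to `None`. The README does not say how the authors handled unlisted verdicts.

```python
# Hypothetical helper (not from the ans/vaccinating-covid-tweets repository):
# collapse fact-checker verdicts into the three classes listed in the README.
# ASSUMPTION: verdicts are matched case-insensitively after stripping whitespace.
from typing import Dict, List, Optional

# Target class -> original verdict strings, copied from the README label grouping.
LABEL_GROUPS: Dict[str, List[str]] = {
    "False": ["false", "no evidence", "manipulated", "fake", "not true", "unproven", "unverified"],
    "Misleading": ["misleading", "exaggerated", "out of context", "needs context"],
    "True": ["true", "correct"],
}

# Invert the grouping into a flat lookup table: original verdict -> target class.
VERDICT_TO_CLASS: Dict[str, str] = {
    verdict: target for target, verdicts in LABEL_GROUPS.items() for verdict in verdicts
}


def map_verdict(raw_label: str) -> Optional[str]:
    """Return 'False', 'Misleading' or 'True', or None for a verdict the README does not list."""
    return VERDICT_TO_CLASS.get(raw_label.strip().lower())


if __name__ == "__main__":
    for raw in ["Fake", "Needs context", "Correct", "Half true"]:
        print(f"{raw!r} -> {map_verdict(raw)}")
```

In this sketch, unlisted verdicts such as "Half true" return `None` rather than being forced into a class. They can then be dropped or reviewed by hand before fine-tuning.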