ans committed on
Commit d0fd8b9 (1 parent: 91d9f45)

Update README.md

Files changed (1): README.md +21 -1
@@ -5,7 +5,7 @@ license: apache-2.0
  datasets:
  - tweets
  widget:
- - text: "COVID-19 vaccine is ineffective to prevent from infection."
+ - text: "COVID-19 vaccines are safe and effective."
  ---
 
  # Disclaimer: This page is under maintenance. Please DO NOT refer to the information on this page to make any decision yet.
@@ -17,6 +17,14 @@ Fine-tuned model on English language using a masked language modeling (MLM) obje
 
  ## Model description
 
+ - Baseline model: BERTweet14,15
+ - trained based on the RoBERTa pre-training procedure
+ - 850M General English Tweets (Jan 2012 ~ Aug 2019)
+ - 23M COVID-19 English Tweets
+ - Size of the model: >134M parameters
+ - Further training
+ - Training with recent COVID-19 and vaccine tweets
+
  You can embed local or remote images using `![](...)`
 
  ## Intended uses & limitations
@@ -33,6 +41,18 @@ Provide examples of latent issues and potential remediations.
 
  ## Training data
 
+ #### 1) Pre-training language model
+ - Tweets with trending #CovidVaccine hashtag 207,000 tweets uploaded across 2020-08-18 ~ 2021-04-20 [3]
+ - Tweets about all COVID-19 vaccines 78,000 tweets uploaded across 2020-12-20 ~ 2021-05-13 [4]
+ - Covid-19 Twitter chatter dataset 590,000 tweets uploaded across 2021-03-01 ~ 2021-05-20 [5]
+
+ #### 2) Fine-tuning for fact classification
+ - Statements from Poynter and Snopes with Selenium 14,000 fact-checked statements from 2020-01-14 to 2021-05-13
+ - Divide original labels within 3 categories
+ False: False, no evidence, manipulated, fake, not true, unproven, unverified
+ Misleading: Misleading, exaggerated, out of context, needs context
+ True: True, correct
+
  Describe the data you used to train the model.
  If you initialized it with pre-trained weights, add a link to the pre-trained model card or repository with description of the pre-training data.
 
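
The "Fine-tuning for fact classification" lines added in this commit describe collapsing the fact-checkers' original verdicts into three categories. A minimal sketch of that consolidation step might look like the following. The category names and raw labels are taken from the README text. The names `LABEL_GROUPS`, `RAW_TO_CATEGORY`, and `consolidate_label`, the case and whitespace normalization, and the choice to return `None` for unlisted verdicts are all assumptions made for illustration, not the authors' actual preprocessing code.

```python
# Raw verdicts grouped under the three target categories named in the README.
# The grouping comes from the commit text; the lowercase keys are an assumption.
LABEL_GROUPS = {
    "False": [
        "false", "no evidence", "manipulated", "fake",
        "not true", "unproven", "unverified",
    ],
    "Misleading": [
        "misleading", "exaggerated", "out of context", "needs context",
    ],
    "True": ["true", "correct"],
}

# Invert the grouping into a flat lookup: raw verdict -> consolidated category.
RAW_TO_CATEGORY = {
    raw: category
    for category, raw_labels in LABEL_GROUPS.items()
    for raw in raw_labels
}


def consolidate_label(raw_label):
    """Map a fact-checker verdict to 'False', 'Misleading', or 'True'.

    Hypothetical helper: lowercases the verdict and collapses runs of
    whitespace before the lookup. Returns None for verdicts that the
    README does not list, so callers can decide whether to drop them.
    """
    key = " ".join(raw_label.lower().split())
    return RAW_TO_CATEGORY.get(key)


if __name__ == "__main__":
    for verdict in ["No Evidence", "Out of context", "Correct", "Satire"]:
        print(verdict, "->", consolidate_label(verdict))
```

Returning `None` rather than raising keeps unlisted verdicts visible to the caller. How the original pipeline handled such cases is not stated in the README.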