This model was fine-tuned from [BERTweet](https://github.com/VinAIResearch/BERTweet), an English language model pre-trained with a masked language modeling (MLM) objective, to classify false/misleading information about COVID-19 vaccines.
## Intended uses & limitations
#### How to use
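A minimal loading sketch with the `transformers` library. The model id below is a placeholder (this card does not state the Hub id), so substitute the actual repository name; the label order is the assumed False/Misleading/True scheme described under the fine-tuning section.

```python
# The three consolidated categories used for fine-tuning (see below).
LABELS = ["False", "Misleading", "True"]

def load_classifier(model_id="your-username/bertweet-covid-vaccine-fact"):
    """Build a text-classification pipeline for the fine-tuned model.
    The model id is a placeholder; transformers is imported lazily so this
    sketch can be read without the library installed."""
    from transformers import pipeline
    return pipeline("text-classification", model=model_id)

# Example usage (requires the real model id and network access):
# clf = load_classifier()
# clf("COVID-19 vaccines alter your DNA.")
```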
#### 2) Fine-tuning for fact classification
- Statements scraped from Poynter and Snopes with Selenium: 14,000 fact-checked statements from 2020-01-14 to 2021-05-13
- Original labels consolidated into 3 categories:
  - False: false, no evidence, manipulated, fake, not true, unproven, unverified
  - Misleading: misleading, exaggerated, out of context, needs context
  - True: true, correct
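The label consolidation above can be sketched as a simple lookup; the verdict strings come from the list above, while the helper name is illustrative.

```python
# Map raw fact-checker verdicts onto the three consolidated labels.
VERDICT_TO_LABEL = {
    "false": "False", "no evidence": "False", "manipulated": "False",
    "fake": "False", "not true": "False", "unproven": "False",
    "unverified": "False",
    "misleading": "Misleading", "exaggerated": "Misleading",
    "out of context": "Misleading", "needs context": "Misleading",
    "true": "True", "correct": "True",
}

def consolidate(verdict):
    """Normalize a raw verdict string and map it to False/Misleading/True."""
    return VERDICT_TO_LABEL[verdict.strip().lower()]
```

For example, `consolidate("Exaggerated")` maps to `"Misleading"`.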
The classifier is initialized from pre-trained [BERTweet](https://github.com/VinAIResearch/BERTweet) weights and fine-tuned on the fact-checked statements described above.
## Training procedure

- Baseline model: [BERTweet](https://github.com/VinAIResearch/BERTweet)
  - Trained following the RoBERTa pre-training procedure
  - 850M general English tweets (Jan 2012 – Aug 2019)
  - 23M COVID-19 English tweets
  - Model size: >134M parameters
- Further training
  - Continued training with recent COVID-19 and vaccine tweets

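The "further training" step above can be sketched as continued MLM training with the standard `transformers` Trainer API. The base model id, file names, and hyperparameters below are illustrative assumptions, not values taken from this card.

```python
def continue_mlm_training(model_id="vinai/bertweet-base",
                          train_file="recent_vaccine_tweets.txt",
                          output_dir="bertweet-vaccine-mlm"):
    """Continue masked-language-model pre-training on recent COVID-19 and
    vaccine tweets (one tweet per line in train_file). transformers and
    datasets are imported lazily so this sketch stays importable without them."""
    from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)
    from datasets import load_dataset

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    # Tokenize the raw tweet text.
    ds = load_dataset("text", data_files={"train": train_file})["train"]
    ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=128),
                batched=True, remove_columns=["text"])

    # Dynamic masking, as in the RoBERTa/BERTweet MLM objective.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                               mlm_probability=0.15)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=32)
    Trainer(model=model, args=args, train_dataset=ds,
            data_collator=collator).train()
    model.save_pretrained(output_dir)
```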
## Eval results
# Contributors

- This page is part of the final team project from the MLDL for DS class at SNU
- Team BIBI - Vaccinating COVID-NineTweets
- Team members: Ahn, Hyunju; An, Jiyong; An, Seungchan; Jeong, Seokho; Kim, Jungmin; Kim, Sangbeom
- Advisor: Prof. Wen-Syan Li

<img src="https://gsds.snu.ac.kr/sites/gsds.snu.ac.kr/files/GSDS_logo.png" alt="GSDS" width="100" height="100">