---
language: "en"
thumbnail: "https://raw.githubusercontent.com/digitalepidemiologylab/covid-twitter-bert/master/images/COVID-Twitter-BERT_small.png"
tags:
- Twitter
- COVID-19
license: "MIT"
---

# COVID-Twitter-BERT v2

## Model description

A BERT-large-uncased model pretrained on a large corpus of Twitter messages about COVID-19.

## Intended uses & limitations

#### How to use

```python
from transformers import pipeline
import json

# Load the fill-mask pipeline with the pretrained model and predict the masked token
pipe = pipeline(task='fill-mask', model='digitalepidemiologylab/covid-twitter-bert-v2')
out = pipe(f"In places with a lot of people, it's a good idea to wear a {pipe.tokenizer.mask_token}")
print(json.dumps(out, indent=4))
# [
#     {
#         "sequence": "[CLS] in places with a lot of people, it's a good idea to wear a mask [SEP]",
#         "score": 0.9998226761817932,
#         "token": 7308,
#         "token_str": "mask"
#     },
#     ...
# ]
```

## Training data
This model was initialized from the pre-trained weights of [BERT-large-uncased](https://huggingface.co/bert-large-uncased) and further pretrained on COVID-19-related Twitter messages; the corpus is described under Training procedure below.

## Training procedure
This model was trained on 97M unique tweets (1.2B training examples) collected between January 12 and July 5, 2020, containing at least one of the keywords "wuhan", "ncov", "coronavirus", "covid", or "sars-cov-2". These tweets were filtered and preprocessed to reach a final sample of 22.5M tweets (containing 40.7M sentences and 633M tokens) which were used for training.

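The keyword-based collection step described above can be sketched as a simple case-insensitive substring filter. This is an illustrative reconstruction, not the authors' actual pipeline; the sample tweets and the `matches_keywords` helper are hypothetical.

```python
# Collection keywords from the training-procedure description
KEYWORDS = ("wuhan", "ncov", "coronavirus", "covid", "sars-cov-2")

def matches_keywords(text):
    """Return True if the text contains at least one keyword (case-insensitive)."""
    lowered = text.lower()
    return any(kw in lowered for kw in KEYWORDS)

# Hypothetical sample of tweets, for illustration only
tweets = [
    "Wuhan reports new cases",
    "I love sunny days",
    "SARS-CoV-2 spike protein study",
    "covid updates tonight",
]

kept = [t for t in tweets if matches_keywords(t)]
print(kept)  # keeps the three COVID-related tweets
```

The real pipeline additionally applied filtering and preprocessing (reducing 97M tweets to 22.5M), which is not shown here.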
## Eval results
The model was evaluated on downstream Twitter text-classification tasks from previous SemEval challenges.

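The model card does not report per-task scores. As an illustration of how such downstream classification results are typically scored, here is a minimal accuracy computation; the labels and values are entirely hypothetical, not real evaluation results.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)

# Hypothetical gold labels and model predictions for a SemEval-style task
gold = ["positive", "negative", "neutral", "positive"]
pred = ["positive", "negative", "positive", "positive"]

print(f"accuracy = {accuracy(gold, pred):.2f}")  # 3 of 4 correct
```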
### BibTeX entry and citation info

```bibtex
@article{muller2020covid,
  title={COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter},
  author={M{\"u}ller, Martin and Salath{\'e}, Marcel and Kummervold, Per E},
  journal={arXiv preprint arXiv:2005.07503},
  year={2020}
}
```

or

```
Martin Müller, Marcel Salathé, and Per E. Kummervold.
COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter.
arXiv preprint arXiv:2005.07503 (2020).
```