kornosk committed
Commit 2e9cb2d
1 Parent(s): 981250f

Update README.md

  1. README.md +36 -0
README.md CHANGED
@@ -22,6 +22,42 @@ This model is pre-trained on over 83 million English tweets about the 2020 US Pr
 
  This model is initialized with BERTweet and trained with an MLM objective.
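
As a rough illustration of what an MLM objective involves, the sketch below corrupts a token list in the style of the general BERT recipe: a fraction of positions is selected for prediction, and each selected token is replaced by a mask token, replaced by a random token, or left unchanged. The function name `mask_tokens`, the 15% selection rate, and the 80/10/10 split are assumptions taken from that general recipe, not from this model's training code, and whitespace splitting stands in for the real tokenizer.

```python
import random


def mask_tokens(tokens, mask_token="<mask>", vocab=None, mask_prob=0.15, seed=0):
    """Corrupt `tokens` for masked language modelling (illustrative only).

    Returns (corrupted, labels). labels[i] holds the original token at
    positions the model would be asked to predict, and None elsewhere.
    """
    rng = random.Random(seed)
    # Assumption: with no vocabulary given, sample replacements from the input itself.
    vocab = vocab or sorted(set(tokens))
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # this position contributes to the loss
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(mask_token)         # 80%: replace with the mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the original token
        else:
            corrupted.append(tok)
            labels.append(None)  # position is ignored by the loss
    return corrupted, labels


tokens = "Trump is the president of USA".split()
corrupted, labels = mask_tokens(tokens, seed=1)
print(corrupted)
print(labels)
```

During training, the model sees `corrupted` and is scored only on the positions where `labels` is not `None`.
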
 
+ # Usage
+
+ This pre-trained language model **can be fine-tuned for any downstream task (e.g. classification)**.
+
+ Please see the [official repository](https://github.com/GU-DataLab/stance-detection-KE-MLM) for more details.
+
+ ```python
+ from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
+ import torch
+
+ # choose GPU if available
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+
+ # select model path here
+ pretrained_LM_path = "kornosk/polibertweet-mlm"
+
+ # load model; the Auto* classes pick the architecture from the checkpoint config
+ # (BERTweet is RoBERTa-based, so the Bert* classes are assumed not to match it)
+ tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
+ model = AutoModelForMaskedLM.from_pretrained(pretrained_LM_path)
+
+ # fill mask, using the tokenizer's own mask token rather than a hard-coded one
+ example = f"Trump is the {tokenizer.mask_token} of USA"
+ fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer, device=device)
+
+ outputs = fill_mask(example)
+ print(outputs)
+
+ # see raw model outputs (MLM logits over the vocabulary)
+ inputs = tokenizer(example, return_tensors="pt").to(model.device)
+ with torch.no_grad():
+     outputs = model(**inputs)
+ print(outputs)
+
+ # OR you can use this model to train on your downstream task!
+ # please consider citing our paper if you feel this is useful :)
+ ```
+
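
The `fill_mask(example)` call above returns a list of dictionaries, one per candidate token, with `score`, `token`, `token_str` and `sequence` keys. A minimal sketch of turning that list into a short ranking follows. `top_predictions` is a hypothetical helper, and `sample_outputs` is made-up placeholder data shaped like the pipeline output, not real predictions from this model.

```python
def top_predictions(outputs, k=3):
    """Return the k highest-scoring (token, score) pairs from fill-mask output."""
    ranked = sorted(outputs, key=lambda item: item["score"], reverse=True)
    # token_str may carry surrounding whitespace depending on the tokenizer.
    return [(item["token_str"].strip(), round(item["score"], 4)) for item in ranked[:k]]


# Placeholder data for illustration only; scores and token ids are invented.
sample_outputs = [
    {"score": 0.07, "token": 101, "token_str": " leader",
     "sequence": "Trump is the leader of USA"},
    {"score": 0.9, "token": 102, "token_str": " president",
     "sequence": "Trump is the president of USA"},
    {"score": 0.01, "token": 103, "token_str": " face",
     "sequence": "Trump is the face of USA"},
]

for token, score in top_predictions(sample_outputs, k=2):
    print(f"{token}\t{score}")
```

With the real pipeline, pass the `outputs` list from `fill_mask(example)` in place of `sample_outputs`.
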
  # Reference
 
  - [PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter](XXX), LREC 2022.