Update README.md
README.md
CHANGED
@@ -22,6 +22,42 @@ This model is pre-trained on over 83 million English tweets about the 2020 US Pr

This model is initialized with BERTweet and trained with an MLM objective.

# Usage

This pre-trained language model **can be fine-tuned for any downstream task (e.g., classification)**.

Please see the [official repository](https://github.com/GU-DataLab/stance-detection-KE-MLM) for more details.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
import torch

# choose GPU if available (pipeline device index: 0 = first GPU, -1 = CPU)
device = 0 if torch.cuda.is_available() else -1

# select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# load model; the Auto classes pick the tokenizer and architecture from the checkpoint config
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModelForMaskedLM.from_pretrained(pretrained_LM_path)

# fill mask; use the tokenizer's own mask token (BERTweet-style checkpoints use <mask>, not [MASK])
example = f"Trump is the {tokenizer.mask_token} of USA"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer, device=device)

outputs = fill_mask(example)
print(outputs)

# see embeddings: the MLM head returns logits, so request the hidden states explicitly
inputs = tokenizer(example, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
print(outputs.hidden_states[-1])

# OR you can use this model to train on your downstream task!
# please consider citing our paper if you find this useful :)
```
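
The `fill-mask` pipeline returns a list of dictionaries, one per candidate token, each with `score`, `token`, `token_str`, and `sequence` keys. The sketch below shows one possible way to summarize that list. `top_predictions` is a hypothetical helper, and `sample_outputs` is made-up data in the pipeline's output shape, not real predictions from this model.

```python
def top_predictions(outputs, k=3):
    """Return the k highest-scoring (token_str, score) pairs from fill-mask output."""
    ranked = sorted(outputs, key=lambda item: item["score"], reverse=True)
    return [(item["token_str"].strip(), round(item["score"], 4)) for item in ranked[:k]]


# Made-up data for illustration only (NOT real model output).
sample_outputs = [
    {"score": 0.10, "token": 11, "token_str": "leader",
     "sequence": "Trump is the leader of USA"},
    {"score": 0.60, "token": 12, "token_str": "president",
     "sequence": "Trump is the president of USA"},
    {"score": 0.05, "token": 13, "token_str": "face",
     "sequence": "Trump is the face of USA"},
]

# Print the two best candidates as tab-separated "token<TAB>score" lines.
for token, score in top_predictions(sample_outputs, k=2):
    print(f"{token}\t{score}")
```

With real output, pass the list returned by `fill_mask(example)` in place of `sample_outputs`.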
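
For the downstream-task route, a minimal sketch of loading the checkpoint with a classification head might look like the following. The label set and the `load_classifier` helper are assumptions made for illustration and are not taken from the official repository. The classification head is newly initialized, so it has to be trained on labeled data before its predictions mean anything.

```python
# Hypothetical label set for a stance-detection task (an assumption, not from the model card).
LABELS = ["AGAINST", "FAVOR", "NONE"]
label2id = {label: idx for idx, label in enumerate(LABELS)}
id2label = {idx: label for label, idx in label2id.items()}


def load_classifier(model_path="kornosk/polibertweet-mlm"):
    """Load the tokenizer and the encoder with a fresh, untrained classification head."""
    # Imported lazily so the label maps above can be inspected without loading transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_path,
        num_labels=len(LABELS),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `tokenizer, model = load_classifier()` downloads the checkpoint. The returned model can then be trained with an ordinary PyTorch loop or the `Trainer` API.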

# Reference

- [PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter](XXX), LREC 2022.