kornosk/polibertweet-political-twitter-roberta-mlm

kornosk committed on May 2, 2022

Commit 2e9cb2d (parent: 981250f)

Update README.md

Files changed (1)

README.md +36 -0

@@ -22,6 +22,42 @@ This model is pre-trained on over 83 million English tweets about the 2020 US Pr
 This model is initialized with BERTweet and trained with an MLM objective.
+# Usage
+This pre-trained language model **can be fine-tuned for any downstream task (e.g., classification)**.
+Please see the [official repository](https://github.com/GU-DataLab/stance-detection-KE-MLM) for more details.
+```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline
+import torch
+# choose GPU if available
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+# select model path here
+pretrained_LM_path = "kornosk/polibertweet-mlm"
+# load model; the Auto* classes read the (RoBERTa-style, BERTweet-based) architecture from the config
+tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
+model = AutoModelForMaskedLM.from_pretrained(pretrained_LM_path).to(device)
+# fill mask; use the tokenizer's own mask token ("<mask>" for RoBERTa-style models, not "[MASK]")
+example = f"Trump is the {tokenizer.mask_token} of USA"
+fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer, device=device)
+outputs = fill_mask(example)
+print(outputs)
+# see embeddings: the last-layer hidden state of every token
+inputs = tokenizer(example, return_tensors="pt").to(device)
+outputs = model(**inputs, output_hidden_states=True)
+print(outputs.hidden_states[-1])
+# OR you can use this model to train on your downstream task!
+# please consider citing our paper if you feel this is useful :)
+```
 # Reference
 - [PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter](XXX), LREC 2022.
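The card states that the checkpoint can be fine-tuned for downstream tasks such as classification, but it does not show the training side. The block below is a minimal sketch of one fine-tuning step, assuming the Hugging Face `transformers` Auto* classes and PyTorch. The helper names `load_classifier` and `train_step` and the three-label setup are hypothetical illustrations, not the authors' training code (the official repository has that).

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def load_classifier(path: str = "kornosk/polibertweet-mlm", num_labels: int = 3):
    """Load the tokenizer and attach a new classification head to the encoder.

    num_labels=3 is a hypothetical choice (e.g. favor / against / none for stance).
    The MLM head is discarded and the new head starts randomly initialized.
    """
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSequenceClassification.from_pretrained(path, num_labels=num_labels)
    return tokenizer, model


def train_step(model, batch: dict, labels: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    """Run one gradient update on an already-tokenized batch and return the loss."""
    model.train()  # enable dropout for training
    # Supplying `labels` makes the model compute the cross-entropy loss itself.
    outputs = model(**batch, labels=labels)
    optimizer.zero_grad()
    outputs.loss.backward()
    optimizer.step()
    return outputs.loss.item()
```

In practice you would call `tokenizer, model = load_classifier()`, build each batch with `tokenizer(texts, padding=True, truncation=True, return_tensors="pt")`, and pass it to `train_step` together with `torch.tensor(labels)` and an optimizer such as `torch.optim.AdamW(model.parameters(), lr=2e-5)`. That learning rate is a common starting point for BERT-style fine-tuning, not a value reported for PoliBERTweet.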