File size: 3,350 Bytes
30782ff |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 |
# Pre-trained BERT on Twitter US Election 2020 for Stance Detection towards Joe Biden (KE-MLM)
Pre-trained weights for **KE-MLM model** in [Knowledge Enhance Masked Language Model for Stance Detection](https://2021.naacl.org/program/accepted/), NAACL 2021.
# Training Data
This model is pre-trained on over 5 million English tweets about the 2020 US Presidential Election. Then fine-tuned using our [stance-labeled data](https://github.com/GU-DataLab/stance-detection-KE-MLM) for stance detection towards Joe Biden.
# Training Objective
This model is initialized with BERT-base and trained with normal MLM objective with classification layer fine-tuned for stance detection towards Joe Biden.
# Usage
This pre-trained language model is fine-tuned to the stance detection task specifically for Joe Biden.
Please see the [official repository](https://github.com/GU-DataLab/stance-detection-KE-MLM) for more detail.
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
import numpy as np
# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# select mode path here
pretrained_LM_path = "kornosk/bert-election2020-twitter-stance-biden-KE-MLM"
# load model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModelForSequenceClassification.from_pretrained(pretrained_LM_path)
id2label = {
0: "AGAINST",
1: "FAVOR",
2: "NONE"
}
##### Prediction Neutral #####
sentence = "Hello World."
inputs = tokenizer(sentence.lower(), return_tensors="pt")
outputs = model(**inputs)
predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()
print("Sentence:", sentence)
print("Prediction:", id2label[np.argmax(predicted_probability)])
print("Against:", predicted_probability[0])
print("Favor:", predicted_probability[1])
print("Neutral:", predicted_probability[2])
##### Prediction Favor #####
sentence = "Go Go Biden!!!"
inputs = tokenizer(sentence.lower(), return_tensors="pt")
outputs = model(**inputs)
predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()
print("Sentence:", sentence)
print("Prediction:", id2label[np.argmax(predicted_probability)])
print("Against:", predicted_probability[0])
print("Favor:", predicted_probability[1])
print("Neutral:", predicted_probability[2])
##### Prediction Against #####
sentence = "Biden is the worst."
inputs = tokenizer(sentence.lower(), return_tensors="pt")
outputs = model(**inputs)
predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()
print("Sentence:", sentence)
print("Prediction:", id2label[np.argmax(predicted_probability)])
print("Against:", predicted_probability[0])
print("Favor:", predicted_probability[1])
print("Neutral:", predicted_probability[2])
# please consider citing our paper if you feel this is useful :)
```
# Reference
- [Knowledge Enhance Masked Language Model for Stance Detection](https://2021.naacl.org/program/accepted/), NAACL 2021.
# Citation
```bibtex
@inproceedings{kawintiranon2021knowledge,
title={Knowledge Enhanced Masked Language Model for Stance Detection},
author={Kawintiranon, Kornraphop and Singh, Lisa},
booktitle={Proceedings of the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
year={2021},
url={#}
}
``` |