---
language: "en"
tags:
- twitter
- masked-token-prediction
- election2020
- politics
license: "gpl-3.0"
---

# Pre-trained BERT on Twitter US Political Election 2020

Pre-trained weights for [Knowledge Enhanced Masked Language Model for Stance Detection](https://www.aclweb.org/anthology/2021.naacl-main.376), NAACL 2021.

The model is initialized from the BERT-base (uncased) weights, i.e. `bert-base-uncased`.

# Training Data

This model is pre-trained on over 5 million English tweets about the 2020 US Presidential Election.

# Training Objective

This model is initialized with BERT-base and trained with the standard masked language modeling (MLM) objective.

# Usage

This pre-trained language model **can be fine-tuned on any downstream task (e.g. classification)**.

Please see the [official repository](https://github.com/GU-DataLab/stance-detection-KE-MLM) for more detail.

```python
from transformers import BertTokenizer, BertForMaskedLM, pipeline
import torch

# Choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Select model path here
pretrained_LM_path = "kornosk/bert-political-election2020-twitter-mlm"

# Load model
tokenizer = BertTokenizer.from_pretrained(pretrained_LM_path)
model = BertForMaskedLM.from_pretrained(pretrained_LM_path)

# Fill mask
example = "Trump is the [MASK] of USA"
fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)

# Use the following line instead if the one above does not work.
# Newer versions of Hugging Face Transformers also accept the model name as a string.
# fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)

outputs = fill_mask(example)
print(outputs)

# See the raw model outputs (MLM logits; pass output_hidden_states=True for embeddings)
inputs = tokenizer(example, return_tensors="pt")
outputs = model(**inputs)
print(outputs)

# OR you can use this model to train on your downstream task!
# Please consider citing our paper if you feel this is useful :)
```

# Reference

- [Knowledge Enhanced Masked Language Model for Stance Detection](https://www.aclweb.org/anthology/2021.naacl-main.376), NAACL 2021.

# Citation

```bibtex
@inproceedings{kawintiranon2021knowledge,
    title={Knowledge Enhanced Masked Language Model for Stance Detection},
    author={Kawintiranon, Kornraphop and Singh, Lisa},
    booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
    year={2021},
    publisher={Association for Computational Linguistics},
    url={https://www.aclweb.org/anthology/2021.naacl-main.376}
}
```
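
# Fine-tuning Sketch

The Usage section notes that the weights can be fine-tuned on a downstream task such as classification. The following is a minimal sketch of one way that might look, not the authors' training code. It assumes a three-way stance task; the label set `LABEL2ID`, the helper names `encode_labels` and `fine_tune_step`, and the learning rate are all hypothetical choices made for illustration. Loading the MLM checkpoint into `BertForSequenceClassification` keeps the encoder weights and adds a new, randomly initialized classification head.

```python
from typing import Dict, List

# Hypothetical stance labels; this label set is an assumption, not from the model card.
LABEL2ID: Dict[str, int] = {"AGAINST": 0, "NONE": 1, "FAVOR": 2}


def encode_labels(labels: List[str], label2id: Dict[str, int] = LABEL2ID) -> List[int]:
    """Map string stance labels to integer class ids."""
    return [label2id[label] for label in labels]


def fine_tune_step(
    texts: List[str],
    labels: List[str],
    model_path: str = "kornosk/bert-political-election2020-twitter-mlm",
) -> float:
    """Run a single illustrative gradient step and return the loss."""
    # Imported lazily so the helper above works without these packages installed.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained(model_path)
    # The encoder is loaded from the checkpoint; the classification head is new.
    model = BertForSequenceClassification.from_pretrained(
        model_path, num_labels=len(LABEL2ID)
    )
    model.train()

    # Tokenize the batch with padding so all sequences share one length.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    targets = torch.tensor(encode_labels(labels))

    # Assumed learning rate; tune it for a real task.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Passing labels makes the model compute the cross-entropy loss itself.
    outputs = model(**batch, labels=targets)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

Calling `fine_tune_step(texts, labels)` with a list of tweets and a matching list of stance labels downloads the checkpoint and returns the loss for that one batch. A real run would loop over a labeled dataset for several epochs and evaluate on held-out data.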