---
language: "en"
tags:
- twitter
- masked-token-prediction
- bertweet
- election2020
- politics
license: "gpl-3.0"
---

# Pre-trained BERT on Twitter US Political Election 2020

Pre-trained weights for PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter, LREC 2022. Please see the [official repository](https://github.com/GU-DataLab/PoliBERTweet) for more details.

We use the initialized weights from [BERTweet](https://huggingface.co/vinai/bertweet-base) (`vinai/bertweet-base`).

# Training Data

This model is pre-trained on over 83 million English tweets about the 2020 US presidential election.

# Training Objective

This model is initialized with BERTweet and trained with a masked language modeling (MLM) objective.

# Usage

This pre-trained language model **can be fine-tuned for any downstream task (e.g. classification)**; a fine-tuning sketch is included at the end of this card.

```python
from transformers import AutoModel, AutoTokenizer, pipeline
import torch

# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# load model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path)

# fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)
outputs = fill_mask(example)
print(outputs)

# see embeddings
inputs = tokenizer(example, return_tensors="pt")
outputs = model(**inputs)
print(outputs)

# OR you can use this model to train on your downstream task!
# please consider citing our paper if you find this useful :)
```

# Reference

- [PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter](XXX), LREC 2022.

# Citation

```bibtex
@inproceedings{kawintiranon2022polibertweet,
  title     = {PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter},
  author    = {Kawintiranon, Kornraphop and Singh, Lisa},
  booktitle = {Proceedings of the Language Resources and Evaluation Conference},
  year      = {2022},
  publisher = {European Language Resources Association}
}
```
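# Fine-tuning Sketch

Since this model can be fine-tuned for downstream classification, the following is a minimal sketch of one way to do so with the standard `AutoModelForSequenceClassification` and `Trainer` APIs. The toy texts, labels, and output directory are placeholders, not part of the original card or the authors' training setup; replace them with your own labeled dataset and task settings.

```python
# Minimal fine-tuning sketch (not the authors' training code).
# The toy dataset and hyperparameters below are illustrative only.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

pretrained_LM_path = "kornosk/polibertweet-mlm"

tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModelForSequenceClassification.from_pretrained(pretrained_LM_path, num_labels=2)

# Toy examples standing in for a real labeled dataset
texts = ["I will vote for him", "I will never vote for him"]
labels = [1, 0]

encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenized texts and labels for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

train_dataset = ToyDataset(encodings, labels)

training_args = TrainingArguments(
    output_dir="./polibertweet-finetuned",  # placeholder output path
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```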