---
language: "en"
tags:
- twitter
- masked-token-prediction
- bertweet
- election2020
- politics
license: "gpl-3.0"
---
# Pre-trained BERT on Twitter US Political Election 2020
Pre-trained weights for *PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter* (LREC 2022).
Please see the [official repository](https://github.com/GU-DataLab/PoliBERTweet) for more detail.
We initialize the model with weights from [BERTweet](https://huggingface.co/vinai/bertweet-base) (`vinai/bertweet-base`).
# Training Data
This model is pre-trained on over 83 million English tweets about the 2020 US Presidential Election.
# Training Objective
This model is initialized with BERTweet and trained with an MLM objective.
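For illustration, below is a minimal sketch of what an MLM objective looks like with Hugging Face tooling: tokens are randomly masked and the model is trained to recover them. This is **not** the exact pre-training script; the 15% masking rate and the use of `AutoModelForMaskedLM` with `DataCollatorForLanguageModeling` are standard-practice assumptions.
```python
# a minimal sketch of the MLM objective (not the exact pre-training script);
# the 15% masking rate and helper classes are standard-practice assumptions
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModelForMaskedLM.from_pretrained("vinai/bertweet-base")

# randomly mask 15% of tokens and build MLM labels for the masked positions
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
encoding = tokenizer("The election results are in", return_tensors="pt")
batch = collator([{"input_ids": encoding["input_ids"][0]}])

# the model predicts the masked tokens; loss is cross-entropy over those positions
outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
print(outputs.loss)
```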
# Usage
This pre-trained language model **can be fine-tuned for any downstream task (e.g., classification)**.
```python
from transformers import AutoModel, AutoTokenizer, pipeline
import torch
# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path).to(device)

# fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)
outputs = fill_mask(example)
print(outputs)

# see contextual embeddings for the example
inputs = tokenizer(example, return_tensors="pt").to(device)
outputs = model(**inputs)
print(outputs)

# OR you can fine-tune this model on your downstream task!
# please consider citing our paper if you find it useful :)
```
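As a concrete example of the downstream usage mentioned above, here is a minimal, hypothetical fine-tuning sketch for binary tweet classification. The texts, labels, and hyperparameters are placeholders, not from the paper; it simply loads the checkpoint with a randomly initialized classification head and runs one standard PyTorch training step.
```python
# a hypothetical fine-tuning sketch (texts, labels, and hyperparameters are placeholders)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

pretrained_LM_path = "kornosk/polibertweet-mlm"
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)

# load the checkpoint with a randomly initialized 2-way classification head
model = AutoModelForSequenceClassification.from_pretrained(pretrained_LM_path, num_labels=2)

# toy training data (placeholders, not real annotations)
texts = ["I will vote for him", "I will never vote for him"]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# one training step: forward pass with labels, backprop, parameter update
model.train()
outputs = model(**inputs, labels=labels)  # cross-entropy loss over the 2 classes
outputs.loss.backward()
optimizer.step()
print(outputs.loss.item())
```
In practice you would wrap this in a full training loop over your labeled dataset (or use `transformers.Trainer`); the sketch only shows how the pre-trained checkpoint plugs into a classification head.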
# Reference
- [PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter](XXX), LREC 2022.
# Citation
```bibtex
@inproceedings{kawintiranon2022polibertweet,
    title = {PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter},
    author = {Kawintiranon, Kornraphop and Singh, Lisa},
    booktitle = {Proceedings of the Language Resources and Evaluation Conference},
    year = {2022},
    publisher = {European Language Resources Association}
}
```