---
language: "en"
tags:
- twitter
- masked-token-prediction
- bertweet
- election2020
- politics
license: "gpl-3.0"
---
# Pre-trained BERT on Twitter US Political Election 2020
Pre-trained weights for *PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter* (LREC 2022).
Please see the [official repository](https://github.com/GU-DataLab/PoliBERTweet) for more detail.
We initialize the model with weights from [BERTweet](https://huggingface.co/vinai/bertweet-base) (`vinai/bertweet-base`).
# Training Data
This model is pre-trained on over 83 million English tweets about the 2020 US Presidential Election.
# Training Objective
This model is initialized with BERTweet and trained with an MLM objective.
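For illustration, below is a minimal sketch of what an MLM objective looks like with Hugging Face tooling: tokens are randomly masked and the model is trained to recover them. This is **not** the exact pre-training script; the 15% masking rate and the use of `AutoModelForMaskedLM` with `DataCollatorForLanguageModeling` are standard-practice assumptions.
```python
# a minimal sketch of the MLM objective (not the exact pre-training script);
# the 15% masking rate and helper classes are standard-practice assumptions
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")
model = AutoModelForMaskedLM.from_pretrained("vinai/bertweet-base")

# randomly mask 15% of tokens and build MLM labels for the masked positions
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
encoding = tokenizer("The election results are in", return_tensors="pt")
batch = collator([{"input_ids": encoding["input_ids"][0]}])

# the model predicts the masked tokens; loss is cross-entropy over those positions
outputs = model(input_ids=batch["input_ids"], labels=batch["labels"])
print(outputs.loss)
```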
# Usage
This pre-trained language model **can be fine-tuned for any downstream task (e.g., classification)**.
```python
from transformers import AutoModel, AutoTokenizer, pipeline
import torch
# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path).to(device)

# fill mask
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)
outputs = fill_mask(example)
print(outputs)

# see contextual embeddings for the example
inputs = tokenizer(example, return_tensors="pt").to(device)
outputs = model(**inputs)
print(outputs)

# OR you can fine-tune this model on your downstream task!
# please consider citing our paper if you find it useful :)
```
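As a concrete example of the downstream usage mentioned above, here is a minimal, hypothetical fine-tuning sketch for binary tweet classification. The texts, labels, and hyperparameters are placeholders, not from the paper; it simply loads the checkpoint with a randomly initialized classification head and runs one standard PyTorch training step.
```python
# a hypothetical fine-tuning sketch (texts, labels, and hyperparameters are placeholders)
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

pretrained_LM_path = "kornosk/polibertweet-mlm"
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)

# load the checkpoint with a randomly initialized 2-way classification head
model = AutoModelForSequenceClassification.from_pretrained(pretrained_LM_path, num_labels=2)

# toy training data (placeholders, not real annotations)
texts = ["I will vote for him", "I will never vote for him"]
labels = torch.tensor([1, 0])

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# one training step: forward pass with labels, backprop, parameter update
model.train()
outputs = model(**inputs, labels=labels)  # cross-entropy loss over the 2 classes
outputs.loss.backward()
optimizer.step()
print(outputs.loss.item())
```
In practice you would wrap this in a full training loop over your labeled dataset (or use `transformers.Trainer`); the sketch only shows how the pre-trained checkpoint plugs into a classification head.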
# Reference
- [PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter](XXX), LREC 2022.
# Citation
```bibtex
@inproceedings{kawintiranon2022polibertweet,
    title = {PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter},
    author = {Kawintiranon, Kornraphop and Singh, Lisa},
    booktitle = {Proceedings of the Language Resources and Evaluation Conference},
    year = {2022},
    publisher = {European Language Resources Association}
}
```