TweebankNLP/bertweet-tb2_ewt-pos-tagging

Token Classification

Model Specification

This is the state-of-the-art Twitter POS tagging model (95.38% accuracy) on the Tweebank V2 benchmark (the corpus released with NER annotations as Tweebank-NER), trained on a corpus that combines the Tweebank-NER and English-EWT training data.
For more details about the TweebankNLP project, please refer to our paper and GitHub page.
In the paper, it is referred to as HuggingFace-BERTweet (TB2+EWT) in the POS table.

How to use the model

PRE-PROCESSING: when you apply the model to tweets, make sure the tweets are first preprocessed with the TweetTokenizer to get the best performance.

from transformers import AutoTokenizer, AutoModelForTokenClassification

# Load the tokenizer and the fine-tuned POS tagging model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("TweebankNLP/bertweet-tb2_ewt-pos-tagging")
model = AutoModelForTokenClassification.from_pretrained("TweebankNLP/bertweet-tb2_ewt-pos-tagging")
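
The snippet above only loads the model. As a minimal sketch of how tagging a single tweet could look end to end, the example below makes several assumptions that are not stated in this card: that "TweetTokenizer" refers to NLTK's `TweetTokenizer`, that each word should take the tag predicted for its first subword piece (a common convention, not confirmed as the authors' method), and that the tweet fits within the model's maximum input length. The helper names `first_subword_labels` and `tag_tweet` are hypothetical.

```python
from typing import List, Sequence, Tuple

MODEL_NAME = "TweebankNLP/bertweet-tb2_ewt-pos-tagging"


def first_subword_labels(
    pieces_per_word: Sequence[int], piece_labels: Sequence[str]
) -> List[str]:
    """Keep, for each word, the label predicted for its first subword piece.

    Assumption: first-piece labeling is used here for illustration only.
    """
    labels = []
    position = 0
    for n_pieces in pieces_per_word:
        labels.append(piece_labels[position])
        position += n_pieces
    return labels


def tag_tweet(text: str, model_name: str = MODEL_NAME) -> List[Tuple[str, str]]:
    """Return (token, POS tag) pairs for one tweet."""
    # Heavy dependencies are imported here so the helper above can be
    # used without installing or loading them.
    import torch
    from nltk.tokenize import TweetTokenizer  # assumed to be the intended tokenizer
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForTokenClassification.from_pretrained(model_name)
    model.eval()

    # Pre-process: split the raw tweet into word-level tokens.
    words = TweetTokenizer().tokenize(text)

    # Tokenize word by word so the number of subword pieces per word is known.
    pieces = [tokenizer.tokenize(word) for word in words]
    kept = [(word, word_pieces) for word, word_pieces in zip(words, pieces) if word_pieces]
    if not kept:
        return []
    words = [word for word, _ in kept]
    pieces = [word_pieces for _, word_pieces in kept]

    # Build one input sequence: <s> piece ids </s>.
    flat_ids = tokenizer.convert_tokens_to_ids(
        [piece for word_pieces in pieces for piece in word_pieces]
    )
    input_ids = torch.tensor([tokenizer.build_inputs_with_special_tokens(flat_ids)])

    with torch.no_grad():
        logits = model(input_ids=input_ids).logits[0]

    # Drop the predictions for the two special tokens, then map ids to tag names.
    predicted_ids = logits.argmax(dim=-1).tolist()[1:-1]
    piece_labels = [model.config.id2label[i] for i in predicted_ids]

    word_labels = first_subword_labels([len(p) for p in pieces], piece_labels)
    return list(zip(words, word_labels))
```

Calling `tag_tweet("some tweet text")` downloads the model on first use and returns a list of `(token, tag)` pairs. For many tweets, load the tokenizer and model once outside the function instead of on every call.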

References

If you use this model in your research, please cite our paper:

@article{jiang2022tweetnlp,
    title={Annotating the Tweebank Corpus on Named Entity Recognition and Building NLP Models for Social Media Analysis},
    author={Jiang, Hang and Hua, Yining and Beeferman, Doug and Roy, Deb},
    journal={In Proceedings of the 13th Language Resources and Evaluation Conference (LREC)},
    year={2022}
}
