BERT-base-multilingual-cased fine-tuned for Part-of-Speech tagging

This is a multilingual BERT model fine-tuned for English part-of-speech tagging. It was trained on the Penn Treebank (Marcus et al., 1993) and achieves an F1-score of 96.69.

Usage

A transformers pipeline can be used to run the model:

from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

model_name = "QCRI/bert-base-multilingual-cased-pos-english"

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Run token classification; each token in the output carries its predicted POS tag.
pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
outputs = pipeline("A test example")
print(outputs)
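
The pipeline returns one dictionary per token, including the token text ("word"), the predicted tag ("entity"), and a confidence score. Because the mBERT WordPiece tokenizer can split a word into several pieces (continuation pieces are prefixed with "##"), it can be convenient to merge pieces back into word-level (word, tag) pairs. The snippet below is a minimal sketch of such a merge, assuming the "outputs" list produced by the pipeline above; the helper name merge_subwords is illustrative, not part of the model.

# Minimal sketch (illustrative, not part of the model card): merge
# WordPiece sub-tokens back into word-level (word, tag) pairs.
def merge_subwords(token_outputs):
    words = []
    for tok in token_outputs:
        piece, tag = tok["word"], tok["entity"]
        if piece.startswith("##") and words:
            # Continuation piece: append to the previous word, keep that word's tag.
            words[-1] = (words[-1][0] + piece[2:], words[-1][1])
        else:
            words.append((piece, tag))
    return words

print(merge_subwords(outputs))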

Citation

This model was used for all the part-of-speech-tagging-based results in Analyzing Encoded Concepts in Transformer Language Models, published at NAACL'22. If you find this model useful for your own work, please use the following citation:

@inproceedings{sajjad-NAACL,
  title={Analyzing Encoded Concepts in Transformer Language Models},
  author={Hassan Sajjad and Nadir Durrani and Fahim Dalvi and Firoj Alam and Abdul Rafae Khan and Jia Xu},
  booktitle={North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
  series={NAACL~'22},
  year={2022},
  address={Seattle}
}