File size: 1,468 Bytes
9da215e
530d225
 
 
 
 
9da215e
 
530d225
0b39bec
 
 
 
 
 
 
 
 
 
 
 
 
 
997ca3f
0b39bec
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
---
language:
  - en
tags:
  - part-of-speech
  - finetuned
license: cc-by-nc-3.0
---

# BERT-base-multilingual-cased finetuned for Part-of-Speech tagging

This is a multilingual BERT model fine tuned for part-of-speech tagging for English. It is trained using the Penn TreeBank (Marcus et al., 1993) and achieves an F1-score of 96.69.

## Usage
A *transformers* pipeline can be used to run the model:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, TokenClassificationPipeline

model_name = "QCRI/bert-base-multilingual-cased-pos-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

pipeline = TokenClassificationPipeline(model=model, tokenizer=tokenizer)
outputs = pipeline("A test example")
print(outputs)
```


## Citation
This model was used for all the part-of-speech tagging based results in *Analyzing Encoded Concepts in Transformer Language Models*, published at NAACL'22. If you find this model useful for your own work, please use the following citation:

```bib
@inproceedings{sajjad-NAACL,
  title={Analyzing Encoded Concepts in Transformer Language Models},
  author={Hassan Sajjad, Nadir Durrani, Fahim Dalvi, Firoj Alam, Abdul Rafae Khan and Jia Xu},
  booktitle={North American Chapter of the Association of Computational Linguistics: Human Language Technologies (NAACL)},
  series={NAACL~'22},
  year={2022},
  address={Seattle}
}
```