metadata
license:
  - mit
datasets:
  - xtreme
language:
  - it
  - en
  - de
  - fr
  - es
metrics:
  - precision: 0.874
  - recall: 0.88
  - f1: 0.877
  - accuracy: 0.943

gunghio/xlm-roberta-base-finetuned-panx-ner

This model was fine-tuned from xlm-roberta-base on a subset of the xtreme dataset.

The xtreme subsets used are PAN-X.{lang}. The languages used for training and validation are Italian, English, German, French, and Spanish.

Only 75% of the whole dataset was used.
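As a minimal sketch of how the PAN-X.{lang} subsets named above could be selected, the snippet below builds the configuration names for the five languages. The `load_panx` helper is hypothetical and is not the author's training code; it assumes the `datasets` library is installed and that the dataset is still reachable under the id `xtreme` listed in the metadata (newer library versions may need a different id or extra arguments). The 75% subsampling is not reproduced, since the card does not say how it was done.

```python
# Language codes listed in the model card metadata.
LANGS = ["it", "en", "de", "fr", "es"]


def panx_config_names(langs):
    """Return the xtreme configuration name for each language code."""
    return [f"PAN-X.{lang}" for lang in langs]


def load_panx(langs):
    """Hypothetical helper: load one PAN-X subset per language.

    Assumes the `datasets` library and network access; the import is
    done lazily so the rest of the snippet runs without it.
    """
    from datasets import load_dataset

    return {lang: load_dataset("xtreme", name=f"PAN-X.{lang}") for lang in langs}


print(panx_config_names(LANGS))
```

Calling `load_panx(LANGS)` would download the five subsets; it is left uncalled here so the example runs offline.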

Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition (NER) in Italian (it), English (en), German (de), French (fr), and Spanish (es).

Training and evaluation data

Training dataset: xtreme (PAN-X.{lang} subsets)

Training results

The model achieves the following results on the evaluation set:

  • Precision: 0.8744154472771157
  • Recall: 0.8791424269015351
  • F1: 0.8767725659462058
  • Accuracy: 0.9432040948504613

Details:

| Label | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| PER | 0.922 | 0.908 | 0.915 | 26639 |
| LOC | 0.880 | 0.906 | 0.892 | 37623 |
| ORG | 0.821 | 0.816 | 0.818 | 28045 |
| Overall | 0.874 | 0.879 | 0.877 | 92307 |
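The reported figures can be cross-checked with a few lines of arithmetic. The sketch below recomputes F1 as the harmonic mean of the reported precision and recall, and sums the per-label support counts. It only uses numbers from this card; the function and variable names are illustrative.

```python
# Overall metrics as reported in the training results.
precision = 0.8744154472771157
recall = 0.8791424269015351
reported_f1 = 0.8767725659462058


def f1_score(p, r):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


overall_f1 = f1_score(precision, recall)

# Per-label support counts from the details table.
support = {"PER": 26639, "LOC": 37623, "ORG": 28045}
total_support = sum(support.values())

print(round(overall_f1, 3), total_support)
```

The recomputed F1 agrees with the reported value, and the per-label supports add up to the overall support of 92307.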

Usage

from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
model_name = "gunghio/xlm-roberta-base-finetuned-panx-ner"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# aggregation_strategy="first" merges sub-word tokens into whole words,
# labelling each word with the tag of its first token
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
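With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. A small post-processing sketch is shown below; `group_by_entity_type` is a hypothetical helper, and `sample_results` is a hand-written stand-in for `ner_results` with placeholder scores, not actual model output.

```python
from collections import defaultdict


def group_by_entity_type(ner_results):
    """Collect the recognized words under their entity type (PER, LOC, ORG)."""
    grouped = defaultdict(list)
    for entity in ner_results:
        grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written sample in the pipeline's output format.
# The scores are placeholders, NOT values produced by the model.
sample_results = [
    {"entity_group": "PER", "score": 0.99, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity_group": "LOC", "score": 0.99, "word": "Berlin", "start": 34, "end": 40},
]

print(group_by_entity_type(sample_results))
```

In practice the helper would be called on `ner_results` from the usage example instead of the sample list.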