
gunghio/xlm-roberta-base-finetuned-panx-ner

This model was fine-tuned from xlm-roberta-base on a subset of the xtreme dataset.

The xtreme subsets used are PAN-X.{lang}. The languages used for training and validation are Italian, English, German, French, and Spanish.

Only 75% of the whole dataset was used.

Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition (NER) in Italian (it), English (en), German (de), French (fr), and Spanish (es).

Training and evaluation data

Training dataset: xtreme

Training results

It achieves the following results on the evaluation set:

  • Precision: 0.8744154472771157
  • Recall: 0.8791424269015351
  • F1: 0.8767725659462058
  • Accuracy: 0.9432040948504613
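As a sanity check, the reported F1 is the harmonic mean of the reported precision and recall, which can be verified directly:

```python
# Verify that the reported F1 equals 2PR / (P + R),
# using the precision and recall values from the evaluation above.
precision = 0.8744154472771157
recall = 0.8791424269015351

f1 = 2 * precision * recall / (precision + recall)
print(f1)  # ~0.87677, matching the reported F1
```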

Details:

| Label   | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| PER     | 0.922     | 0.908  | 0.915    | 26639   |
| LOC     | 0.880     | 0.906  | 0.892    | 37623   |
| ORG     | 0.821     | 0.816  | 0.818    | 28045   |
| Overall | 0.874     | 0.879  | 0.877    | 92307   |

Usage

Set the aggregation strategy according to the pipeline documentation.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")

nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```
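With an aggregation strategy set, the pipeline returns one dict per detected entity. The snippet below sketches how such output can be grouped by entity type; the `results` list is a hypothetical sample in the shape the "ner" pipeline returns, not actual model output.

```python
# Group pipeline output by entity type. `results` is a hypothetical
# sample mimicking the output shape of the "ner" pipeline with
# aggregation_strategy="first"; real scores and spans come from the model.
from collections import defaultdict

results = [
    {"entity_group": "PER", "score": 0.99, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity_group": "LOC", "score": 0.98, "word": "Berlin", "start": 34, "end": 40},
]

entities = defaultdict(list)
for item in results:
    entities[item["entity_group"]].append(item["word"])

print(dict(entities))  # {'PER': ['Wolfgang'], 'LOC': ['Berlin']}
```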