---
language:
- it
- en
- de
- fr
- es
- multilingual
license:
- mit
datasets:
- xtreme
metrics:
- precision: 0.874
- recall: 0.88
- f1: 0.877
- accuracy: 0.943
inference:
  parameters:
    aggregation_strategy: first
---

# gunghio/xlm-roberta-base-finetuned-panx-ner

This model was trained starting from `xlm-roberta-base` on a subset of the xtreme dataset. The `xtreme` subsets used are `PAN-X.{lang}`, and the languages used for training/validation are Italian, English, German, French, and Spanish. Only 75% of the whole dataset was used.

## Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition in it, en, de, fr, and es.

## Training and evaluation data

Training dataset: [xtreme](https://huggingface.co/datasets/xtreme) (a data-loading sketch is shown after the usage example below).

### Training results

It achieves the following results on the evaluation set:

- Precision: 0.8744154472771157
- Recall: 0.8791424269015351
- F1: 0.8767725659462058
- Accuracy: 0.9432040948504613

Details:

| Label   | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| PER     | 0.922     | 0.908  | 0.915    | 26639   |
| LOC     | 0.880     | 0.906  | 0.892    | 37623   |
| ORG     | 0.821     | 0.816  | 0.818    | 28045   |
| Overall | 0.874     | 0.879  | 0.877    | 92307   |

## Usage

Set the aggregation strategy according to the [documentation](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/pipelines#transformers.TokenClassificationPipeline).

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and token-classification head from the Hub.
tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")

# "first" aggregation groups sub-word tokens into whole-word entities.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```
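With an aggregation strategy set, the pipeline returns one dict per grouped entity with `entity_group`, `score`, `word`, `start`, and `end` keys, e.g. roughly `[{'entity_group': 'PER', 'word': 'Wolfgang', ...}, {'entity_group': 'LOC', 'word': 'Berlin', ...}]` (scores omitted here since they depend on the run).

For reference, below is a minimal sketch of how the `PAN-X.{lang}` subsets described above can be loaded with the 🤗 `datasets` library. The shuffle seed and the way 75% of each split is selected are illustrative assumptions, not the exact recipe used to train this model:

```python
from datasets import load_dataset, concatenate_datasets

# Load the PAN-X subset of xtreme for each training language.
langs = ["it", "en", "de", "fr", "es"]
splits = [load_dataset("xtreme", f"PAN-X.{lang}", split="train") for lang in langs]

# Illustrative only: keep 75% of each language's training split,
# mirroring the "75% of the whole dataset" note above.
splits = [ds.shuffle(seed=42).select(range(int(0.75 * len(ds)))) for ds in splits]

# Merge the per-language subsets into a single training dataset
# with `tokens` and `ner_tags` columns.
train_dataset = concatenate_datasets(splits)
print(train_dataset)
```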