---
language:
- it
- en
- de
- fr
- es
- multilingual
license:
- mit
datasets:
- xtreme
metrics:
- precision: 0.874
- recall: 0.88
- f1: 0.877
- accuracy: 0.943
inference:
parameters:
aggregation_strategy: first
---
# gunghio/xlm-roberta-base-finetuned-panx-ner
This model was fine-tuned from `xlm-roberta-base` on a subset of the `xtreme` dataset.
The `xtreme` subsets used are PAN-X.{lang}, with Italian, English, German, French, and Spanish as the training/validation languages.
Only 75% of the whole dataset was used.
## Intended uses & limitations
The fine-tuned model can be used for Named Entity Recognition (NER) in Italian (it), English (en), German (de), French (fr), and Spanish (es).
## Training and evaluation data
Training dataset: [xtreme](https://huggingface.co/datasets/xtreme)
### Training results
It achieves the following results on the evaluation set:
- Precision: 0.8744154472771157
- Recall: 0.8791424269015351
- F1: 0.8767725659462058
- Accuracy: 0.9432040948504613
Details:
| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| PER | 0.922 | 0.908 | 0.915 | 26639 |
| LOC | 0.880 | 0.906 | 0.892 | 37623 |
| ORG | 0.821 | 0.816 | 0.818 | 28045 |
| Overall | 0.874 | 0.879 | 0.877 | 92307 |
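As a quick sanity check on the numbers above, the overall F1 is the harmonic mean of the overall precision and recall:

```python
# Overall precision and recall reported above; F1 is their harmonic mean.
precision = 0.8744154472771157
recall = 0.8791424269015351
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.8768, matching the reported F1 of 0.8767...
```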
## Usage
Set the `aggregation_strategy` according to the [documentation](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/pipelines#transformers.TokenClassificationPipeline).
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")

# "first": group subword tokens into words and use the first subword's label.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)
print(ner_results)
```
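To make the effect of `aggregation_strategy="first"` concrete, here is a simplified, self-contained sketch of the idea (not the actual transformers implementation; it uses WordPiece-style `##` continuation markers for readability, whereas XLM-RoBERTa actually uses SentencePiece):

```python
# Simplified sketch of aggregation_strategy="first": merge continuation
# subwords into their word and keep the label predicted for the first subword.
# Illustration only; not the transformers implementation, and the "##" marker
# is WordPiece-style, used here purely for readability.

def aggregate_first(tokens):
    """tokens: list of (subword, label); '##'-prefixed subwords continue
    the previous word. Returns list of (word, label_of_first_subword)."""
    words = []
    for subword, label in tokens:
        if subword.startswith("##") and words:
            word, first_label = words[-1]
            words[-1] = (word + subword[2:], first_label)  # keep first label
        else:
            words.append((subword, label))
    return words

tokens = [("Wolf", "B-PER"), ("##gang", "I-PER"), ("lives", "O"),
          ("in", "O"), ("Berl", "B-LOC"), ("##in", "I-LOC")]
print(aggregate_first(tokens))
# [('Wolfgang', 'B-PER'), ('lives', 'O'), ('in', 'O'), ('Berlin', 'B-LOC')]
```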