---
language:
- it
- en
- de
- fr
- es
- multilingual
license:
- mit
datasets:
- xtreme
metrics:
- precision: 0.874
- recall: 0.88
- f1: 0.877
- accuracy: 0.943
inference:
  parameters:
    aggregation_strategy: first
---
# gunghio/xlm-roberta-base-finetuned-panx-ner
This model is `xlm-roberta-base` fine-tuned for named entity recognition on a subset of the `xtreme` dataset.
The `xtreme` subsets used are the PAN-X.{lang} configurations. Training and validation used five languages: Italian, English, German, French and Spanish.
Only 75% of the whole dataset was used.
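The card does not say how the 75% was selected (per language or globally, random or first rows, which seed). The sketch below assumes a random 75% of each language's split, followed by concatenation and shuffling. `take_fraction`, `build_split`, the seed, and the toy data are hypothetical, not the author's code. With the Hugging Face `datasets` library, each language's splits would typically be loaded with `load_dataset("xtreme", f"PAN-X.{lang}")`.

```python
import random
from typing import Dict, List, Sequence

LANGS = ["it", "en", "de", "fr", "es"]  # the five PAN-X.{lang} subsets named above

def take_fraction(examples: Sequence, fraction: float = 0.75, seed: int = 0) -> List:
    """Reproducibly keep a random `fraction` of `examples`, preserving original order."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    rng.shuffle(indices)
    keep = sorted(indices[: int(len(examples) * fraction)])
    return [examples[i] for i in keep]

def build_split(per_lang: Dict[str, Sequence], fraction: float = 0.75, seed: int = 0) -> List:
    """Hypothetical: subsample each language separately, then concatenate and shuffle."""
    combined: List = []
    for lang in LANGS:
        combined.extend(take_fraction(per_lang[lang], fraction, seed))
    random.Random(seed).shuffle(combined)
    return combined

# Toy stand-ins for one split per language (real data would come from `datasets`)
toy = {lang: [f"{lang}-{i}" for i in range(100)] for lang in LANGS}
train = build_split(toy)
print(len(train))  # prints: 375
```

Subsampling per language keeps the relative language proportions of the full dataset. A global 75% sample would preserve them only in expectation.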
## Intended uses & limitations
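The fine-tuned model can be used for named entity recognition of persons, locations and organizations (the PER, LOC and ORG labels) in Italian (it), English (en), German (de), French (fr) and Spanish (es).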
## Training and evaluation data
Training dataset: [xtreme](https://huggingface.co/datasets/xtreme)
### Training results
It achieves the following results on the evaluation set:
- Precision: 0.8744154472771157
- Recall: 0.8791424269015351
- F1: 0.8767725659462058
- Accuracy: 0.9432040948504613
Per-label results on the evaluation set:
| Label | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| PER | 0.922 | 0.908 | 0.915 | 26639 |
| LOC | 0.880 | 0.906 | 0.892 | 37623 |
| ORG | 0.821 | 0.816 | 0.818 | 28045 |
| Overall | 0.874 | 0.879 | 0.877 | 92307 |
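The overall row is consistent with micro-averaged entity-level scores, the convention used by tools such as `seqeval`, although the card does not name the evaluation library. The snippet runs two quick checks on the numbers above. First, support counts the gold entities of each label, so micro recall (sum of true positives divided by total support) equals the support-weighted mean of the per-label recalls. Second, the overall F1 is the harmonic mean F1 = 2·P·R / (P + R) of the overall precision and recall. A support-weighted average of the per-label F1 scores gives about 0.876 instead. The per-label values in the table are rounded, so only three decimals are expected to match.

```python
# Per-label rows copied from the table above: (precision, recall, f1, support)
PER_LABEL = {
    "PER": (0.922, 0.908, 0.915, 26639),
    "LOC": (0.880, 0.906, 0.892, 37623),
    "ORG": (0.821, 0.816, 0.818, 28045),
}

def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def support_weighted(column: int) -> float:
    """Average one metric column over labels, weighting each label by its support."""
    total = sum(row[3] for row in PER_LABEL.values())
    return sum(row[column] * row[3] for row in PER_LABEL.values()) / total

overall_recall = support_weighted(1)                     # equals sum(TP) / sum(support)
overall_f1 = f1(0.8744154472771157, 0.8791424269015351)  # from the unrounded P and R
weighted_f1 = support_weighted(2)                        # does not reproduce the reported 0.877

print(round(overall_recall, 3), round(overall_f1, 3), round(weighted_f1, 3))
# prints: 0.879 0.877 0.876
```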
## Usage
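Set the aggregation strategy according to the [documentation](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/pipelines#transformers.TokenClassificationPipeline). The example below uses `aggregation_strategy="first"`, matching the inference settings in the card metadata.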
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "gunghio/xlm-roberta-base-finetuned-panx-ner"
# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# "first": when sub-tokens disagree, each word takes the tag of its first sub-token
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")

example = "My name is Wolfgang and I live in Berlin"
ner_results = nlp(example)  # list of dicts: entity_group, score, word, start, end
print(ner_results)
```
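The following plain-Python sketch shows what `aggregation_strategy="first"` does to sub-token predictions. It is a simplified, hypothetical re-implementation, not the code `TokenClassificationPipeline` actually runs. The sub-tokens, `word_ids` and labels are hand-written for the example sentence and are not real tokenizer or model output. `aggregate_first` and `merge_entities` are illustrative names.

```python
from itertools import groupby
from typing import List, Optional, Tuple

def aggregate_first(tokens: List[str], word_ids: List[Optional[int]],
                    labels: List[str]) -> List[Tuple[str, str]]:
    """Give each word the label of its FIRST sub-token (simplified 'first' strategy)."""
    # Drop special tokens such as <s> and </s>, which belong to no word
    triples = [t for t in zip(word_ids, tokens, labels) if t[0] is not None]
    words = []
    for _, group in groupby(triples, key=lambda t: t[0]):
        group = list(group)
        text = "".join(tok for _, tok, _ in group)  # re-join sub-tokens into the word
        words.append((text, group[0][2]))           # later sub-token labels are ignored
    return words

def merge_entities(words: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive B-X / I-X words into (entity text, entity type) pairs."""
    entities: List[Tuple[str, str]] = []
    prev_type = None
    for text, label in words:
        if label == "O":
            prev_type = None
            continue
        prefix, etype = label.split("-", 1)
        if prefix == "I" and etype == prev_type:
            entities[-1] = (entities[-1][0] + " " + text, etype)  # continue the entity
        else:
            entities.append((text, etype))  # start a new entity
        prev_type = etype
    return entities

# Hand-written sub-tokens for the usage sentence (NOT real tokenizer/model output)
tokens = ["<s>", "My", "name", "is", "Wolf", "gang", "and", "I", "live", "in", "Ber", "lin", "</s>"]
word_ids = [None, 0, 1, 2, 3, 3, 4, 5, 6, 7, 8, 8, None]
labels = ["O", "O", "O", "O", "B-PER", "I-ORG", "O", "O", "O", "O", "B-LOC", "I-LOC", "O"]

print(merge_entities(aggregate_first(tokens, word_ids, labels)))
# prints: [('Wolfgang', 'PER'), ('Berlin', 'LOC')]
```

In this toy input the second sub-token of "Wolfgang" carries a conflicting `I-ORG` tag. The "first" strategy discards it, so the whole word is tagged `PER` and cannot be split across two entity types.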