---
license:
- mit
datasets:
- xtreme
language:
- it
- en
- de
- fr
- es
metrics:
- precision: 0.874
- recall: 0.879
- f1: 0.877
- accuracy: 0.943
---

# gunghio/xlm-roberta-base-finetuned-panx-ner

This model was fine-tuned from `xlm-roberta-base` on a subset of the `xtreme` dataset.

The `xtreme` subsets used are `PAN-X.{lang}`. The languages used for training and validation are Italian, English, German, French, and Spanish.

Only 75% of the whole dataset was used.
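
As a rough sketch, the data described above might be assembled with the `datasets` library as follows. The helper names `panx_config_names` and `load_panx_subsets` are hypothetical, and keeping the first 75% of each split through slice syntax is an assumption: this card does not say how the 75% was selected. Depending on the installed `datasets` version, the `xtreme` dataset identifier may need adjusting.

```python
LANGS = ["it", "en", "de", "fr", "es"]  # languages listed in this card


def panx_config_names(langs=LANGS):
    """Build the xtreme configuration name for each language, e.g. 'PAN-X.it'."""
    return [f"PAN-X.{lang}" for lang in langs]


def load_panx_subsets(langs=LANGS, fraction_pct=75):
    """Load and concatenate the train/validation splits (downloads data).

    Assumption: the first `fraction_pct` percent of each split is kept;
    the card does not state how the 75% was actually chosen.
    """
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import concatenate_datasets, load_dataset

    splits = {}
    for split in ("train", "validation"):
        parts = [
            load_dataset("xtreme", name=name, split=f"{split}[:{fraction_pct}%]")
            for name in panx_config_names(langs)
        ]
        splits[split] = concatenate_datasets(parts)
    return splits


print(panx_config_names())
```

Calling `load_panx_subsets()` would return a dictionary with one concatenated multilingual dataset per split.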

## Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition in Italian (it), English (en), German (de), French (fr), and Spanish (es).

## Training and evaluation data

Training dataset: [xtreme](https://huggingface.co/datasets/xtreme)

### Training results

It achieves the following results on the evaluation set:

- Precision: 0.8744154472771157
- Recall: 0.8791424269015351
- F1: 0.8767725659462058
- Accuracy: 0.9432040948504613

Details:

| Label   | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| PER     | 0.922     | 0.908  | 0.915    | 26639   |
| LOC     | 0.880     | 0.906  | 0.892    | 37623   |
| ORG     | 0.821     | 0.816  | 0.818    | 28045   |
| Overall | 0.874     | 0.879  | 0.877    | 92307   |
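
The reported F1 values are consistent with the usual definition of F1 as the harmonic mean of precision and recall, and the Overall support equals the sum of the per-label supports. A small sketch checking this from the numbers in this card is shown here; `f1_from` is an illustrative helper name, and because the per-label values are rounded to three decimals, small differences in the last digit are expected.

```python
def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in this card (full precision).
overall_f1 = f1_from(0.8744154472771157, 0.8791424269015351)

# Per-label rows from the table: (precision, recall, reported F1, support).
per_label = {
    "PER": (0.922, 0.908, 0.915, 26639),
    "LOC": (0.880, 0.906, 0.892, 37623),
    "ORG": (0.821, 0.816, 0.818, 28045),
}

for label, (p, r, reported_f1, support) in per_label.items():
    print(label, "recomputed:", round(f1_from(p, r), 3), "reported:", reported_f1)

total_support = sum(support for *_, support in per_label.values())
print("overall F1:", round(overall_f1, 3), "total support:", total_support)
```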

## Usage

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

model_id = "gunghio/xlm-roberta-base-finetuned-panx-ner"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# aggregation_strategy="first" groups sub-word tokens into whole words;
# each word takes the label of its first token.
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```
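
With an aggregation strategy set, the pipeline returns a list of dictionaries, one per detected entity, with keys such as `entity_group`, `word`, `score`, `start`, and `end`. A minimal sketch of grouping such results by entity type follows. The helper `group_entities` and its `min_score` threshold are illustrative, and `sample_results` is hand-written mock data in the pipeline's output shape, not real model output.

```python
from collections import defaultdict


def group_entities(ner_results, min_score=0.5):
    """Collect entity strings by type, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in ner_results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Mock data (assumed scores) for "My name is Wolfgang and I live in Berlin".
sample_results = [
    {"entity_group": "PER", "score": 0.99, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity_group": "LOC", "score": 0.98, "word": "Berlin", "start": 34, "end": 40},
]

print(group_entities(sample_results))
```

In practice, the list returned by `nlp(example)` would be passed in place of `sample_results`.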