---
language:
- it
- en
- de
- fr
- es
- multilingual
license:
- mit
datasets:
- xtreme
metrics:
- precision: 0.874
- recall: 0.88
- f1: 0.877
- accuracy: 0.943
inference:
  parameters:
    aggregation_strategy: first
---

# gunghio/xlm-roberta-base-finetuned-panx-ner

This model was fine-tuned from [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on a subset of the xtreme dataset.

The `xtreme` subsets used are `PAN-X.{lang}`. The languages used for training and validation are Italian, English, German, French, and Spanish.

Only 75% of the full dataset was used for training.
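The card does not publish its training script; as a hypothetical reconstruction, the per-language PAN-X configurations can be loaded with 🤗 `datasets`, where the `train[:75%]` split syntax keeps the first 75% of each training set:

```python
# Hypothetical reconstruction of the data-loading step (the card does not
# publish its training code). One xtreme configuration exists per language.
LANGS = ["it", "en", "de", "fr", "es"]
CONFIGS = [f"PAN-X.{lang}" for lang in LANGS]

def load_train_subsets(fraction=75):
    # Requires `pip install datasets` and network access.
    from datasets import load_dataset
    return {cfg: load_dataset("xtreme", cfg, split=f"train[:{fraction}%]")
            for cfg in CONFIGS}

print(CONFIGS)  # ['PAN-X.it', 'PAN-X.en', 'PAN-X.de', 'PAN-X.fr', 'PAN-X.es']
```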

## Intended uses & limitations

The fine-tuned model can be used for Named Entity Recognition in Italian (it), English (en), German (de), French (fr), and Spanish (es).

## Training and evaluation data

Training dataset: [xtreme](https://huggingface.co/datasets/xtreme)

### Training results

It achieves the following results on the evaluation set:

- Precision: 0.8744154472771157
- Recall: 0.8791424269015351
- F1: 0.8767725659462058
- Accuracy: 0.9432040948504613

Details:

| Label   | Precision | Recall | F1-Score | Support |
|---------|-----------|--------|----------|---------|
| PER     | 0.922     | 0.908  | 0.915    | 26639   |
| LOC     | 0.880     | 0.906  | 0.892    | 37623   |
| ORG     | 0.821     | 0.816  | 0.818    | 28045   |
| Overall | 0.874     | 0.879  | 0.877    | 92307   |
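The scores above are entity-level: a prediction only counts when both the span and the type match a gold entity exactly. A minimal, self-contained sketch of entity-level F1 over IOB2 tags (illustrative only; the card does not state which evaluation library produced its numbers):

```python
# Illustrative entity-level scoring over IOB2 tags (not the card's actual
# evaluation code). An entity is a (type, start, end) span.
def entities(tags):
    """Extract (type, start, end) spans from an IOB2 tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O" or (etype and tag[2:] != etype):
            if etype is not None:
                spans.append((etype, start, i))
            start, etype = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans

def f1(true_tags, pred_tags):
    gold, pred = set(entities(true_tags)), set(entities(pred_tags))
    tp = len(gold & pred)  # exact span-and-type matches only
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "B-ORG"]  # wrong type on the second entity
print(f1(gold, pred))  # 0.5: one of two entities matched on span and type
```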

## Usage

Set the aggregation strategy according to the [documentation](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/pipelines#transformers.TokenClassificationPipeline).

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")

# "first" keeps one label per word, taken from its first subword token
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```
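As a rough intuition for `aggregation_strategy="first"`, here is a toy sketch (not the Transformers implementation): subword pieces belonging to one word are merged back together, and the word takes the entity label of its first piece:

```python
# Toy illustration of the "first" aggregation idea (not the actual
# Transformers code): the word-level label comes from the first subword.
def aggregate_first(pieces):
    """pieces: list of (text, label, starts_word); returns word-level (text, label)."""
    words = []
    for text, label, starts_word in pieces:
        if starts_word or not words:
            words.append([text, label])  # word label = first subword's label
        else:
            words[-1][0] += text  # later subwords extend the text only
    return [tuple(w) for w in words]

example = [("Wolf", "B-PER", True), ("gang", "I-PER", False), ("Berlin", "B-LOC", True)]
print(aggregate_first(example))  # [('Wolfgang', 'B-PER'), ('Berlin', 'B-LOC')]
```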