---

tags:
- token-classification
datasets:
- conll2012_ontonotesv5
widget:
- text: "On September 1st George won 1 dollar while watching Game of Thrones."

---

# (NER) roberta-base : conll2012_ontonotesv5-english-v4

This `roberta-base` NER model was fine-tuned on the `english-v4` version of the `conll2012_ontonotesv5` dataset.


## Dataset
- conll2012_ontonotesv5
    - Language : English
    - Version : v4

  | Split | Examples |
  | --- | --- |
  | Training | 75187 |
  | Testing | 9479 |

## Evaluation

Micro-averaged scores in percent, matching the `micro avg` row of the per-entity report below:

- Precision: 88.88
- Recall: 90.69
- F1-Score: 89.78
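
As a quick sanity check, the F1-score is the harmonic mean of precision and recall. The snippet below is a minimal sketch that recomputes it from the two values listed above; the variable names are illustrative and not taken from the original training code.

```python
# Values copied from the list above (in percent).
precision = 88.88
recall = 90.69

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1-Score: {f1:.2f}")  # prints "F1-Score: 89.78"
```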


```
                precision    recall  f1-score   support

    CARDINAL       0.84      0.85      0.85       935
        DATE       0.85      0.90      0.87      1602
       EVENT       0.67      0.76      0.71        63
         FAC       0.74      0.72      0.73       135
         GPE       0.97      0.96      0.96      2240
    LANGUAGE       0.83      0.68      0.75        22
         LAW       0.66      0.62      0.64        40
         LOC       0.74      0.80      0.77       179
       MONEY       0.85      0.89      0.87       314
        NORP       0.93      0.96      0.95       841
     ORDINAL       0.81      0.89      0.85       195
         ORG       0.90      0.91      0.91      1795
     PERCENT       0.90      0.92      0.91       349
      PERSON       0.95      0.95      0.95      1988
     PRODUCT       0.74      0.83      0.78        76
    QUANTITY       0.76      0.80      0.78       105
        TIME       0.62      0.67      0.65       212
 WORK_OF_ART       0.58      0.69      0.63       166

   micro avg       0.89      0.91      0.90     11257
   macro avg       0.80      0.82      0.81     11257
weighted avg       0.89      0.91      0.90     11257
```
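
The summary rows aggregate the per-entity scores in different ways: `macro avg` is the unweighted mean over the 18 entity types, while `weighted avg` weights each type by its support. The sketch below re-derives both F1 values from the rounded per-entity numbers in the report, so it can only be expected to agree with the report to two decimals. The `report` dictionary is a hand-copied, illustrative structure, not the output of any library.

```python
# (f1-score, support) per entity type, copied from the report above.
report = {
    "CARDINAL": (0.85, 935), "DATE": (0.87, 1602), "EVENT": (0.71, 63),
    "FAC": (0.73, 135), "GPE": (0.96, 2240), "LANGUAGE": (0.75, 22),
    "LAW": (0.64, 40), "LOC": (0.77, 179), "MONEY": (0.87, 314),
    "NORP": (0.95, 841), "ORDINAL": (0.85, 195), "ORG": (0.91, 1795),
    "PERCENT": (0.91, 349), "PERSON": (0.95, 1988), "PRODUCT": (0.78, 76),
    "QUANTITY": (0.78, 105), "TIME": (0.65, 212), "WORK_OF_ART": (0.63, 166),
}

total_support = sum(support for _, support in report.values())

# Macro average: every entity type counts equally.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each entity type counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in report.values()) / total_support

print(f"total support: {total_support}")
print(f"macro F1:      {macro_f1:.2f}")
print(f"weighted F1:   {weighted_f1:.2f}")
```

The gap between the two averages comes from rare, harder types such as `LAW` and `WORK_OF_ART`, which pull the macro average down but carry little weight in the weighted one.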

## Usage

```
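from transformers import pipeline

# Load the fine-tuned checkpoint from a local directory.
# aggregation_strategy='simple' groups sub-word tokens into whole entity spans.
ner_pipeline = pipeline(
    'token-classification',
    model=r'models/roberta-base_1656662418.0944197/checkpoint-14100',
    aggregation_strategy='simple'
)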
```

TEST 1

```
ner_pipeline("India is a beautiful country")
```

```
# Output
[{'entity_group': 'GPE',
  'score': 0.99186057,
  'word': ' India',
  'start': 0,
  'end': 5}]
```

TEST 2

```
ner_pipeline("On September 1st George won 1 dollar while watching Game of Thrones.")
```

```
# Output
[{'entity_group': 'DATE',
  'score': 0.99720246,
  'word': ' September 1st',
  'start': 3,
  'end': 16},
 {'entity_group': 'PERSON',
  'score': 0.99071586,
  'word': ' George',
  'start': 17,
  'end': 23},
 {'entity_group': 'MONEY',
  'score': 0.9872978,
  'word': ' 1 dollar',
  'start': 28,
  'end': 36},
 {'entity_group': 'WORK_OF_ART',
  'score': 0.9946732,
  'word': ' Game of Thrones',
  'start': 52,
  'end': 67}]
```
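
The `word` values in the outputs above keep a leading space, which is most likely a side effect of RoBERTa's byte-level BPE tokenization. One way to get clean entity strings is to slice the input text with the `start` and `end` offsets instead. The following is a minimal sketch: `extract_entities` is a hypothetical helper, and the sample predictions are copied from the TEST 2 output so that the snippet runs without loading the model.

```python
def extract_entities(text, predictions):
    """Return (label, surface text) pairs using the character offsets."""
    return [
        (pred["entity_group"], text[pred["start"]:pred["end"]])
        for pred in predictions
    ]


text = "On September 1st George won 1 dollar while watching Game of Thrones."

# Copied from the TEST 2 output (only the fields the helper needs).
predictions = [
    {"entity_group": "DATE", "start": 3, "end": 16},
    {"entity_group": "PERSON", "start": 17, "end": 23},
    {"entity_group": "MONEY", "start": 28, "end": 36},
    {"entity_group": "WORK_OF_ART", "start": 52, "end": 67},
]

for label, span in extract_entities(text, predictions):
    print(f"{label:<12} {span}")
```

With a loaded pipeline, the same helper would be called as `extract_entities(text, ner_pipeline(text))`.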