---
language: is
license: apache-2.0
widget:
 - text: "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna vegna þess að hún sé búin að lesa þær ."
 - text: "Til hvers að kjósa flokk , sem þykist vera Jafnaðarmannaflokkur rétt fyrir kosningar , þegar að það er hægt að kjósa sannnan jafnaðarmannaflokk , sjálfan Jafnaðarmannaflokk Íslands - Samfylkinguna ."
 - text: "Það sannaðist svo eftirminnilega á plötunni Það þarf fólk eins og þig sem kom út fyrir þremur árum , en á henni hann Fálka úr Keflavík og Gáluna , son sinn , til að útsetja lög hans og spila inn ."
 - text: "Lögin hafa áður komið út sem aukalög á smáskífum af Hail to the Thief , en á disknum er líka myndband og fleira efni fyrir tölvur ."
 - text: "Britney gerði honum viðvart og hann ók henni á UCLA-sjúkrahúsið í Santa Monica en það er í nágrenni hljóðversins ."
---


# IcelandicNER BERT

This repo contains pretrained models fine-tuned on the MIM-GOLD-NER dataset for named entity recognition (NER) in Icelandic.
The [MIM-GOLD-NER](http://hdl.handle.net/20.500.12537/42) corpus was developed at [Reykjavik University](https://en.ru.is/) in 2018–2020 and covers eight entity types:

- Date
- Location
- Miscellaneous 
- Money
- Organization
- Percent
- Person
- Time 

## Dataset Information

|       |   Records |   B-Date |   B-Location |   B-Miscellaneous |   B-Money |   B-Organization |   B-Percent |   B-Person |   B-Time |   I-Date |   I-Location |   I-Miscellaneous |   I-Money |   I-Organization |   I-Percent |   I-Person |   I-Time |
|:------|----------:|---------:|-------------:|------------------:|----------:|-----------------:|------------:|-----------:|---------:|---------:|-------------:|------------------:|----------:|-----------------:|------------:|-----------:|---------:|
| Train |     39988 |     3409 |         5980 |              4351 |       729 |             5754 |         502 |      11719 |      868 |     2112 |          516 |              3036 |       770 |             2382 |          50 |       5478 |      790 |
| Valid |      7063 |      570 |         1034 |               787 |       100 |             1078 |         103 |       2106 |      147 |      409 |           76 |               560 |       104 |              458 |           7 |        998 |      136 |
| Test  |      8299 |      779 |         1319 |               935 |       153 |             1315 |         108 |       2247 |      172 |      483 |          104 |               660 |       167 |              617 |          10 |       1089 |      158 |


## Evaluation

The following table summarizes the scores obtained by the model, overall and per class.

|     entity    | precision |  recall  | f1-score | support |
|:-------------:|:---------:|:--------:|:--------:|:-------:|
|      Date     |  0.969466 | 0.978177 | 0.973802 |  779.0  |
|    Location   |  0.955201 | 0.953753 | 0.954476 |  1319.0 |
| Miscellaneous |  0.867033 | 0.843850 | 0.855285 |  935.0  |
|     Money     |  0.979730 | 0.947712 | 0.963455 |  153.0  |
|  Organization |  0.893939 | 0.897338 | 0.895636 |  1315.0 |
|    Percent    |  1.000000 | 1.000000 | 1.000000 |  108.0  |
|     Person    |  0.963028 | 0.973743 | 0.968356 |  2247.0 |
|      Time     |  0.976879 | 0.982558 | 0.979710 |  172.0  |
|   micro avg   |  0.938158 | 0.938958 | 0.938558 |  7028.0 |
|   macro avg   |  0.950659 | 0.947141 | 0.948840 |  7028.0 |
|  weighted avg |  0.937845 | 0.938958 | 0.938363 |  7028.0 |
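
The macro and weighted averages in the table can be recomputed from the per-class rows. The sketch below copies the per-class values from the table and assumes the usual definitions: the macro average is the unweighted mean over classes, and the weighted average weights each class by its support. The names `scores`, `macro_average`, and `weighted_average` are illustrative and not part of this repo.

```python
# Per-class scores copied from the evaluation table:
# (precision, recall, f1-score, support)
scores = {
    "Date":          (0.969466, 0.978177, 0.973802, 779),
    "Location":      (0.955201, 0.953753, 0.954476, 1319),
    "Miscellaneous": (0.867033, 0.843850, 0.855285, 935),
    "Money":         (0.979730, 0.947712, 0.963455, 153),
    "Organization":  (0.893939, 0.897338, 0.895636, 1315),
    "Percent":       (1.000000, 1.000000, 1.000000, 108),
    "Person":        (0.963028, 0.973743, 0.968356, 2247),
    "Time":          (0.976879, 0.982558, 0.979710, 172),
}

SUPPORT = 3  # index of the support column in each tuple


def macro_average(column):
    """Unweighted mean of one metric column over all classes."""
    values = [row[column] for row in scores.values()]
    return sum(values) / len(values)


def weighted_average(column):
    """Mean of one metric column, weighted by each class's support."""
    total_support = sum(row[SUPPORT] for row in scores.values())
    weighted_sum = sum(row[column] * row[SUPPORT] for row in scores.values())
    return weighted_sum / total_support


macro_f1 = macro_average(2)
weighted_f1 = weighted_average(2)
print(f"macro f1:    {macro_f1:.6f}")
print(f"weighted f1: {weighted_f1:.6f}")
```

Both values should agree with the `macro avg` and `weighted avg` rows up to rounding. The macro average sits above the weighted one because small, easy classes such as Percent count as much as large ones such as Person.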


## How To Use
You can use this model with the Transformers `pipeline` for NER.

### Installing requirements

```bash
pip install transformers
```

### How to predict using pipeline

```python
from transformers import AutoTokenizer
from transformers import AutoModelForTokenClassification  # for pytorch
from transformers import TFAutoModelForTokenClassification  # for tensorflow
from transformers import pipeline


model_name_or_path = "m3hrdadfi/icelandic-ner-bert" 
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForTokenClassification.from_pretrained(model_name_or_path)  # Pytorch
# model = TFAutoModelForTokenClassification.from_pretrained(model_name_or_path)  # Tensorflow

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna vegna þess að hún sé búin að lesa þær ."

ner_results = nlp(example)
print(ner_results)
```
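
By default the pipeline returns one prediction per token, so a multi-word entity arrives as a `B-` token followed by `I-` tokens. The sketch below shows one way to merge such predictions into entity spans. It assumes each prediction is a dict with `entity`, `start`, and `end` keys and that labels follow the `B-Person` / `I-Person` naming used in the dataset table. The `predictions` list is hand-written for illustration and is not real model output; `group_entities` is a hypothetical helper, not part of this repo.

```python
def group_entities(text, predictions):
    """Merge token-level B-/I- predictions into labelled character spans."""
    spans = []
    current = None
    for pred in predictions:
        prefix, _, label = pred["entity"].partition("-")
        if prefix == "I" and current is not None and label == current["label"]:
            # Continuation of the open entity: extend its end offset.
            current["end"] = pred["end"]
            continue
        # Anything else closes the open entity, if there is one.
        if current is not None:
            spans.append(current)
            current = None
        if prefix in ("B", "I"):
            # Start a new entity (a stray I- tag is treated like B-).
            current = {"label": label, "start": pred["start"], "end": pred["end"]}
    if current is not None:
        spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


example = "Kristin manneskja getur ekki lagt frásagnir af Jesú Kristi á hilluna ."

# Hand-written illustrative predictions (NOT actual model output).
first = example.index("Jesú")
second = example.index("Kristi", first)
predictions = [
    {"entity": "B-Person", "start": first, "end": first + len("Jesú")},
    {"entity": "I-Person", "start": second, "end": second + len("Kristi")},
]

spans = group_entities(example, predictions)
print(spans)
```

Recent versions of Transformers can also do this grouping for you via the `aggregation_strategy` argument of `pipeline`, which additionally handles sub-word pieces; the helper above only illustrates the idea.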


## Questions?
Open a GitHub issue in the [IcelandicNER Issues](https://github.com/m3hrdadfi/icelandic-ner/issues) tracker.