---
tags:
- token-classification
language:
- fi
widget:
- text: Asun Brysselissä, Euroopan pääkaupungissa.
datasets:
- drvenabili/autotrain-data-turku-ner
- turku_ner_corpus
co2_eq_emissions:
  emissions: 0.2165403288824756
license: apache-2.0

pipeline_tag: token-classification
---

# Info

This model is fine-tuned for named entity recognition (NER) in Finnish. The base model is Turku NLP's [bert-base-finnish-uncased-v1](https://huggingface.co/TurkuNLP/bert-base-finnish-uncased-v1), and the fine-tuning dataset is Turku NLP's [turku_ner_corpus](https://huggingface.co/datasets/turku_ner_corpus/).

The model is released under Apache 2.0.

Please cite the training dataset if you use this model:

```bibtex
@inproceedings{luoma-etal-2020-broad,
    title = "A Broad-coverage Corpus for {F}innish Named Entity Recognition",
    author = {Luoma, Jouni and Oinonen, Miika and Pyyk{\"o}nen, Maria and Laippala, Veronika and Pyysalo, Sampo},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    year = "2020",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.567",
    pages = "4615--4624",
}
```

# Validation Metrics

- Loss: 0.075
- Accuracy: 0.982
- Precision: 0.879
- Recall: 0.868
- F1: 0.873

# Test Metrics

## Overall Metrics

- Accuracy: 0.986
- Precision: 0.857
- Recall: 0.872
- F1: 0.864

## Per-entity Metrics

```json
{
    "DATE": {
        "precision": 0.925,
        "recall": 0.9736842105263158,
        "f1": 0.9487179487179489,
        "number": "114"
    },
    "EVENT": {
        "precision": 0.3,
        "recall": 0.42857142857142855,
        "f1": 0.3529411764705882,
        "number": "7"
    },
    "LOC": {
        "precision": 0.9057239057239057,
        "recall": 0.9372822299651568,
        "f1": 0.9212328767123287,
        "number": "287"
    },
    "ORG": {
        "precision": 0.8274111675126904,
        "recall": 0.7836538461538461,
        "f1": 0.8049382716049382,
        "number": "208"
    },
    "PER": {
        "precision": 0.88,
        "recall": 0.9225806451612903,
        "f1": 0.9007874015748031,
        "number": "310"
    },
    "PRO": {
        "precision": 0.6081081081081081,
        "recall": 0.569620253164557,
        "f1": 0.5882352941176471,
        "number": "79"
    }
}
```
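The per-entity figures can be cross-checked against the overall test metrics by micro-averaging. The sketch below is only an illustration: it assumes that `number` is the count of gold entities of each type (as in seqeval-style reports) and recovers true-positive and predicted counts by rounding. The name `micro_average` is hypothetical and not part of any library.

```python
# Per-entity test metrics copied from the JSON above ("number" cast to int).
per_entity = {
    "DATE": {"precision": 0.925, "recall": 0.9736842105263158, "number": 114},
    "EVENT": {"precision": 0.3, "recall": 0.42857142857142855, "number": 7},
    "LOC": {"precision": 0.9057239057239057, "recall": 0.9372822299651568, "number": 287},
    "ORG": {"precision": 0.8274111675126904, "recall": 0.7836538461538461, "number": 208},
    "PER": {"precision": 0.88, "recall": 0.9225806451612903, "number": 310},
    "PRO": {"precision": 0.6081081081081081, "recall": 0.569620253164557, "number": 79},
}


def micro_average(metrics):
    """Pool counts over all entity types and return (precision, recall, f1)."""
    true_pos = predicted = gold = 0
    for m in metrics.values():
        # Assumption: "number" is the gold entity count, so recall * number = true positives.
        tp = round(m["recall"] * m["number"])
        true_pos += tp
        # precision = tp / predicted, hence predicted = tp / precision.
        predicted += round(tp / m["precision"])
        gold += m["number"]
    precision = true_pos / predicted
    recall = true_pos / gold
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = micro_average(per_entity)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under these assumptions the pooled values agree with the overall test precision, recall, and F1 to within one unit in the third decimal, which suggests the overall figures are micro-averages. It also shows why the low EVENT scores barely affect the totals: that type has only 7 gold entities out of 1005.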

# Usage

You can query the model through the Hugging Face Inference API with cURL:

```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Asun Brysselissä, Euroopan pääkaupungissa."}' \
  https://api-inference.huggingface.co/models/iguanodon-ai/bert-base-finnish-uncased-ner
```

Or load it with the Transformers Python API:

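```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned weights and the matching tokenizer from the Hub
model = AutoModelForTokenClassification.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")
tokenizer = AutoTokenizer.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")

# Tokenize the sentence and run a forward pass without tracking gradients
inputs = tokenizer("Asun Brysselissä, Euroopan pääkaupungissa.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # outputs.logits holds one score per label for each token
```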
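
The forward pass returns raw per-token logits, not entity spans. A minimal sketch of the post-processing step follows. It assumes the labels use a BIO scheme (`B-LOC`, `I-LOC`, `O`, and so on) and that the tokenizer marks WordPiece continuations with `##`. The helper `group_entities` is hypothetical, and the tokens and labels in the example call are hand-written for illustration, not actual model output.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def group_entities(tokens, labels):
    """Merge per-token BIO labels into a list of (entity_type, text) spans."""
    entities = []
    current_type = None
    current_words = []

    def flush():
        # Close the entity currently being built, if any.
        nonlocal current_type, current_words
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if token.startswith("##"):
            # Subword continuation: glue it onto the previous word of an open entity.
            if current_type is not None:
                current_words[-1] += token[2:]
            continue
        if label == "O":
            flush()
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "B" or entity_type != current_type:
            # A B- tag or a change of type starts a new entity.
            flush()
            current_type = entity_type
        current_words.append(token)
    flush()
    return entities


# Hand-written example input (assumed tokenization and labels, for illustration only).
example_tokens = ["[CLS]", "asun", "bryssel", "##issä", ",", "euroopan", "pääkaupungissa", ".", "[SEP]"]
example_labels = ["O", "O", "B-LOC", "I-LOC", "O", "B-LOC", "O", "O", "O"]
print(group_entities(example_tokens, example_labels))
```

With a loaded model, the inputs to this helper could be obtained as `tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])` for the tokens and `[model.config.id2label[i] for i in outputs.logits.argmax(dim=-1)[0].tolist()]` for the labels. The `pipeline("token-classification", ...)` helper in Transformers offers built-in aggregation as an alternative to writing this by hand.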