---
tags:
- token-classification
language:
- fi
widget:
- text: Asun Brysselissä, Euroopan pääkaupungissa.
datasets:
- drvenabili/autotrain-data-turku-ner
- turku_ner_corpus
co2_eq_emissions:
  emissions: 0.2165403288824756
license: cc-by-sa-4.0

pipeline_tag: token-classification
---

# Info

This model is fine-tuned for Finnish named entity recognition (NER). The base model is TurkuNLP's [bert-base-finnish-uncased-v1](https://huggingface.co/TurkuNLP/bert-base-finnish-uncased-v1), fine-tuned on TurkuNLP's [turku_ner_corpus](https://huggingface.co/datasets/turku_ner_corpus/) dataset.

The model is released under CC-BY-SA 4.0.

Please cite the training dataset if you use this model:

```bibtex
@inproceedings{luoma-etal-2020-broad,
    title = "A Broad-coverage Corpus for {F}innish Named Entity Recognition",
    author = {Luoma, Jouni and Oinonen, Miika and Pyyk{\"o}nen, Maria and Laippala, Veronika and Pyysalo, Sampo},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    year = "2020",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.567",
    pages = "4615--4624",
}
```

# Validation Metrics

- Loss: 0.075
- Accuracy: 0.982
- Precision: 0.879
- Recall: 0.868
- F1: 0.873

# Test Metrics

### Overall Metrics

- Accuracy: 0.986
- Precision: 0.857
- Recall: 0.872
- F1: 0.864
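As a quick sanity check, F1 is the harmonic mean of precision and recall, so the overall score can be reproduced from the two figures above:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Overall test metrics reported above
print(round(f1_score(0.857, 0.872), 3))  # → 0.864
```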

### Per-entity Metrics

```json
{
    "DATE": {
        "precision": 0.925,
        "recall": 0.9736842105263158,
        "f1": 0.9487179487179489,
        "number": 114
    },
    "EVENT": {
        "precision": 0.3,
        "recall": 0.42857142857142855,
        "f1": 0.3529411764705882,
        "number": 7
    },
    "LOC": {
        "precision": 0.9057239057239057,
        "recall": 0.9372822299651568,
        "f1": 0.9212328767123287,
        "number": 287
    },
    "ORG": {
        "precision": 0.8274111675126904,
        "recall": 0.7836538461538461,
        "f1": 0.8049382716049382,
        "number": 208
    },
    "PER": {
        "precision": 0.88,
        "recall": 0.9225806451612903,
        "f1": 0.9007874015748031,
        "number": 310
    },
    "PRO": {
        "precision": 0.6081081081081081,
        "recall": 0.569620253164557,
        "f1": 0.5882352941176471,
        "number": 79
    }
}
```

# Usage

You can use cURL to query the hosted Inference API:

```shell
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Asun Brysselissä, Euroopan pääkaupungissa."}' \
  https://api-inference.huggingface.co/models/iguanodon-ai/autotrain-turku-ner-65992136346
```

Or the Python API:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model = AutoModelForTokenClassification.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")
tokenizer = AutoTokenizer.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")

inputs = tokenizer("Asun Brysselissä, Euroopan pääkaupungissa.", return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map each token's highest-scoring logit to its label name
predicted_ids = outputs.logits.argmax(dim=-1)[0]
labels = [model.config.id2label[i.item()] for i in predicted_ids]
```
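For most applications, the high-level `pipeline` API is more convenient, since it maps logits to labels and merges subword pieces into whole entity spans for you. A minimal sketch, assuming the hosted weights include the standard `id2label` mapping:

```python
from transformers import pipeline

# aggregation_strategy="simple" groups subword tokens into entity spans
ner = pipeline(
    "token-classification",
    model="iguanodon-ai/bert-base-finnish-uncased-ner",
    aggregation_strategy="simple",
)

for entity in ner("Asun Brysselissä, Euroopan pääkaupungissa."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```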