---
tags:
- token-classification
language:
- fi
widget:
- text: Asun Brysselissä, Euroopan pääkaupungissa.
datasets:
- drvenabili/autotrain-data-turku-ner
- turku_ner_corpus
co2_eq_emissions:
  emissions: 0.2165403288824756
license: apache-2.0
pipeline_tag: token-classification
---

# Info

This model is a fine-tuned version of Turku NLP's [bert-base-finnish-uncased-v1](https://huggingface.co/TurkuNLP/bert-base-finnish-uncased-v1) for Finnish named entity recognition (NER). The fine-tuning dataset is Turku NLP's [turku_ner_corpus](https://huggingface.co/datasets/turku_ner_corpus/).

The model is released under the Apache 2.0 license.

Please cite the training dataset if you use this model:

```bibtex
@inproceedings{luoma-etal-2020-broad,
    title = "A Broad-coverage Corpus for {F}innish Named Entity Recognition",
    author = {Luoma, Jouni and Oinonen, Miika and Pyyk{\"o}nen, Maria and Laippala, Veronika and Pyysalo, Sampo},
    booktitle = "Proceedings of The 12th Language Resources and Evaluation Conference",
    year = "2020",
    url = "https://www.aclweb.org/anthology/2020.lrec-1.567",
    pages = "4615--4624",
}
```

# Validation Metrics

- Loss: 0.075
- Accuracy: 0.982
- Precision: 0.879
- Recall: 0.868
- F1: 0.873

# Test Metrics

## Overall metrics

- Accuracy: 0.986
- Precision: 0.857
- Recall: 0.872
- F1: 0.864

## Per-entity metrics

```json
{
  "DATE": {
    "precision": 0.925,
    "recall": 0.9736842105263158,
    "f1": 0.9487179487179489,
    "number": "114"
  },
  "EVENT": {
    "precision": 0.3,
    "recall": 0.42857142857142855,
    "f1": 0.3529411764705882,
    "number": "7"
  },
  "LOC": {
    "precision": 0.9057239057239057,
    "recall": 0.9372822299651568,
    "f1": 0.9212328767123287,
    "number": "287"
  },
  "ORG": {
    "precision": 0.8274111675126904,
    "recall": 0.7836538461538461,
    "f1": 0.8049382716049382,
    "number": "208"
  },
  "PER": {
    "precision": 0.88,
    "recall": 0.9225806451612903,
    "f1": 0.9007874015748031,
    "number": "310"
  },
  "PRO": {
    "precision": 0.6081081081081081,
    "recall": 0.569620253164557,
    "f1": 0.5882352941176471,
    "number": "79"
  }
}
```
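
The per-entity figures can be cross-checked against the overall ones. The sketch below recomputes each F1 as the harmonic mean of precision and recall, then recovers integer true-positive and predicted counts from the reported values to form a micro-average. It assumes that `number` is the count of gold entities (support) for each type and that the overall metrics are micro-averaged; the card states neither explicitly. The helper names `f1_score` and `micro_average` are illustrative only.

```python
# Reported per-entity test metrics: (precision, recall, f1, number).
# Assumption: "number" is the count of gold entities (support) for each type.
PER_ENTITY = {
    "DATE": (0.925, 0.9736842105263158, 0.9487179487179489, 114),
    "EVENT": (0.3, 0.42857142857142855, 0.3529411764705882, 7),
    "LOC": (0.9057239057239057, 0.9372822299651568, 0.9212328767123287, 287),
    "ORG": (0.8274111675126904, 0.7836538461538461, 0.8049382716049382, 208),
    "PER": (0.88, 0.9225806451612903, 0.9007874015748031, 310),
    "PRO": (0.6081081081081081, 0.569620253164557, 0.5882352941176471, 79),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(per_entity):
    """Recover integer counts and return micro-averaged (precision, recall, f1)."""
    true_pos = predicted = gold = 0
    for precision, recall, _f1, support in per_entity.values():
        tp = round(recall * support)        # recall = tp / gold
        true_pos += tp
        predicted += round(tp / precision)  # precision = tp / predicted
        gold += support
    p = true_pos / predicted
    r = true_pos / gold
    return p, r, f1_score(p, r)


for name, (p, r, f1, _n) in PER_ENTITY.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1:.4f}")

micro_p, micro_r, micro_f1 = micro_average(PER_ENTITY)
print(f"micro: P={micro_p:.4f} R={micro_r:.4f} F1={micro_f1:.4f}")
```

Under these assumptions the micro-average lands within about 0.001 of the overall precision, recall and F1 reported above. The low EVENT scores rest on only 7 gold entities, so they are much noisier than the figures for PER or LOC.
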

# Usage

You can query the model through the hosted Inference API with cURL (replace `YOUR_API_KEY` with your own token):

```bash
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Asun Brysselissä, Euroopan pääkaupungissa."}' \
  https://api-inference.huggingface.co/models/iguanodon-ai/bert-base-finnish-uncased-ner
```
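
The same request can be assembled from Python with only the standard library. This is a minimal sketch that mirrors the cURL call: the URL, headers and JSON body are copied from the command, while `build_request` is a hypothetical helper name. It only constructs the request; the sending step is left commented out because it needs network access and a valid key.

```python
import json
import urllib.request

# URL taken from the cURL command.
API_URL = "https://api-inference.huggingface.co/models/iguanodon-ai/bert-base-finnish-uncased-ner"


def build_request(text, api_key):
    """Build (but do not send) the POST request equivalent to the cURL call."""
    # Encode the JSON body as UTF-8 so characters such as "ä" survive intact.
    body = json.dumps({"inputs": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("Asun Brysselissä, Euroopan pääkaupungissa.", "YOUR_API_KEY")
print(request.get_method(), request.full_url)

# To actually send it (needs network access and a valid key):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```
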

Or use the Python API:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model = AutoModelForTokenClassification.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")
tokenizer = AutoTokenizer.from_pretrained("iguanodon-ai/bert-base-finnish-uncased-ner")

# Tokenize the input and run a forward pass
inputs = tokenizer("Asun Brysselissä, Euroopan pääkaupungissa.", return_tensors="pt")
outputs = model(**inputs)

# outputs.logits has shape (batch, sequence_length, num_labels)
```
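
The `outputs` object holds one row of logits per sub-word token. Taking the argmax over the last dimension and mapping each id through `model.config.id2label` yields one tag per token. The sketch below covers only the final step, grouping tags into entity spans. It assumes BIO-style tags such as `B-LOC`, `I-LOC` and `O`, which this card does not state. `group_entities` is a hypothetical helper, and the sample tokens and tags are hand-written for illustration, not actual model output.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the span in progress, if there is one.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts (a stray I- tag is treated as a start).
            flush()
            current_type, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O" or anything unexpected ends the current entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hand-written example (assumed tags, not real predictions).
tokens = ["asun", "brysselissä", ",", "euroopan", "pääkaupungissa", "."]
tags = ["O", "B-LOC", "O", "B-LOC", "O", "O"]
print(group_entities(tokens, tags))
```

With real model output, sub-word pieces (prefixed with `##` in BERT vocabularies) also need to be merged back into words, and the special `[CLS]` and `[SEP]` tokens skipped. The `transformers` helper `pipeline("token-classification", model=..., aggregation_strategy="simple")` handles tokenization, prediction and grouping in one call.
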