File size: 889 Bytes
8aeaffd
9ce0e65
 
8aeaffd
 
 
9ce0e65
8aeaffd
 
cdbc9a9
8aeaffd
 
 
 
c2e72bb
 
 
 
8f5a7f3
c2e72bb
 
 
b9efcfa
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
---
model:
- KB/bert-base-swedish-cased
tags:
- token-classification
- sequence-tagger-model
- bert
language: sv
datasets:
- KBLab/sucx3_ner
widget:
- text: "Emil bor i Lönneberga"
---

# KB-BERT for NER

## Mixed cased and uncased data

This model is based on [KB-BERT](https://huggingface.co/KB/bert-base-swedish-cased) and was fine-tuned on the [SUCX 3.0 - NER](https://huggingface.co/datasets/KBLab/sucx3_ner) corpus, using the _simple_ tags and partially lowercased data.
For this model we used a variation of the data that did **not** use BIO-encoding to differentiate between the beginnings (B), and insides (I) of named entity tags.

The model was trained on the training data only, with the best model chosen by its performance on the validation data.
You find more information about the model and the performance on our blog: https://kb-labb.github.io/posts/2022-02-07-sucx3_ner