metadata
language:
  - da
  - 'no'
  - nb
  - nn
  - sv
  - fo
  - is
license: mit
datasets:
  - dane
  - norne
  - wikiann
  - suc3.0
model-index:
  - name: nbailab-base-ner-scandi
    results:
      - task:
          type: token-classification
          name: Token Classification
widget:
  - >-
    Hans er en professor på Københavns Universitetet i København, og han er en
    rigtig københavner. Hans kat, altså Hans' kat, Lisa, er supersød. Han fik
    købt en Mona Lisa på tilbud i Netto og gav den til hans kat, og nu er Mona
    Lisa'en Lisa's kæreste eje. Hans er med hans bror Peter, og de besluttede,
    at Peterskirken skulle have fint besøg af Peter og hans ven Hans. Men nu har
    de begge Corona.

ScandiNER - Named Entity Recognition model for Scandinavia

This model is a fine-tuned version of NbAiLab/nb-bert-base for Named Entity Recognition (NER) in Danish, Norwegian (both Bokmål and Nynorsk), Swedish, Icelandic and Faroese. It was fine-tuned on the concatenation of DaNE, NorNE, SUC 3.0 and the Icelandic and Faroese parts of the WikiANN dataset.
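
A minimal usage sketch, assuming the model is published on the Hugging Face Hub as `saattrupdan/nbailab-base-ner-scandi` (an ID inferred from the author and model name on this page, not stated explicitly) and that the `transformers` library is installed. The helper names `load_tagger` and `entities_by_type` are hypothetical and exist only for illustration.

```python
# Assumed Hub ID, inferred from the author name and model-index name.
MODEL_ID = "saattrupdan/nbailab-base-ner-scandi"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the grouping helper below works without transformers.
    from transformers import pipeline

    # aggregation_strategy="first" merges word pieces into whole-word entities.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="first",
    )


def entities_by_type(predictions):
    """Group aggregated pipeline predictions by their entity label."""
    grouped = {}
    for pred in predictions:
        grouped.setdefault(pred["entity_group"], []).append(pred["word"])
    return grouped


# Example call (requires network access to fetch the model):
#   tagger = load_tagger()
#   print(entities_by_type(tagger("Hans er professor i København.")))
```

With an aggregation strategy set, each prediction is a dictionary containing `entity_group`, `word`, `score`, `start` and `end`, which is the shape `entities_by_type` expects.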

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 90135.90000000001
  • num_epochs: 1000
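
How the listed values relate can be checked with a little arithmetic. The sketch below assumes a single training device (which is what 8 × 4 = 32 implies) and assumes the linear scheduler scales the learning rate by `step / warmup_steps` during warmup, as linear-warmup schedules commonly do; the function names are hypothetical and the decay phase after warmup is left out.

```python
LEARNING_RATE = 2e-5
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 90135.9  # the listed value, rounded


def effective_batch_size(per_device: int, accumulation: int, n_devices: int = 1) -> int:
    """Examples consumed per optimizer update."""
    return per_device * accumulation * n_devices


def warmup_lr(step: int, peak_lr: float = LEARNING_RATE,
              warmup_steps: float = WARMUP_STEPS) -> float:
    """Learning rate under linear warmup, capped at the peak (decay omitted)."""
    return peak_lr * min(1.0, step / max(1.0, warmup_steps))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # matches total_train_batch_size
print(warmup_lr(39424))  # learning rate at the last logged step, under the assumptions above
```

Under these assumptions the last logged step (39424) falls before the end of warmup, so the learning rate would have reached only about 44% of its configured peak by then.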

Training results

| Training Loss | Epoch | Step | Validation Loss | Micro F1 | Micro F1 No Misc |
|---|---|---|---|---|---|
| 0.6682 | 1.0 | 2816 | 0.0872 | 0.6916 | 0.7306 |
| 0.0684 | 2.0 | 5632 | 0.0464 | 0.8167 | 0.8538 |
| 0.0444 | 3.0 | 8448 | 0.0367 | 0.8485 | 0.8783 |
| 0.0349 | 4.0 | 11264 | 0.0316 | 0.8684 | 0.8920 |
| 0.0282 | 5.0 | 14080 | 0.0290 | 0.8820 | 0.9033 |
| 0.0231 | 6.0 | 16896 | 0.0283 | 0.8854 | 0.9060 |
| 0.0189 | 7.0 | 19712 | 0.0253 | 0.8964 | 0.9156 |
| 0.0155 | 8.0 | 22528 | 0.0260 | 0.9016 | 0.9201 |
| 0.0123 | 9.0 | 25344 | 0.0266 | 0.9059 | 0.9233 |
| 0.0098 | 10.0 | 28160 | 0.0280 | 0.9091 | 0.9279 |
| 0.0080 | 11.0 | 30976 | 0.0309 | 0.9093 | 0.9287 |
| 0.0065 | 12.0 | 33792 | 0.0313 | 0.9103 | 0.9284 |
| 0.0053 | 13.0 | 36608 | 0.0322 | 0.9078 | 0.9257 |
| 0.0046 | 14.0 | 39424 | 0.0343 | 0.9075 | 0.9256 |
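
The three evaluation columns peak at different epochs, which matters when choosing a checkpoint. The sketch below copies the table values into a list and picks the best epoch per metric; the helper `best_epoch` is hypothetical, and the card does not say which criterion was actually used to select the released weights.

```python
# (epoch, validation_loss, micro_f1, micro_f1_no_misc), copied from the results table
RESULTS = [
    (1, 0.0872, 0.6916, 0.7306),
    (2, 0.0464, 0.8167, 0.8538),
    (3, 0.0367, 0.8485, 0.8783),
    (4, 0.0316, 0.8684, 0.8920),
    (5, 0.0290, 0.8820, 0.9033),
    (6, 0.0283, 0.8854, 0.9060),
    (7, 0.0253, 0.8964, 0.9156),
    (8, 0.0260, 0.9016, 0.9201),
    (9, 0.0266, 0.9059, 0.9233),
    (10, 0.0280, 0.9091, 0.9279),
    (11, 0.0309, 0.9093, 0.9287),
    (12, 0.0313, 0.9103, 0.9284),
    (13, 0.0322, 0.9078, 0.9257),
    (14, 0.0343, 0.9075, 0.9256),
]

VAL_LOSS, MICRO_F1, MICRO_F1_NO_MISC = 1, 2, 3  # column indices into each row


def best_epoch(rows, column, lowest=False):
    """Return the epoch whose value in `column` is lowest or highest."""
    pick = min if lowest else max
    return pick(rows, key=lambda row: row[column])[0]


print(best_epoch(RESULTS, VAL_LOSS, lowest=True))  # 7
print(best_epoch(RESULTS, MICRO_F1))               # 12
print(best_epoch(RESULTS, MICRO_F1_NO_MISC))       # 11
```

Validation loss bottoms out at epoch 7 while both F1 scores keep improving for several more epochs, so selecting on loss and selecting on F1 would yield different checkpoints.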

Framework versions

  • Transformers 4.10.3
  • Pytorch 1.9.0+cu102
  • Datasets 1.12.1
  • Tokenizers 0.10.3