
SlovakBERT-based Named Entity Recognition

A deep learning model for Named Entity Recognition (NER) in Slovak. The model is based on Gerulata/SlovakBERT and fine-tuned on web-scraped Slovak news articles. It supports the following IOB-tagged entity categories: PERSON, ORGANIZATION, LOCATION, DATE, TIME, MONEY and PERCENTAGE.

Related Work

Thesis

Model usage

Simple Named Entity Recognition (NER)

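```python
from transformers import pipeline

# Load the fine-tuned SlovakBERT NER model from the Hugging Face Hub.
ner_pipeline = pipeline(task='ner', model='Raychani1/slovakbert-ner-v2')
# English: "Although the latest NBS data for the Bratislava region put the current average property price at 2,072 euros per square meter, apartment prices in the capital are considerably higher."
input_sentence = 'Hoci podľa ostatných údajov NBS pre Bratislavský kraj je aktuálna priemerná cena nehnuteľností na úrovni 2 072 eur za štvorcový meter, ceny bytov v hlavnom meste sú podstatne vyššie.'
# One dict per tagged token, with 'entity', 'score', 'index', 'word', 'start' and 'end' keys.
classifications = ner_pipeline(input_sentence)
print(classifications)
```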
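Without aggregation, the pipeline returns one prediction per token, so a multi-token entity such as a full name arrives as a B- tag followed by I- tags. The sketch below merges those IOB tags into entity spans. `group_entities` is a hypothetical helper, not part of transformers; it assumes only that labels use `B-`/`I-` prefixes and that each prediction carries `entity`, `index`, `start` and `end` keys (the exact label names in the model's config may differ from the ones used here). For a built-in alternative, the transformers `pipeline` also accepts `aggregation_strategy="simple"`, which groups tokens into entities for you.

```python
def group_entities(tokens, text):
    """Merge token-level IOB predictions into entity spans.

    `tokens` is a list of dicts with 'entity', 'index', 'start' and 'end' keys,
    the shape the transformers 'ner' pipeline returns without aggregation.
    """
    spans = []
    for token in tokens:
        prefix, _, label = token["entity"].partition("-")
        if prefix not in ("B", "I"):
            continue  # 'O' or an unprefixed tag is not part of an IOB entity
        previous = spans[-1] if spans else None
        # An I- tag extends the previous span only if the label matches and the
        # tokens are adjacent (the pipeline drops 'O' tokens, leaving gaps in 'index').
        if (
            prefix == "I"
            and previous is not None
            and previous["label"] == label
            and previous["last_index"] == token["index"] - 1
        ):
            previous["end"] = token["end"]
            previous["last_index"] = token["index"]
        else:
            spans.append({"label": label, "start": token["start"],
                          "end": token["end"], "last_index": token["index"]})
    return [
        {"label": s["label"], "text": text[s["start"]:s["end"]],
         "start": s["start"], "end": s["end"]}
        for s in spans
    ]


# Hypothetical hand-written predictions for "Peter Novák pracuje v NBS."
# ("Peter Novák works at NBS."); not real model output.
sentence = "Peter Novák pracuje v NBS."
predictions = [
    {"entity": "B-Person", "index": 1, "start": 0, "end": 5},
    {"entity": "I-Person", "index": 2, "start": 6, "end": 11},
    {"entity": "B-Organization", "index": 5, "start": 22, "end": 25},
]
print([(e["label"], e["text"]) for e in group_entities(predictions, sentence)])
# [('Person', 'Peter Novák'), ('Organization', 'NBS')]
```

With the usage example above, `group_entities(classifications, input_sentence)` would turn the raw token predictions into readable spans, provided the tokenizer returns character offsets.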

Named Entity Recognition (NER) with Visualization

For a visualization example, please refer to the following Gist.

Model Prediction Output Example

(Image: example of the model's prediction output)

Model Training

Training Hyperparameters

| Parameter | Value |
| --- | --- |
| per_device_train_batch_size | 4 |
| per_device_eval_batch_size | 4 |
| learning_rate | 5e-05 |
| adam_beta1 | 0.9 |
| adam_beta2 | 0.999 |
| adam_epsilon | 1e-08 |
| num_train_epochs | 15 |
| lr_scheduler_type | linear |
| seed | 42 |
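The hyperparameter names above match fields of the transformers `TrainingArguments` class. With `lr_scheduler_type` set to `linear`, the learning rate decays linearly from `learning_rate` to zero over the whole run. The sketch below computes that schedule in plain Python. It assumes no warmup (none is listed) and takes 70 optimizer steps per epoch from the Step column of the training results, giving 15 × 70 = 1,050 total steps; it mirrors the shape of transformers' `get_linear_schedule_with_warmup` but is an illustration, not the training code.

```python
# Values from the hyperparameter table; STEPS_PER_EPOCH is read off the
# "Step" column of the training results (70 steps per epoch).
LEARNING_RATE = 5e-05
NUM_TRAIN_EPOCHS = 15
STEPS_PER_EPOCH = 70
TOTAL_STEPS = NUM_TRAIN_EPOCHS * STEPS_PER_EPOCH  # 1050


def linear_lr(step, base_lr=LEARNING_RATE, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate at `step` under linear warmup followed by linear decay to 0.

    warmup_steps=0 is an assumption: the card lists no warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Step 560 is the end of epoch 8 in the training results.
print(linear_lr(560))  # about 2.33e-05
```

Under these assumptions, by the end of the 8th epoch the rate has fallen to about 47% (490/1050) of its initial value.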

Training results

The best results (highest validation F1, 0.8938) were reached in the 8th training epoch.

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1 | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.6721 | 1.0 | 70 | 0.2214 | 0.6972 | 0.7308 | 0.7136 | 0.9324 |
| 0.1849 | 2.0 | 140 | 0.1697 | 0.8056 | 0.8365 | 0.8208 | 0.952 |
| 0.0968 | 3.0 | 210 | 0.1213 | 0.882 | 0.8622 | 0.872 | 0.9728 |
| 0.0468 | 4.0 | 280 | 0.1107 | 0.8372 | 0.907 | 0.8708 | 0.9684 |
| 0.0415 | 5.0 | 350 | 0.1644 | 0.8059 | 0.8782 | 0.8405 | 0.9615 |
| 0.0233 | 6.0 | 420 | 0.1255 | 0.8576 | 0.8878 | 0.8724 | 0.9716 |
| 0.0198 | 7.0 | 490 | 0.1383 | 0.8545 | 0.8846 | 0.8693 | 0.9703 |
| 0.0133 | 8.0 | 560 | 0.1241 | 0.884 | 0.9038 | 0.8938 | 0.9735 |
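"Best" depends on the criterion: epoch 8 has the highest F1, precision and accuracy, but the lowest validation loss occurs at epoch 4. The sketch below copies the table rows, checks that each F1 is the harmonic mean of its precision and recall (up to four-decimal rounding), and picks the best epoch under both criteria. That the checkpoint was chosen by F1 (for example via `metric_for_best_model` in `TrainingArguments`) is an inference from these numbers, not something the card states.

```python
# (epoch, validation_loss, precision, recall, f1) copied from the training results table.
ROWS = [
    (1, 0.2214, 0.6972, 0.7308, 0.7136),
    (2, 0.1697, 0.8056, 0.8365, 0.8208),
    (3, 0.1213, 0.882, 0.8622, 0.872),
    (4, 0.1107, 0.8372, 0.907, 0.8708),
    (5, 0.1644, 0.8059, 0.8782, 0.8405),
    (6, 0.1255, 0.8576, 0.8878, 0.8724),
    (7, 0.1383, 0.8545, 0.8846, 0.8693),
    (8, 0.1241, 0.884, 0.9038, 0.8938),
]


def harmonic_f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Each reported F1 agrees with its precision and recall up to four-decimal rounding.
for epoch, _, precision, recall, f1 in ROWS:
    assert abs(harmonic_f1(precision, recall) - f1) < 5e-4, epoch

best_by_f1 = max(ROWS, key=lambda row: row[4])[0]
best_by_loss = min(ROWS, key=lambda row: row[1])[0]
print(best_by_f1, best_by_loss)  # 8 4
```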

Model Evaluation

Evaluation Dataset Distribution

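| NER Tag | Number of Tokens |
| --- | --- |
| O | 6568 |
| B-Person | 96 |
| I-Person | 83 |
| B-Organization | 583 |
| I-Organization | 585 |
| B-Location | 59 |
| I-Location | 15 |
| B-Date | 113 |
| I-Date | 87 |
| Time | 5 |
| B-Money | 44 |
| I-Money | 74 |
| B-Percentage | 57 |
| I-Percentage | 54 |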
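The evaluation set is heavily imbalanced. The sketch below totals the counts from the table: the O (outside) tag accounts for roughly 78% of the 8,423 evaluation tokens, and Time has only 5 tokens, so any per-tag score for it rests on very few examples. Imbalance of this kind is one reason macro-averaged metrics, which weight every tag equally, can sit below token-level accuracy.

```python
# Token counts per tag, copied from the evaluation dataset distribution table.
TAG_COUNTS = {
    "O": 6568,
    "B-Person": 96, "I-Person": 83,
    "B-Organization": 583, "I-Organization": 585,
    "B-Location": 59, "I-Location": 15,
    "B-Date": 113, "I-Date": 87,
    "Time": 5,
    "B-Money": 44, "I-Money": 74,
    "B-Percentage": 57, "I-Percentage": 54,
}

total = sum(TAG_COUNTS.values())
shares = {tag: count / total for tag, count in TAG_COUNTS.items()}
entity_tokens = total - TAG_COUNTS["O"]  # tokens inside some entity

print(total, entity_tokens, round(shares["O"], 4))  # 8423 1855 0.7798
```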

Evaluation Confusion Matrix

(Image: confusion matrix on the evaluation dataset)

Evaluation Model Metrics

| Precision | Macro-Precision | Recall | Macro-Recall | F1 | Macro-F1 | Accuracy |
| --- | --- | --- | --- | --- | --- | --- |
| 0.9897 | 0.9715 | 0.9897 | 0.9433 | 0.9895 | 0.9547 | 0.9897 |
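The macro-averaged columns give every tag the same weight, regardless of how many tokens it has. The card does not say how the non-macro columns are averaged. The sketch below is a pure-Python illustration on hypothetical toy labels (not the evaluation data): it computes per-tag precision, recall and F1 over individual tokens, their unweighted (macro) means, and token accuracy. scikit-learn's `precision_recall_fscore_support` with `average="macro"` should give the same macro values on this input.

```python
from collections import Counter


def per_tag_scores(y_true, y_pred):
    """Precision, recall and F1 for every tag, computed over individual tokens."""
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)
    actual = Counter(y_true)
    scores = {}
    for tag in sorted(set(y_true) | set(y_pred)):
        precision = correct[tag] / predicted[tag] if predicted[tag] else 0.0
        recall = correct[tag] / actual[tag] if actual[tag] else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[tag] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of (precision, recall, F1) over all tags."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def accuracy(y_true, y_pred):
    """Share of tokens whose predicted tag equals the true tag."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: six correct 'O' tokens, one found and one missed person.
y_true = ["O"] * 6 + ["B-Person", "B-Person"]
y_pred = ["O"] * 6 + ["B-Person", "O"]

scores = per_tag_scores(y_true, y_pred)
macro_p, macro_r, macro_f1 = macro_average(scores)
print(f"accuracy={accuracy(y_true, y_pred):.3f} macro_f1={macro_f1:.3f}")
# accuracy=0.875 macro_f1=0.795
```

A single missed entity token barely moves accuracy, because the frequent O tag dominates it, but it halves the recall of the rare tag and pulls the macro average down accordingly.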

Framework Versions

  • Transformers 4.26.1
  • PyTorch 1.13.1
  • Tokenizers 0.13.2