🇧🇬 BERT - Bulgarian Named Entity Recognition

This model is rmihaylov/bert-base-bg fine-tuned on the Bulgarian subset of wikiann. It achieves an F1-score of 0.99 on that dataset.
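
F1 is the harmonic mean of precision and recall. The card does not say which scoring scheme produced the 0.99 figure. The sketch below assumes entity-level exact-match scoring, where a predicted entity counts only if its type and character span both match a gold entity. Treat it as an illustration, not as the author's evaluation code. The function `entity_f1` and the example spans are hypothetical.

```python
# Minimal sketch of entity-level F1, ASSUMING exact-match scoring:
# an entity is correct only if its type and (start, end) span both match.
# Not the evaluation script behind the reported score.

def entity_f1(gold: set, pred: set) -> float:
    """Compute F1 over sets of (entity_type, start, end) tuples."""
    true_pos = len(gold & pred)
    if true_pos == 0:
        return 0.0  # also avoids dividing by zero when pred or gold is empty
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted entities (the ORG prediction is a false positive).
gold = {("PER", 10, 23), ("LOC", 34, 43)}
pred = {("PER", 10, 23), ("LOC", 34, 43), ("ORG", 0, 9)}
print(round(entity_f1(gold, pred), 3))
```

Here precision is 2/3 and recall is 1, so the printed F1 is 0.8.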


Import the libraries:

from pprint import pprint

import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

Load the model:

MODEL_ID = "auhide/bert-base-ner-bulgarian"
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

ner = pipeline(task="ner", model=model, tokenizer=tokenizer)

Run inference:

text = "Философът Барух Спиноза е роден в Амстердам."
pprint(ner(text))

Output:

[{'end': 13,
  'entity': 'B-PER',
  'index': 3,
  'score': 0.9954899,
  'start': 9,
  'word': '▁Бар'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 4,
  'score': 0.9660787,
  'start': 13,
  'word': 'ух'},
 {'end': 23,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.99728084,
  'start': 15,
  'word': '▁Спиноза'},
 {'end': 43,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.8990479,
  'start': 33,
  'word': '▁Амстердам'}]
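
The pipeline output above is token-level: "Барух Спиноза" is split across three subword tokens tagged B-PER, I-PER, I-PER. The following is a minimal sketch, using a hypothetical `group_entities` helper, that merges consecutive B-/I- tokens of the same type into whole entities via their `start`/`end` character offsets. It hard-codes the output shown above so it runs without downloading the model. Averaging the token scores is just one simple choice. In practice, the transformers pipeline can do this grouping itself when created with `aggregation_strategy="simple"`.

```python
# Token-level predictions copied from the output above (hard-coded so the
# sketch runs offline; normally this list would come from `ner(text)`).
text = "Философът Барух Спиноза е роден в Амстердам."
predictions = [
    {"entity": "B-PER", "index": 3, "score": 0.9954899, "start": 9, "end": 13},
    {"entity": "I-PER", "index": 4, "score": 0.9660787, "start": 13, "end": 15},
    {"entity": "I-PER", "index": 5, "score": 0.99728084, "start": 15, "end": 23},
    {"entity": "B-LOC", "index": 9, "score": 0.8990479, "start": 33, "end": 43},
]


def group_entities(tokens, text):
    """Hypothetical helper: merge B-/I- tagged tokens into whole entities."""
    groups = []
    for tok in sorted(tokens, key=lambda t: t["index"]):
        prefix, _, label = tok["entity"].partition("-")
        if prefix == "O":
            continue  # outside any entity (the default pipeline already drops these)
        last = groups[-1] if groups else None
        # An I- tag extends the previous group only if it has the same type
        # and sits on the immediately following token.
        if (
            prefix == "I"
            and last is not None
            and last["label"] == label
            and last["last_index"] == tok["index"] - 1
        ):
            last["end"] = tok["end"]
            last["last_index"] = tok["index"]
            last["scores"].append(tok["score"])
        else:
            groups.append({
                "label": label,
                "start": tok["start"],
                "end": tok["end"],
                "last_index": tok["index"],
                "scores": [tok["score"]],
            })
    return [
        {
            "label": g["label"],
            # Offsets include the space before "▁"-prefixed tokens, so strip it.
            "text": text[g["start"]:g["end"]].strip(),
            "score": sum(g["scores"]) / len(g["scores"]),
        }
        for g in groups
    ]


for entity in group_entities(predictions, text):
    print(entity["label"], entity["text"], round(entity["score"], 3))
```

This prints `PER Барух Спиноза 0.986` and `LOC Амстердам 0.899`.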

Note: The model recognizes three entity types: PER (person), ORG (organization), and LOC (location).
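
The B- and I- prefixes in the output above indicate a BIO tagging scheme. Under that scheme the three entity types expand to seven tags: one O tag plus a beginning and an inside tag per type. The sketch below builds that tag set. The tag order here is an assumption; the authoritative mapping for this checkpoint is `model.config.id2label`. It also includes a hypothetical `is_valid_bio` helper that checks that every I- tag continues an entity of the same type.

```python
ENTITY_TYPES = ("PER", "ORG", "LOC")

# One "O" (outside) tag plus a B- (beginning) and I- (inside) tag per type.
# NOTE: this ordering is an assumption; check model.config.id2label for the real one.
TAGS = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]


def is_valid_bio(tags):
    """Hypothetical check: every I-X tag must follow a B-X or I-X tag."""
    previous = "O"
    for tag in tags:
        # "O"[2:] is "", so an I- tag right after O (or at the start) is invalid.
        if tag.startswith("I-") and previous[2:] != tag[2:]:
            return False
        previous = tag
    return True


print(TAGS)
print(is_valid_bio(["O", "B-PER", "I-PER", "I-PER", "O", "B-LOC"]))
print(is_valid_bio(["O", "I-LOC"]))
```

The first check prints True; the second prints False because I-LOC has no preceding LOC tag.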

