---
inference: true
license: cc-by-4.0
datasets:
- wikiann
language:
- bg
metrics:
- f1
pipeline_tag: token-classification
widget:
- text: 'Философът Барух Спиноза е роден в Амстердам.'
---
# 🇧🇬 BERT - Bulgarian Named Entity Recognition
This model is [rmihaylov/bert-base-bg](https://huggingface.co/rmihaylov/bert-base-bg) fine-tuned on the Bulgarian subset of [wikiann](https://huggingface.co/datasets/wikiann).
It achieves an F1-score of *0.99* on that dataset.
## Usage
Import the libraries:
```python
from pprint import pprint
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline
```
Load the model:
```python
MODEL_ID = "auhide/bert-base-ner-bulgarian"
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
ner = pipeline(task="ner", model=model, tokenizer=tokenizer)
```
Run inference:
```python
text = "Философът Барух Спиноза е роден в Амстердам."
pprint(ner(text))
```
```
[{'end': 13,
  'entity': 'B-PER',
  'index': 3,
  'score': 0.9954899,
  'start': 9,
  'word': '▁Бар'},
 {'end': 15,
  'entity': 'I-PER',
  'index': 4,
  'score': 0.9660787,
  'start': 13,
  'word': 'ух'},
 {'end': 23,
  'entity': 'I-PER',
  'index': 5,
  'score': 0.99728084,
  'start': 15,
  'word': '▁Спиноза'},
 {'end': 43,
  'entity': 'B-LOC',
  'index': 9,
  'score': 0.8990479,
  'start': 33,
  'word': '▁Амстердам'}]
```
Note: There are three types of entities - `PER`, `ORG`, and `LOC` - tagged in the BIO scheme (`B-` marks the first token of an entity, `I-` a continuation token).
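
Because the model predicts labels per subword token (e.g. `▁Бар`, `ух`, `▁Спиноза` above), you may want to merge consecutive `B-`/`I-` tokens into whole entities. The `transformers` pipeline can do this for you via `pipeline(..., aggregation_strategy="simple")`; below is a minimal standalone sketch of the same idea, using the character offsets from the output above (the `group_entities` helper is illustrative, not part of the model):

```python
def group_entities(text, predictions):
    """Merge consecutive B-/I- tagged subword tokens into (type, span) pairs."""
    spans = []
    for pred in predictions:
        tag, _, ent_type = pred["entity"].partition("-")
        if tag == "B" or not spans or spans[-1][0] != ent_type:
            # Start a new entity span at this token's character offsets.
            spans.append([ent_type, pred["start"], pred["end"]])
        else:
            # Continuation token: extend the current entity span.
            spans[-1][2] = pred["end"]
    # Slice the original text; strip() drops the leading space that ▁ encodes.
    return [(ent, text[start:end].strip()) for ent, start, end in spans]


text = "Философът Барух Спиноза е роден в Амстердам."
# Subset of the pipeline output shown above (only the fields we need).
predictions = [
    {"entity": "B-PER", "start": 9, "end": 13},
    {"entity": "I-PER", "start": 13, "end": 15},
    {"entity": "I-PER", "start": 15, "end": 23},
    {"entity": "B-LOC", "start": 33, "end": 43},
]
print(group_entities(text, predictions))
# [('PER', 'Барух Спиноза'), ('LOC', 'Амстердам')]
```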