|
--- |
|
language: bn |
|
tags: |
|
- bengali-ner |
|
- bengali |
|
- bangla |
|
- NER |
|
license: mit |
|
datasets: |
|
- wikiann |
|
- xtreme |
|
--- |
|
|
|
# Multi-lingual BERT Bengali Name Entity Recognition |
|
`mBERT-Bengali-NER` is a transformer-based Bengali NER model build with [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) model and [Wikiann](https://huggingface.co/datasets/wikiann) Datasets. |
|
|
|
## How to Use |
|
|
|
```py |
|
from transformers import AutoTokenizer, AutoModelForTokenClassification |
|
from transformers import pipeline |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/mbert-bengali-ner") |
|
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/mbert-bengali-ner") |
|
|
|
nlp = pipeline("ner", model=model, tokenizer=tokenizer, grouped_entities=True) |
|
example = "আমি জাহিদ এবং আমি ঢাকায় বাস করি।" |
|
|
|
ner_results = nlp(example) |
|
print(ner_results) |
|
``` |
|
|
|
## Label and ID Mapping |
|
|
|
| Label ID | Label | |
|
| -------- | ----- | |
|
|0 | O | |
|
| 1 | B-PER | |
|
| 2 | I-PER | |
|
| 3 | B-ORG| |
|
| 4 | I-ORG | |
|
| 5 | B-LOC | |
|
| 6 | I-LOC | |
|
|
|
## Training Details |
|
- mBERT-Bengali-NER trained with [Wikiann](https://huggingface.co/datasets/wikiann) datasets |
|
- mBERT-Bengali-NER trained with [transformers-token-classification](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb) script |
|
- mBERT-Bengali-NER total trained 5 epochs. |
|
- Trained in Kaggle GPU |
|
|
|
## Evaluation Results |
|
|Model | F1 | Precision | Recall | Accuracy | Loss | |
|
| ---- | --- | --------- | ----- | -------- | --- | |
|
|mBert-Bengali-NER | 0.97105 | 0.96769| 0.97443 | 0.97682 | 0.12511 | |
|
|
|
|
|
|
|
|