---
language: bn
tags:
- bengali-ner
- bengali
- bangla
- NER
license: mit
datasets:
- wikiann
- xtreme
---

# Multilingual BERT Bengali Named Entity Recognition
`mBERT-Bengali-NER` is a transformer-based Bengali NER model built by fine-tuning [bert-base-multilingual-uncased](https://huggingface.co/bert-base-multilingual-uncased) on the [Wikiann](https://huggingface.co/datasets/wikiann) dataset.

## How to Use

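```py
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("sagorsarker/mbert-bengali-ner")
model = AutoModelForTokenClassification.from_pretrained("sagorsarker/mbert-bengali-ner")

# aggregation_strategy="simple" merges sub-word tokens into whole entities;
# it replaces the deprecated grouped_entities=True argument
nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
example = "আমি জাহিদ এবং আমি ঢাকায় বাস করি।"  # "I am Jahid and I live in Dhaka."

ner_results = nlp(example)
print(ner_results)
```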

## Label and ID Mapping

| Label ID | Label |
| -------- | ----- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |
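
The mapping above follows the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. As a minimal sketch of how per-token label IDs can be turned back into entity spans, the helper below groups consecutive tags by type. The function name `decode_entities` and the label IDs in the example are hypothetical and chosen for illustration; they are not actual model output.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER", 2: "I-PER",
    3: "B-ORG", 4: "I-ORG",
    5: "B-LOC", 6: "I-LOC",
}


def decode_entities(tokens, label_ids):
    """Group per-token label IDs into (entity_type, text) spans using BIO tags.

    Hypothetical helper for illustration; not part of the model or of transformers.
    """
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the entity being built, if any.
        nonlocal current_type, current_tokens
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label_id in zip(tokens, label_ids):
        label = ID2LABEL[label_id]
        if label == "O":
            flush()
            continue
        prefix, entity_type = label.split("-", 1)
        # A B- tag, or an I- tag whose type differs from the open entity,
        # starts a new entity.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)

    flush()
    return entities


# Illustrative input: the label IDs are assumed, not predicted by the model.
tokens = ["আমি", "জাহিদ", "এবং", "আমি", "ঢাকায়", "বাস", "করি"]
label_ids = [0, 1, 0, 0, 5, 0, 0]
entities = decode_entities(tokens, label_ids)
print(entities)
```

In practice the `pipeline` call in "How to Use" performs this grouping itself; the sketch is only meant to show what the label IDs encode.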

## Training Details
- mBERT-Bengali-NER was trained on the [Wikiann](https://huggingface.co/datasets/wikiann) dataset.
- Training used the [transformers-token-classification](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/token_classification.ipynb) script.
- The model was trained for 5 epochs in total.
- Training ran on a Kaggle GPU.
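
Token classification training has to reconcile word-level labels with sub-word tokens, because mBERT's tokenizer may split one word into several pieces. A common approach, sketched below, labels only the first piece of each word and masks the rest with `-100`, the index that PyTorch's cross-entropy loss ignores by default. This is an illustrative sketch under that assumption: the function name `align_labels` is hypothetical, and the model card does not state which alignment variant was used for this model.

```python
def align_labels(word_ids, word_labels, ignore_index=-100):
    """Spread word-level labels over sub-word tokens.

    word_ids:    for each token, the index of the word it came from,
                 or None for special tokens such as [CLS] and [SEP].
    word_labels: one label ID per word.
    Hypothetical helper; the exact scheme used for this model is an assumption.
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens carry no label.
            aligned.append(ignore_index)
        elif word_id != previous_word:
            # First sub-word token of a word takes the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Remaining sub-word tokens are masked out of the loss.
            aligned.append(ignore_index)
        previous_word = word_id
    return aligned


# Illustrative input: three words, the second split into two sub-word tokens.
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 1, 0]  # O, B-PER, O
aligned = align_labels(word_ids, word_labels)
print(aligned)
```

With a fast tokenizer from `transformers`, the `word_ids` list would typically come from the `word_ids()` method of the tokenizer's output.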

## Evaluation Results
| Model | F1 | Precision | Recall | Accuracy | Loss |
| ----- | -- | --------- | ------ | -------- | ---- |
| mBERT-Bengali-NER | 0.97105 | 0.96769 | 0.97443 | 0.97682 | 0.12511 |
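
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. The snippet below is a small sketch that does this; the helper name `f1_score` is hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the evaluation table above.
precision = 0.96769
recall = 0.97443
reported_f1 = 0.97105

f1 = f1_score(precision, recall)
print(f"recomputed F1 = {f1:.5f}, reported F1 = {reported_f1:.5f}")
```

The recomputed value agrees with the reported F1 to the precision shown in the table.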