File size: 1,643 Bytes
b9735c7
 
 
 
5c7283d
 
 
 
 
 
 
 
 
 
 
 
b9735c7
 
 
 
 
 
5c7283d
b9735c7
5c7283d
b9735c7
 
 
 
 
 
 
 
 
 
 
 
 
5c7283d
b9735c7
 
 
5c7283d
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
language: bn
datasets:
- wikiann
examples:
widget:
- text: "মারভিন দি মারসিয়ান"
  example_title: "Sentence_1"
- text: "লিওনার্দো দা ভিঞ্চি"
  example_title: "Sentence_2"
- text: "বসনিয়া ও হার্জেগোভিনা"
  example_title: "Sentence_3"
- text: "সাউথ ইস্ট ইউনিভার্সিটি"
  example_title: "Sentence_4"
- text: "মানিক বন্দ্যোপাধ্যায় লেখক"
  example_title: "Sentence_5"
---

<h1>Bengali Named Entity Recognition</h1>
Fine-tuning bert-base-multilingual-cased on Wikiann dataset for performing NER on Bengali language.


## Label ID and its corresponding label name

| Label ID | Label Name|
| -------- | ----- |
|0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG|
| 4 | I-ORG | 
| 5 | B-LOC |
| 6 | I-LOC |

<h1>Results</h1>

| Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
| ---- | -------- | ----- | ---- | ---- |
| Train set | 0.997927 | 0.998246 | 0.996613 | 0.998769 |
| Validation set | 0.970187 | 0.969212 | 0.956831 | 0.982079 |
| Test set | 0.9673011 | 0.967120 |  0.963614 | 0.970938 |

Example
```py
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("Suchandra/bengali_language_NER")
model = AutoModelForTokenClassification.from_pretrained("Suchandra/bengali_language_NER")

nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "মারভিন দি মারসিয়ান"

ner_results = nlp(example)
ner_results
```