metadata
language: bn
datasets:
- wikiann
examples: null
widget:
- text: মারভিন দি মারসিয়ান
  example_title: Sentence_1
- text: লিওনার্দো দা ভিঞ্চি
  example_title: Sentence_2
- text: বসনিয়া ও হার্জেগোভিনা
  example_title: Sentence_3
- text: সাউথ ইস্ট ইউনিভার্সিটি
  example_title: Sentence_4
- text: মানিক বন্দ্যোপাধ্যায় লেখক
  example_title: Sentence_5
Bengali Named Entity Recognition
A bert-base-multilingual-cased model fine-tuned on the WikiANN dataset for named entity recognition (NER) in Bengali.
Label ID and its corresponding label name
Label ID | Label Name |
---|---|
0 | O |
1 | B-PER |
2 | I-PER |
3 | B-ORG |
4 | I-ORG |
5 | B-LOC |
6 | I-LOC |
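The table above can be written as a plain lookup, which is handy when working with raw label IDs rather than label names. The sketch below is only an illustration: the helper names `decode_ids` and `split_tag` are hypothetical, and the sample prediction is made up rather than taken from the model. The tags follow the usual scheme in which `B-` marks the first token of an entity, `I-` a continuation, and `O` a token outside any entity.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_ids(label_ids):
    """Turn a sequence of label IDs into their label names."""
    return [ID2LABEL[i] for i in label_ids]


def split_tag(label):
    """Split a tag such as "B-PER" into ("B", "PER").

    The outside tag "O" carries no entity type, so it yields ("O", None).
    """
    if label == "O":
        return "O", None
    prefix, entity_type = label.split("-", 1)
    return prefix, entity_type


# Hypothetical prediction for a three-token person name (not real model output).
predicted_ids = [1, 2, 2]
print(decode_ids(predicted_ids))  # ['B-PER', 'I-PER', 'I-PER']
print([split_tag(tag) for tag in decode_ids(predicted_ids)])
```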
Results
Name | Overall F1 | LOC F1 | ORG F1 | PER F1 |
---|---|---|---|---|
Train set | 0.997927 | 0.998246 | 0.996613 | 0.998769 |
Validation set | 0.970187 | 0.969212 | 0.956831 | 0.982079 |
Test set | 0.9673011 | 0.967120 | 0.963614 | 0.970938 |
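The card does not say how these F1 scores were computed. As a rough guide to reading them, the sketch below shows the entity-level F1 that is commonly used for NER, where a predicted entity counts as correct only if its span and type both match the gold annotation. This is an assumption about the metric, not a description of the author's evaluation; the function names and the sample spans are made up for illustration.

```python
def f1_score(gold, predicted):
    """Entity-level F1 over collections of (start, end, type) spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def f1_for_type(gold, predicted, entity_type):
    """F1 restricted to one entity type, e.g. "LOC", "ORG" or "PER"."""
    return f1_score(
        [span for span in gold if span[2] == entity_type],
        [span for span in predicted if span[2] == entity_type],
    )


# Made-up spans: the second prediction has the right boundaries but the wrong type.
gold_spans = [(0, 2, "PER"), (5, 6, "LOC"), (8, 10, "ORG")]
pred_spans = [(0, 2, "PER"), (5, 6, "ORG"), (8, 10, "ORG")]

print(f1_score(gold_spans, pred_spans))
print(f1_for_type(gold_spans, pred_spans, "PER"))
```

Under this definition a single type error lowers both precision and recall, which is why the per-type columns can differ from the overall score.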
Example
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Suchandra/bengali_language_NER")
model = AutoModelForTokenClassification.from_pretrained("Suchandra/bengali_language_NER")

# Build a token-classification (NER) pipeline and tag one example.
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
example = "মারভিন দি মারসিয়ান"
ner_results = nlp(example)
print(ner_results)
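Called this way, the pipeline returns one dictionary per tagged token, with keys including `entity`, `start` and `end`, rather than one per whole entity. The sketch below shows one way to merge such token-level predictions into entity spans using the character offsets. It is a minimal illustration under stated assumptions: `group_entities` is a hypothetical helper, and the token list is hand-built from whitespace-separated words with made-up tags, whereas the real model tags WordPiece sub-tokens.

```python
def group_entities(text, token_results):
    """Merge consecutive B-/I- token predictions into entity spans."""
    entities = []
    current = None
    for tok in token_results:
        prefix, _, entity_type = tok["entity"].partition("-")
        continues = (
            prefix == "I"
            and current is not None
            and current["type"] == entity_type
        )
        if continues:
            # Same entity: extend the span to cover this token.
            current["end"] = tok["end"]
            continue
        # Anything else closes the open entity, if there is one.
        if current is not None:
            entities.append(current)
            current = None
        if prefix in ("B", "I"):
            current = {"type": entity_type, "start": tok["start"], "end": tok["end"]}
    if current is not None:
        entities.append(current)
    # Recover the surface text of each entity from its character offsets.
    for entity in entities:
        entity["text"] = text[entity["start"]:entity["end"]]
    return entities


text = "মারভিন দি মারসিয়ান"
tags = ["B-PER", "I-PER", "I-PER"]  # made-up tags, not real model output

# Build pipeline-style dictionaries, computing offsets from the text itself.
token_results = []
cursor = 0
for word, tag in zip(text.split(), tags):
    start = text.index(word, cursor)
    end = start + len(word)
    token_results.append({"entity": tag, "word": word, "start": start, "end": end})
    cursor = end

entities = group_entities(text, token_results)
print(entities)
```

Recent versions of transformers also accept an `aggregation_strategy` argument on the token-classification pipeline, which performs comparable grouping inside the library.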