Commit 9d73213 (Parent(s): 54d56e3)
update README.md

README.md ADDED
@@ -0,0 +1,57 @@
---
language: bn
datasets:
- wikiann
examples:
widget:
- text: "রিয়াল মাদ্রিদ ফুটবল ক্লাব"
  example_title: "Sentence_1"
- text: "উত্তরবঙ্গ কৃষি বিশ্ববিদ্যালয়"
  example_title: "Sentence_2"
- text: "বাংলাদেশ জাতীয় ক্রিকেট দল"
  example_title: "Sentence_3"
- text: "বাংলাদেশ টেলিকমিউনিকেশন্স কোম্পানী লিমিটেড"
  example_title: "Sentence_4"
- text: "রোমিও অ্যান্ড জুলিয়েট"
  example_title: "Sentence_5"
---

<h1>Named Entity Recognition on Bangla Language</h1>
Fine-tuning BERT for named entity recognition (NER) on the Bengali language using Hugging Face.


## Correspondence Between Label ID and Label Name

| Label ID | Label Name |
| -------- | ---------- |
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |

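The mapping above can be written down in code when decoding raw predictions. This is a minimal sketch, assuming predictions arrive as a list of integer label IDs; the names `ID2LABEL`, `LABEL2ID`, and `decode_labels` are illustrative and are not taken from the model repository.

```python
# Label mapping copied from the table above.
ID2LABEL = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
# Inverse mapping, useful when encoding tag names as integer targets.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode_labels(label_ids):
    """Convert a sequence of integer label IDs into tag names (hypothetical helper)."""
    return [ID2LABEL[i] for i in label_ids]


# Hypothetical prediction for a three-token organisation name.
print(decode_labels([3, 4, 4]))  # ['B-ORG', 'I-ORG', 'I-ORG']
```

The `B-` prefix marks the first token of an entity and `I-` marks a continuation, so a multi-token organisation decodes to one `B-ORG` followed by `I-ORG` tags.
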
<h1>Evaluation and Validation</h1>

| Split | Precision | Recall | F1 | Accuracy |
| ----- | --------- | ------ | -- | -------- |
| Train/Val set | 0.963899 | 0.964770 | 0.964334 | 0.981252 |
| Test set | 0.952855 | 0.965105 | 0.958941 | 0.981349 |

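F1 is the harmonic mean of precision and recall, so the reported F1 values can be cross-checked from the other two columns. The sketch below uses only the numbers in the table above; the function name `harmonic_f1` is illustrative, and how the original metrics were computed is not stated in this README.

```python
def harmonic_f1(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the evaluation table.
reported = {
    "Train/Val set": (0.963899, 0.964770, 0.964334),
    "Test set": (0.952855, 0.965105, 0.958941),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {harmonic_f1(p, r):.6f}, reported F1 = {f1:.6f}")
```

For both rows the recomputed value agrees with the reported F1 to within rounding of the published figures.
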

Usage with Transformers `AutoModelForTokenClassification`:

```py
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load the fine-tuned tokenizer and token-classification model from the Hub
tokenizer = AutoTokenizer.from_pretrained("engineersakibcse47/NER_on_Bangla_Language")
model_ner = AutoModelForTokenClassification.from_pretrained("engineersakibcse47/NER_on_Bangla_Language")

# Build an NER pipeline and tag a sample phrase ("Bosnia and Herzegovina")
pipe = pipeline("ner", model=model_ner, tokenizer=tokenizer)
sample = "বসনিয়া ও হার্জেগোভিনা"

result = pipe(sample)
print(result)
```
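By default the `ner` pipeline returns one record per token, so a multi-token entity comes back in pieces. The sketch below groups such records into entity spans using the `B-`/`I-` prefixes. It assumes each record carries `entity`, `start`, and `end` keys and that the model emits the tag names from the label table; `group_entities` and the sample records are hypothetical illustrations, not real output from this model.

```python
def group_entities(records, text):
    """Merge token-level BIO records into (entity_type, surface_text) pairs."""
    spans = []
    current = None  # [entity_type, start_offset, end_offset] of the open span
    for rec in records:
        tag = rec["entity"]
        if tag == "O":
            current = None  # an outside tag closes any open span
            continue
        prefix, _, etype = tag.partition("-")
        if prefix == "I" and current is not None and current[0] == etype:
            current[2] = rec["end"]  # extend the open span
        else:
            current = [etype, rec["start"], rec["end"]]  # open a new span
            spans.append(current)
    return [(etype, text[start:end]) for etype, start, end in spans]


# Hypothetical token-level records, for illustration only.
text = "Real Madrid plays in Madrid"
records = [
    {"entity": "B-ORG", "start": 0, "end": 4},
    {"entity": "I-ORG", "start": 5, "end": 11},
    {"entity": "B-LOC", "start": 21, "end": 27},
]
print(group_entities(records, text))  # [('ORG', 'Real Madrid'), ('LOC', 'Madrid')]
```

The pipeline can also perform this grouping itself through its `aggregation_strategy` argument (for example `"simple"`), which is usually preferable to hand-rolled post-processing.
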