---
language: bn
tags:
- collaborative
- bengali
- NER
license: apache-2.0
datasets: xtreme 
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT Named Entity Recognition

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for NER using the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

Named entities predicted by the model:

| Label id | Label |
|:--------:|:----:|
|0 |O|
|1 |B-PER|
|2 |I-PER|
|3 |B-ORG|
|4 |I-ORG|
|5 |B-LOC|
|6 |I-LOC|
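
These label ids are stored in the checkpoint configuration. A minimal sketch for verifying the mapping at runtime (assuming the standard `id2label` field that `transformers` token-classification checkpoints expose):

```python
from transformers import AutoConfig

# Load only the config to inspect the label mapping without downloading the weights
config = AutoConfig.from_pretrained("neuropark/sahajBERT-NER")
print(config.id2label)  # expected: {0: "O", 1: "B-PER", ..., 6: "I-LOC"}
```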

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for token classification:
```python
from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # "This union has 3 mouzas and 10 villages." - change me
output = pipeline(raw_text)
print(output)  # list of per-token predictions: word, entity label, score
```
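
If you prefer whole entity spans over per-token tags, the same objects can be passed to the generic `pipeline` factory with an aggregation strategy (a sketch; assumes a `transformers` version recent enough to support `aggregation_strategy`):

```python
from transformers import AlbertForTokenClassification, PreTrainedTokenizerFast, pipeline

tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# "simple" merges consecutive B-/I- word pieces into single entity spans
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"))
```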

#### Limitations and bias

<!-- Provide examples of latent issues and potential remediations. -->
WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and fine-tuned on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).
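
For reference, the same split can be pulled with the `datasets` library (a sketch; the `wikiann` dataset exposes one config per language, with `bn` for Bengali):

```python
from datasets import load_dataset

# WikiANN ships one config per language; "bn" selects the Bengali split
wikiann_bn = load_dataset("wikiann", "bn")
print(wikiann_bn)                          # DatasetDict with train/validation/test
print(wikiann_bn["train"][0]["ner_tags"])  # integer labels matching the table above
```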

## Training procedure

Coming soon! 

## Eval results


- Loss: 0.11714419722557068
- Accuracy: 0.9772286821705426
- Precision: 0.9585365853658536
- Recall: 0.9651277013752456
- F1: 0.9618208516886931
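
Precision, recall, and F1 for NER are conventionally computed over entity spans rather than individual tokens. A minimal sketch with `seqeval` (an assumption about tooling; the exact evaluation script is not published here):

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Hypothetical gold and predicted IOB2 tag sequences, one list per sentence
y_true = [["B-PER", "I-PER", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O", "O"]]

print(precision_score(y_true, y_pred))  # span-level precision
print(recall_score(y_true, y_pred))     # span-level recall
print(f1_score(y_true, y_pred))         # span-level F1
```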


### BibTeX entry and citation info

Coming soon! 
<!-- ```bibtex
@inproceedings{...,
  year={2020}
}
``` -->