---
language: bn
tags:
- collaborative
- bengali
- NER
license: apache-2.0
datasets: xtreme
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT Named Entity Recognition

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for NER on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

Named entities predicted by the model:

| Label id | Label |
|:--------:|:-----:|
| 0        | O     |
| 1        | B-PER |
| 2        | I-PER |
| 3        | B-ORG |
| 4        | I-ORG |
| 5        | B-LOC |
| 6        | I-LOC |
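
For reference, the same mapping can be written as a plain Python dictionary; it should also match `model.config.id2label` if the label names were saved in the checkpoint's configuration (treat that as an assumption):

```python
# Label ids from the table above; model.config.id2label is assumed to hold the same mapping
id2label = {
    0: "O",
    1: "B-PER",
    2: "I-PER",
    3: "B-ORG",
    4: "I-ORG",
    5: "B-LOC",
    6: "I-LOC",
}
```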

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for token classification (NER):
```python
from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me
output = pipeline(raw_text)
```
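
The pipeline returns one prediction per detected token, as a list of dictionaries. With current `transformers` releases each entry carries the predicted tag, a confidence score, and the matched text span (exact field names may differ between versions, so treat them as an assumption):

```python
# Print the predicted entities; the "word", "entity" and "score" keys are assumed
# from current TokenClassificationPipeline behaviour and may vary by version.
for prediction in output:
    print(prediction["word"], prediction["entity"], round(prediction["score"], 3))
```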

#### Limitations and bias

<!-- Provide examples of latent issues and potential remediations. -->
WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step TODO_REPLACE_BY_STEP_NAME and fine-tuned on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).
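
For data preparation, the Bengali portion of WikiANN can be loaded with the `datasets` library; the `"bn"` configuration name is an assumption taken from the dataset card rather than from this model's published training code:

```python
from datasets import load_dataset

# Load the Bengali split of WikiANN ("bn" config name assumed)
wikiann_bn = load_dataset("wikiann", "bn")
print(wikiann_bn["train"][0])  # one example with tokens and NER tags
```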

## Training procedure

Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->

## Eval results

TODO_REPLACE_BY_METRICS

### BibTeX entry and citation info

Coming soon!
<!-- ```bibtex
@inproceedings{...,
year={2020}
}
``` -->