---
language: bn
tags:
- collaborative
- bengali
- NER
license: apache-2.0
datasets: xtreme
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT Named Entity Recognition

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for NER using the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

Named Entities predicted by the model:

| Label id | Label |
|:--------:|:-----:|
| 0 | O |
| 1 | B-PER |
| 2 | I-PER |
| 3 | B-ORG |
| 4 | I-ORG |
| 5 | B-LOC |
| 6 | I-LOC |

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for token classification:

```python
from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")

# Initialize model
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# Initialize pipeline
pipeline = TokenClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me
output = pipeline(raw_text)
```

A short sketch showing how to group the token-level predictions into entity spans is given at the end of this card.

#### Limitations and bias

WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step TODO_REPLACE_BY_STEP_NAME and trained on the Bengali split of [WikiANN](https://huggingface.co/datasets/wikiann).

## Training procedure

Coming soon!

## Eval results

- accuracy: 0.9756540697674418
- f1: 0.9570102589154861
- loss: 0.13705264031887054
- precision: 0.9518950437317785
- recall: 0.962180746561886

### BibTeX entry and citation info

Coming soon!
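### Example: grouping predictions into entity spans

This sketch is not part of the original card; it extends the "How to use" snippet above. It assumes a recent `transformers` release in which `TokenClassificationPipeline` accepts the `aggregation_strategy` argument, and shows one way to merge the B-/I- word-piece predictions (see the label table in the model description) into whole entity spans.

```python
from transformers import AlbertForTokenClassification, TokenClassificationPipeline, PreTrainedTokenizerFast

# Same model and tokenizer as in the "How to use" section
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NER")
model = AlbertForTokenClassification.from_pretrained("neuropark/sahajBERT-NER")

# "simple" aggregation merges consecutive B-/I- pieces into one entity span
# (assumption: the installed transformers version supports aggregation_strategy)
ner = TokenClassificationPipeline(tokenizer=tokenizer, model=model, aggregation_strategy="simple")

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me
for entity in ner(raw_text):
    # Each aggregated entry contains the entity group (PER/ORG/LOC), the surface text,
    # a confidence score, and character offsets into raw_text.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3), entity["start"], entity["end"])
```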