--- language: bn tags: - collaborative - bengali - SequenceClassification license: apache-2.0 datasets: IndicGlue metrics: - Loss - Accuracy - Precision - Recall widget: - text: "এশিয়ায় প্রথম দৃষ্টিহীন ব্যক্তির মাউন্ট এভারেস্ট জয়|" --- # sahajBERT News Article Classification ## Model description [sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification using the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue). The model is trained for classifying articles into 5 different classes: | Label id | Label | |:--------:|:----:| |0 | kolkata| |1 | state| |2 | national| |3 | sports| |4 | entertainment| |5 | international| ## Intended uses & limitations #### How to use You can use this model directly with a pipeline for Sequence Classification: ```python from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast # Initialize tokenizer tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC") # Initialize model model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC") # Initialize pipeline pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model) raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।" # Change me output = pipeline(raw_text) ``` #### Limitations and bias WIP ## Training data The model was initialized with pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and trained on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue). ## Training procedure Coming soon! ## Eval results Loss: 0.2477145493030548 Accuracy: 0.926293408929837 Macro F1: 0.9079785326650756 Recall: 0.926293408929837 Weighted F1: 0.9266428029354202 Macro Precision: 0.9109938492260489 Micro Precision: 0.926293408929837 Weighted Precision: 0.9288535478995414 Macro Recall: 0.9069095007692186 Micro Recall: 0.926293408929837 Weighted Recall: 0.926293408929837 ### BibTeX entry and citation info Coming soon!