---
language: bn
tags:
- collaborative
- bengali
- SequenceClassification
license: apache-2.0
datasets: IndicGlue
metrics:
- Loss
- Accuracy
- Precision
- Recall
---

# sahajBERT News Article Classification

## Model description

[sahajBERT](https://huggingface.co/neuropark/sahajBERT) fine-tuned for news article classification using the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

The model is trained to classify articles into 6 classes:

| Label id | Label |
|:--------:|:----:|
| 0 | kolkata |
| 1 | state |
| 2 | national |
| 3 | sports |
| 4 | entertainment |
| 5 | international |

## Intended uses & limitations

#### How to use

You can use this model directly with a pipeline for sequence classification:

```python
from transformers import AlbertForSequenceClassification, TextClassificationPipeline, PreTrainedTokenizerFast

# Initialize tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize model
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")

# Initialize pipeline
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)

raw_text = "এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"  # Change me ("This union has 3 mouzas and 10 villages.")
output = pipeline(raw_text)
```

The pipeline returns one dictionary per input, each with a `label` and a `score`; a sketch of mapping the predicted label back to the class names above is given at the end of this card.

#### Limitations and bias

WIP

## Training data

The model was initialized with the pre-trained weights of [sahajBERT](https://huggingface.co/neuropark/sahajBERT) at step 19519 and trained on the `sna.bn` split of [IndicGlue](https://huggingface.co/datasets/indic_glue).

## Training procedure

Coming soon!

## Eval results

| Metric | Value |
|:------------------|-------------------:|
| accuracy | 0.9163713678242381 |
| loss | 0.29771897196769714 |
| macro_f1 | 0.8951960933373831 |
| macro_precision | 0.8958313840463195 |
| macro_recall | 0.8962088356299692 |
| micro_f1 | 0.9163713678242381 |
| micro_precision | 0.9163713678242381 |
| micro_recall | 0.9163713678242381 |
| weighted_f1 | 0.916670480049282 |
| weighted_precision | 0.9180146709071523 |
| weighted_recall | 0.9163713678242381 |

An unofficial sketch for recomputing these metrics also appears at the end of this card.

### BibTeX entry and citation info

Coming soon!
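#### Interpreting pipeline output (unofficial sketch)

`TextClassificationPipeline` returns one `{"label": ..., "score": ...}` dictionary per input. If the checkpoint's config only carries generic `LABEL_<id>` names rather than the class names listed in the model description, they can be mapped back by hand. This is a minimal sketch, assuming the label ids of the fine-tuned head follow the table above (an assumption, not verified here):

```python
from transformers import AlbertForSequenceClassification, PreTrainedTokenizerFast, TextClassificationPipeline

tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")
pipeline = TextClassificationPipeline(tokenizer=tokenizer, model=model)

# Class names from the table in the model description; this assumes the
# fine-tuned head uses the same label order.
ID2LABEL = {0: "kolkata", 1: "state", 2: "national",
            3: "sports", 4: "entertainment", 5: "international"}

for prediction in pipeline("এই ইউনিয়নে ৩ টি মৌজা ও ১০ টি গ্রাম আছে ।"):
    name = prediction["label"]
    if name.startswith("LABEL_"):  # generic name such as "LABEL_3"
        name = ID2LABEL[int(name.split("_")[-1])]
    print(f"{name}: {prediction['score']:.4f}")
```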
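#### Recomputing the eval metrics (unofficial sketch)

Since the training and evaluation script is still "coming soon", here is a minimal, unofficial sketch of how the metrics above could be recomputed with `datasets`, `torch`, and `scikit-learn`. It assumes the `sna.bn` config of `indic_glue` exposes `text` and `label` columns and that the numbers were computed on its `test` split; neither is confirmed by this card, so adjust as needed:

```python
import torch
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AlbertForSequenceClassification, PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("neuropark/sahajBERT-NCC")
model = AlbertForSequenceClassification.from_pretrained("neuropark/sahajBERT-NCC")
model.eval()

# Assumed split and column names; the exact evaluation setup is not stated above.
dataset = load_dataset("indic_glue", "sna.bn", split="test")

predictions = []
with torch.no_grad():
    for example in dataset:
        inputs = tokenizer(example["text"], truncation=True, max_length=512, return_tensors="pt")
        logits = model(**inputs).logits
        predictions.append(logits.argmax(dim=-1).item())

references = dataset["label"]
accuracy = accuracy_score(references, predictions)
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(references, predictions, average="macro")
print(f"accuracy={accuracy:.4f}  macro_precision={macro_p:.4f}  "
      f"macro_recall={macro_r:.4f}  macro_f1={macro_f1:.4f}")
```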